Github user reductionista commented on a diff in the pull request:
https://github.com/apache/madlib/pull/342#discussion_r243722566
--- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
@@ -580,3 +679,82 @@ class MiniBatchDocumentation:
for help.
""".format(**locals())
# ---------------------------------------------------------------------
+ @staticmethod
+ def minibatch_preprocessor_dl_help(schema_madlib, message):
+ method = "minibatch_preprocessor_dl"
+ summary = """
+ ----------------------------------------------------------------
+ SUMMARY
+ ----------------------------------------------------------------
+ For Deep Learning based techniques such as Convolutional Neural Nets,
+ the input data is mostly images. These images can be represented as an
+ array of numbers where all elements are between 0 and 255 in value.
+ It is standard practice to divide each of these numbers by 255.0 to
+ normalize the image data. minibatch_preprocessor() is for general
+ use-cases, but for deep learning based use-cases we provide
+ minibatch_preprocessor_dl() that is light-weight and is
+ specific to image datasets.
+
+ The normalizing constant is parameterized, and can be specified based
+ on the kind of image data used.
+
+ For more details on function usage:
+ SELECT {schema_madlib}.{method}('usage')
+ """.format(**locals())
+
+ usage = """
+ ---------------------------------------------------------------------------
+ USAGE
+ ---------------------------------------------------------------------------
+ SELECT {schema_madlib}.{method}(
+ source_table, -- TEXT. Name of the table containing input
+ data. Can also be a view
+ output_table, -- TEXT. Name of the output table for
+ mini-batching
+ dependent_varname, -- TEXT. Name of the dependent variable column
+ independent_varname, -- TEXT. Name of the independent variable
+ column
+ buffer_size, -- INTEGER. Default computed automatically.
+ Number of source input rows to pack into a buffer
+ normalizing_const -- DOUBLE PRECISION. Default 255.0. The
+ normalizing constant to use for
+ standardizing arrays in independent_varname.
+ );
+
+
+ ---------------------------------------------------------------------------
+ OUTPUT
+ ---------------------------------------------------------------------------
+ The output table produced by MiniBatch Preprocessor contains the
+ following columns:
+
+ buffer_id -- INTEGER. Unique id for packed table.
+ dependent_varname -- FLOAT8[]. Packed array of dependent variables.
+ independent_varname -- FLOAT8[]. Packed array of independent
+ variables.
+
--- End diff --
Assuming my previous suggestion is taken, I would write {dependent_varname}
and {independent_varname} here to distinguish them from the columns in the
summary table, which are literal strings rather than references to the
parameters the user passes in.
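One practical detail applies if the OUTPUT block stays inside a string that is passed to `.format(**locals())`, as the SUMMARY string in this diff is. In that case a literal `{dependent_varname}` has to be written as `{{dependent_varname}}`. Otherwise `str.format()` parses it as a field and raises `KeyError`, because the help function has no local of that name. The sketch below shows this. The function names `render_output_section` and `render_unescaped` are hypothetical, and the template is trimmed down from the diff.

```python
def render_output_section(schema_madlib):
    """Render a trimmed-down OUTPUT section of the help text (hypothetical)."""
    method = "minibatch_preprocessor_dl"
    # {schema_madlib} and {method} are format fields filled from locals();
    # {{...}} is the str.format() escape for a literal pair of braces.
    return """
    For more details on function usage:
        SELECT {schema_madlib}.{method}('usage')

    The output table contains the following columns:
        buffer_id             -- INTEGER. Unique id for packed table.
        {{dependent_varname}}   -- FLOAT8[]. Packed array of dependent variables.
        {{independent_varname}} -- FLOAT8[]. Packed array of independent variables.
    """.format(**locals())


def render_unescaped(schema_madlib):
    """Same idea with single braces, which fails (hypothetical)."""
    method = "minibatch_preprocessor_dl"
    # {dependent_varname} is parsed as a format field, but no local variable
    # of that name exists, so str.format() raises KeyError.
    return "{schema_madlib}.{method}: {dependent_varname}".format(**locals())


if __name__ == "__main__":
    print(render_output_section("madlib"))
    try:
        render_unescaped("madlib")
    except KeyError as err:
        print("KeyError for missing field:", err)
```

If the OUTPUT text instead ends up in a string that is never passed through `.format()`, single braces are enough.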
---