Github user reductionista commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/342#discussion_r243722566

    --- Diff: src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in ---
    @@ -580,3 +679,82 @@ class MiniBatchDocumentation:
                 for help.
             """.format(**locals())
     # ---------------------------------------------------------------------
    +    @staticmethod
    +    def minibatch_preprocessor_dl_help(schema_madlib, message):
    +        method = "minibatch_preprocessor_dl"
    +        summary = """
    +        ----------------------------------------------------------------
    +                            SUMMARY
    +        ----------------------------------------------------------------
    +        For Deep Learning based techniques such as Convolutional Neural Nets,
    +        the input data is mostly images. These images can be represented as an
    +        array of numbers where all elements are between 0 and 255 in value.
    +        It is standard practice to divide each of these numbers by 255.0 to
    +        normalize the image data. minibatch_preprocessor() is for general
    +        use-cases, but for deep learning based use-cases we provide
    +        minibatch_preprocessor_dl() that is light-weight and is
    +        specific to image datasets.
    +
    +        The normalizing constant is parameterized, and can be specified based
    +        on the kind of image data used.
    +
    +        For more details on function usage:
    +        SELECT {schema_madlib}.{method}('usage')
    +        """.format(**locals())
    +
    +        usage = """
    +        ---------------------------------------------------------------------------
    +                                        USAGE
    +        ---------------------------------------------------------------------------
    +        SELECT {schema_madlib}.{method}(
    +            source_table,          -- TEXT. Name of the table containing input
    +                                      data. Can also be a view
    +            output_table,          -- TEXT. Name of the output table for
    +                                      mini-batching
    +            dependent_varname,     -- TEXT. Name of the dependent variable column
    +            independent_varname,   -- TEXT. Name of the independent variable
    +                                      column
    +            buffer_size            -- INTEGER. Default computed automatically.
    +                                      Number of source input rows to pack into a buffer
    +            normalizing_const      -- DOUBLE PRECISON. Default 255.0. The
    +                                      normalizing constant to use for
    +                                      standardizing arrays in independent_varname.
    +        );
    +
    +
    +        ---------------------------------------------------------------------------
    +                                        OUTPUT
    +        ---------------------------------------------------------------------------
    +        The output table produced by MiniBatch Preprocessor contains the
    +        following columns:
    +
    +        buffer_id               -- INTEGER. Unique id for packed table.
    +        dependent_varname       -- FLOAT8[]. Packed array of dependent variables.
    +        independent_varname     -- FLOAT8[]. Packed array of independent
    +                                   variables.
    +
    --- End diff --

    Assuming my previous suggestion is taken, I would write {dependent_varname} and {independent_varname} here to distinguish from the columns in the summary table, which are literal strings rather than references to the parameters the user passes in.
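
One practical caveat, sketched here under assumptions: the help strings in this diff are built with `.format(**locals())`, so if the intent is for the braces to appear literally in the printed help, they would likely need to be doubled. The helper names `output_help` and `unescaped_help` below are hypothetical and are not part of MADlib; they only illustrate how Python's `str.format` treats single and doubled braces.

```python
def output_help(schema_madlib, method="minibatch_preprocessor_dl"):
    """Hypothetical helper (not MADlib code): builds a help-text fragment."""
    # {schema_madlib} and {method} are substituted from locals().
    # {{dependent_varname}} is escaped, so it prints with single braces.
    return """
    SELECT {schema_madlib}.{method}('usage')

    buffer_id                -- INTEGER. Unique id for packed table.
    {{dependent_varname}}      -- FLOAT8[]. Packed array of dependent variables.
    {{independent_varname}}    -- FLOAT8[]. Packed array of independent variables.
    """.format(**locals())


def unescaped_help(schema_madlib):
    """Hypothetical helper showing the failure mode with single braces."""
    # dependent_varname is not a local here, so format() raises KeyError.
    return "{schema_madlib}: {dependent_varname}".format(**locals())


if __name__ == "__main__":
    print(output_help("madlib"))
    try:
        unescaped_help("madlib")
    except KeyError as exc:
        print("single braces failed with KeyError:", exc)
```

If the names were instead meant to be substituted with the user's actual column names, they would have to be in scope as locals of the help function, which is not the case in the signature shown in the diff.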
---