Repository: madlib
Updated Branches:
  refs/heads/master bc8aeeb11 -> 25d716328
Minibatch Preprocessor: Update online doc

The online doc is outdated. This commit adds two new parameters that
have been introduced since the last time the doc was edited.

Closes #334

Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/25d71632
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/25d71632
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/25d71632

Branch: refs/heads/master
Commit: 25d71632816a8630aeeeff614747527346b891f3
Parents: bc8aeeb
Author: Nandish Jayaram <[email protected]>
Authored: Tue Oct 23 10:35:02 2018 -0700
Committer: Nandish Jayaram <[email protected]>
Committed: Thu Nov 15 16:07:34 2018 -0800

----------------------------------------------------------------------
 .../utilities/minibatch_preprocessing.py_in     | 24 +++++++++++++++-----
 .../utilities/minibatch_preprocessing.sql_in    |  2 +-
 2 files changed, 19 insertions(+), 7 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/madlib/blob/25d71632/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in b/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
index 0762a06..88433c9 100644
--- a/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
+++ b/src/ports/postgres/modules/utilities/minibatch_preprocessing.py_in
@@ -487,10 +487,16 @@ class MiniBatchDocumentation:
     ----------------------------------------------------------------
     SUMMARY
     ----------------------------------------------------------------
-    MiniBatch Preprocessor is a utility function to pre process the input
-    data for use with models that support mini-batching as an optimization
+    The mini-batch preprocessor is a utility that prepares input data for
+    use by models that support mini-batch as an optimization option. (This
+    is currently only the case for Neural Networks.) It is effectively a
+    packing operation that builds arrays of dependent and independent
+    variables from the source data table.
 
-    #TODO add more here
+    The advantage of using mini-batching is that it can perform better than
+    stochastic gradient descent (default MADlib optimizer) because it uses
+    more than one training example at a time, typically resulting in faster
+    and smoother convergence.
 
     For more details on function usage:
     SELECT {schema_madlib}.{method}('usage')
@@ -508,8 +514,13 @@ class MiniBatchDocumentation:
     dependent_varname,     -- TEXT. Name of the dependent variable column
     independent_varname,   -- TEXT. Name of the independent variable
                               column
-    buffer_size            -- INTEGER. Number of source input rows to
-                              pack into batch
+    grouping_col           -- TEXT. Default NULL. An expression list used
+                              to group the input dataset into discrete groups
+    buffer_size            -- INTEGER. Default computed automatically.
+                              Number of source input rows to pack into a buffer
+    one_hot_encode_int_dep_var -- BOOLEAN. Default FALSE. Flag to one-hot
+                              encode dependent variables that are
+                              scalar integers
     );
 
 
@@ -519,10 +530,11 @@ class MiniBatchDocumentation:
     The output table produced by MiniBatch Preprocessor contains the
     following columns:
 
-    id                     -- INTEGER. Unique id for packed table.
+    __id__                 -- INTEGER. Unique id for packed table.
     dependent_varname      -- FLOAT8[]. Packed array of dependent variables.
     independent_varname    -- FLOAT8[]. Packed array of independent
                               variables.
+    grouping_cols          -- TEXT. Name of grouping columns.
 
     ---------------------------------------------------------------------------
     The algorithm also creates a summary table named <output_table>_summary

http://git-wip-us.apache.org/repos/asf/madlib/blob/25d71632/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in b/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
index 1ac00fb..58668a1 100644
--- a/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
+++ b/src/ports/postgres/modules/utilities/minibatch_preprocessing.sql_in
@@ -46,7 +46,7 @@ arrays of dependent and independent variables from the source data table.
 The advantage of using mini-batching is that it
 can perform better than stochastic gradient descent
 (default MADlib optimizer) because it uses more than one training
-example at a time, typically resulting faster and smoother convergence [1].
+example at a time, typically resulting in faster and smoother convergence [1].
 
 @brief Utility that prepares input data for use by models that support
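
Note on the packing operation: the updated docstring describes the
preprocessor as building arrays of dependent and independent variables,
with buffer_size rows per buffer and optional one-hot encoding of integer
dependent variables. The following is a minimal Python sketch of that
idea only. It is not MADlib's implementation: the function name
pack_minibatches, the 0-based __id__ values, and the sorted class
ordering used for one-hot encoding are all assumptions made for
illustration, and grouping is left out.

```python
def pack_minibatches(dep, indep, buffer_size, one_hot_encode_int_dep_var=False):
    """Pack source rows into buffers of at most `buffer_size` rows each.

    Hypothetical helper for illustration; not part of MADlib.
    """
    if len(dep) != len(indep):
        raise ValueError("dep and indep must have the same number of rows")
    if buffer_size < 1:
        raise ValueError("buffer_size must be a positive integer")

    if one_hot_encode_int_dep_var:
        # Assumption: classes are ordered by sorted distinct value.
        classes = sorted(set(dep))
        dep_rows = [[1.0 if value == c else 0.0 for c in classes] for value in dep]
    else:
        dep_rows = [float(value) for value in dep]

    indep_rows = [[float(x) for x in row] for row in indep]

    packed = []
    for buffer_id, start in enumerate(range(0, len(dep_rows), buffer_size)):
        stop = start + buffer_size
        packed.append({
            "__id__": buffer_id,  # assumption: ids start at 0
            "dependent_varname": dep_rows[start:stop],
            "independent_varname": indep_rows[start:stop],
        })
    return packed


if __name__ == "__main__":
    # Five source rows with buffer_size=2 give three buffers; the last holds one row.
    buffers = pack_minibatches(
        dep=[0, 1, 2, 1, 0],
        indep=[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]],
        buffer_size=2,
        one_hot_encode_int_dep_var=True,
    )
    for row in buffers:
        print(row)
```

Each returned dictionary plays the role of one row of the packed output
table, so a model consuming the result sees several training examples
per row instead of one.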
