GitHub user njayaram2 opened a pull request:
https://github.com/apache/madlib/pull/243
MLP: Add minibatch gradient descent solver
JIRA: MADLIB-1206
This commit adds support for mini-batch gradient descent to MLP.
If the independent variable in the input table is a 2D matrix,
minibatch is automatically used as the solver. Two minibatch-specific
optimizer parameters are also introduced: batch_size and n_epochs.
- batch_size defaults to min(200, buffer_size), where buffer_size is
the number of original input rows packed into a single row of the
matrix.
- n_epochs is the number of times all the batches in a buffer are
iterated over (default 1).
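As a rough illustration of how these two parameters interact, the sketch
below computes the default batch size and the number of gradient updates
performed per buffer. The function names are hypothetical and are not
taken from the MADlib source; the sketch also assumes that a final batch
smaller than batch_size is still processed.

```python
import math


def default_batch_size(buffer_size, cap=200):
    """Default described above: min(200, buffer_size)."""
    return min(cap, buffer_size)


def updates_per_buffer(buffer_size, batch_size=None, n_epochs=1):
    """Number of batch updates for one buffer (one packed row).

    Assumption: a trailing partial batch counts as one update.
    """
    if batch_size is None:
        batch_size = default_batch_size(buffer_size)
    n_batches = math.ceil(buffer_size / batch_size)
    return n_batches * n_epochs
```

For example, a buffer of 1000 rows with the defaults yields a batch size
of 200 and five updates, while a buffer of 50 rows is processed as a
single batch of 50.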
Other changes include:
- The dependent variable in the minibatch solver is now a matrix as
well. It was previously a vector.
- Randomize the order in which batches are processed within an epoch.
- MLP minibatch does not currently support the weights param; an error
is now thrown.
- Delete an unused type named mlp_step_result.
- Add unit tests for the newly added functions in the Python file.
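The batch handling described above might look roughly like the following
sketch. It is not the code in this pull request: shuffled_batches and its
arguments are hypothetical names, the matrices are plain Python lists of
rows, and it assumes that "randomize" means shuffling the order of the
batches once per epoch.

```python
import random


def shuffled_batches(x_rows, y_rows, batch_size, n_epochs=1, seed=None):
    """Yield (x_batch, y_batch) pairs from one buffer.

    x_rows and y_rows are both matrices (lists of rows) with the same
    number of rows. Batch order is reshuffled in every epoch.
    """
    if len(x_rows) != len(y_rows):
        raise ValueError("x_rows and y_rows must have the same row count")
    rng = random.Random(seed)
    starts = list(range(0, len(x_rows), batch_size))
    for _ in range(n_epochs):
        rng.shuffle(starts)  # assumed: a new random batch order per epoch
        for s in starts:
            yield x_rows[s:s + batch_size], y_rows[s:s + batch_size]
```

Each epoch still visits every row exactly once; only the order in which
the batches are seen changes.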
Co-authored-by: Rahul Iyer <[email protected]>
Co-authored-by: Nikhil Kak <[email protected]>
Closes #243
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib mlp-minibatch-with-preprocessed-data-rebased
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/243.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #243
----
commit d9306f7c6a44f64c53df13c34759da55468c4d26
Author: Nandish Jayaram <njayaram@...>
Date: 2018-02-28T00:51:42Z
MLP: Add minibatch gradient descent solver
JIRA: MADLIB-1206
This commit adds support for mini-batch gradient descent to MLP.
If the independent variable in the input table is a 2D matrix,
minibatch is automatically used as the solver. Two minibatch-specific
optimizer parameters are also introduced: batch_size and n_epochs.
- batch_size defaults to min(200, buffer_size), where buffer_size is
the number of original input rows packed into a single row of the
matrix.
- n_epochs is the number of times all the batches in a buffer are
iterated over (default 1).
Other changes include:
- The dependent variable in the minibatch solver is now a matrix as
well. It was previously a vector.
- Randomize the order in which batches are processed within an epoch.
- MLP minibatch does not currently support the weights param; an error
is now thrown.
- Delete an unused type named mlp_step_result.
- Add unit tests for the newly added functions in the Python file.
Co-authored-by: Rahul Iyer <[email protected]>
Co-authored-by: Nikhil Kak <[email protected]>
Closes #243
----
---