[
https://issues.apache.org/jira/browse/SYSTEMML-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090453#comment-16090453
]
Mike Dusenberry edited comment on SYSTEMML-1159 at 7/17/17 8:02 PM:
--------------------------------------------------------------------
[~return_01] Thanks -- adding HogWild asynchronous SGD would be quite
interesting. However, this particular JIRA issue refers to
*hyperparameters* rather than the model parameters; HogWild is applicable
only to the latter. If you are interested in pursuing support for
HogWild, could you please create a new JIRA issue for it and link it to
SYSTEMML-540? SYSTEMML-1563 may also be of interest -- I added a distributed
synchronous SGD algorithm a while back, currently implemented in the
[distributed MNIST LeNet |
https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml]
algorithm. We are currently working to improve its engine performance
in SYSTEMML-1760.
> Enable Remote Hyperparameter Tuning
> -----------------------------------
>
> Key: SYSTEMML-1159
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1159
> Project: SystemML
> Issue Type: Improvement
> Affects Versions: SystemML 1.0
> Reporter: Mike Dusenberry
> Priority: Blocker
>
> Training a parameterized machine learning model (such as a large neural net
> in deep learning) requires learning a set of ideal model parameters from the
> data, as well as determining appropriate hyperparameters (or "settings") for
> the training process itself. In the latter case, the hyperparameters (e.g.,
> learning rate, regularization strength, dropout percentage, or model
> architecture) cannot be learned from the data; instead, they are
> determined via a search over a space for each hyperparameter. For large
> numbers of hyperparameters (such as in deep learning models), the current
> literature points to performing staged, randomized grid searches over the
> space to produce distributions of performance, narrowing the space after each
> search \[1]. Thus, for efficient hyperparameter optimization, it is
> desirable to train several models in parallel, with each model trained over
> the full dataset. For deep learning models, a mini-batch training approach
> is currently state-of-the-art, so separate models with different
> hyperparameters could conceivably be trained in parallel on separate nodes
> of a cluster.
> To allow for the training of deep learning models, SystemML needs a
> solution that enables this scenario with the Spark backend.
> Specifically, if the user has a {{train}} function that takes a set of
> hyperparameters and trains a model with a mini-batch approach (and thus
> only uses single-node instructions within the function), the user should
> be able to wrap this function with, for example, a remote {{parfor}}
> construct that samples hyperparameters and calls the {{train}} function on
> each machine in parallel (a rough sketch follows below).
> To be clear, each model would need access to the entire dataset, and each
> model would be trained independently.
> \[1]: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
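For illustration, here is a minimal DML sketch of the {{parfor}}-based hyperparameter
search described above. The {{train}} function name, its signature, and the input
files are assumptions for illustration only, not something prescribed by this issue:
{code}
# Minimal sketch of hyperparameter tuning with parfor in DML.
# Assumes a user-defined train(X, y, lr, reg) function (e.g., made available
# via source()) that trains a model with mini-batch SGD and returns a
# validation accuracy; the function and its signature are hypothetical.
X = read($X)
y = read($Y)

num_trials = 16
results = matrix(0, rows=num_trials, cols=3)

# Each iteration samples its own hyperparameters and trains an independent
# model over the full dataset; the parfor body could be dispatched to
# cluster nodes (e.g., via a remote Spark parfor execution mode).
parfor (i in 1:num_trials) {
  # Random search: sample hyperparameters on a log scale.
  lr  = 10 ^ (-4 + 3 * as.scalar(rand(rows=1, cols=1)))  # in [1e-4, 1e-1]
  reg = 10 ^ (-6 + 4 * as.scalar(rand(rows=1, cols=1)))  # in [1e-6, 1e-2]

  accuracy = train(X, y, lr, reg)

  results[i, 1] = lr
  results[i, 2] = reg
  results[i, 3] = accuracy
}

write(results, $out, format="csv")
{code}
The resulting {{results}} matrix gives a distribution of accuracies over the sampled
hyperparameters, from which the space could be narrowed for a subsequent, staged
search as in \[1].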
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)