[
https://issues.apache.org/jira/browse/SYSTEMML-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347661#comment-16347661
]
Mike Dusenberry commented on SYSTEMML-1159:
-------------------------------------------
Thanks, [~return_01]. This should coincide nicely with some of the work that
[~mboehm7] has been doing, and is planning to do, with {{parfor}} loops.
> Enable Remote Hyperparameter Tuning
> -----------------------------------
>
> Key: SYSTEMML-1159
> URL: https://issues.apache.org/jira/browse/SYSTEMML-1159
> Project: SystemML
> Issue Type: Improvement
> Affects Versions: SystemML 1.1
> Reporter: Mike Dusenberry
> Assignee: Janardhan
> Priority: Blocker
>
> Training a parameterized machine learning model (such as a large neural net
> in deep learning) requires learning a set of ideal model parameters from the
> data, as well as determining appropriate hyperparameters (or "settings") for
> the training process itself. In the latter case, the hyperparameters (i.e.
> learning rate, regularization strength, dropout percentage, model
> architecture, etc.) cannot be learned from the data, and instead are
> determined via a search across a space for each hyperparameter. For large
> numbers of hyperparameters (such as in deep learning models), the current
> literature points to performing staged, randomized grid searches over the
> space to produce distributions of performance, narrowing the space after each
> search [1]. Thus, for efficient hyperparameter optimization, it is
> desirable to train several models in parallel, with each model trained over
> the full dataset. For deep learning models, a mini-batch training approach
> is currently state-of-the-art, and thus separate models with different
> hyperparameters could, conceivably, be easily trained on each of the nodes in
> a cluster.
> In order to support the training of deep learning models in this way, SystemML
> needs a solution that enables this scenario with the Spark backend.
> Specifically, if the user has a {{train}} function that takes a set of
> hyperparameters and trains a model with a mini-batch approach (and thus is
> only making use of single-node instructions within the function), the user
> should be able to wrap this function with, for example, a remote {{parfor}}
> construct that samples hyperparameters and calls the {{train}} function on
> each machine in parallel (see the sketch below).
> To be clear, each model would need access to the entire dataset, and each
> model would be trained independently.
> [1]: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
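To make the scenario above concrete, here is a minimal DML sketch of such a remote {{parfor}} hyperparameter search. The {{train}} function body, the hyperparameter ranges, and the {{mode=REMOTE_SPARK}} / {{opt=CONSTRAINED}} settings are illustrative assumptions for discussion, not a committed API or design:

{code}
# Hypothetical user-defined mini-batch training function (single-node
# instructions only); the real body would run mini-batch SGD over X/y.
train = function(matrix[double] X, matrix[double] y, double lr, double reg)
    return (double val_acc) {
  # placeholder: return a dummy validation accuracy
  val_acc = as.scalar(rand(rows=1, cols=1))
}

X = read($X)   # full training features, shared by every model
y = read($y)   # full training labels

N = 16                                # number of hyperparameter samples
results = matrix(0, rows=N, cols=3)   # [lr, reg, validation accuracy]

# Each iteration samples a configuration and trains an independent model
# over the full dataset; forcing a remote execution mode would launch the
# iterations as parallel tasks on the cluster.
parfor (i in 1:N, mode=REMOTE_SPARK, opt=CONSTRAINED) {
  lr  = 10 ^ as.scalar(rand(rows=1, cols=1, min=-5, max=-1))   # log-uniform
  reg = 10 ^ as.scalar(rand(rows=1, cols=1, min=-6, max=-2))
  acc = train(X, y, lr, reg)
  results[i, 1] = lr
  results[i, 2] = reg
  results[i, 3] = acc
}

write(results, $out, format="csv")
{code}

Each iteration writes to a disjoint row of {{results}}, so the standard {{parfor}} result merge can assemble the final matrix; the key requirement from the issue is that every remote worker has access to the entire {{X}} and {{y}}.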
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)