[
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473311#comment-16473311
]
Matthias Boehm commented on SYSTEMML-2299:
------------------------------------------
[~Guobao] in general, this is a very good start. I would recommend making the
description a little more explicit, though:
* Hyper Parameters: Separate the model (weight and bias matrices) from the
hyper parameters. Some hyper parameters such as the batchsize and architecture
(in form of the given fun1) are already explicit inputs. Maybe we could pass
the other hyper parameters such as learning rate, momentum, regularization
(which mostly affect the optimizer and thus, fun2) via a separate named list?
* Formatting: Please use the code tag to highlight the function signature and
individual input types. You already give examples, but in order to make it
explicit, it would be good to define the types. For example, add the
alternatives for mode, freq, and checkpoint.
* Checkpoint: I don't understand what you mean by rollback recovery here. Maybe
we should start simple and use types such as NONE, EPOCH, EPOCH10 to indicate
at which frequency we perform model checkpointing.
* Data Distribution: Another aspect that is currently unspecified is how the
data is distributed to the individual workers. How about adding an additional
parameter for that? Example schemes are disjoint_contiguous (contiguous splits
of X and y), disjoint_round_robin (X and y distributed row-wise),
disjoint_random, overlap_reshuffle (every worker gets all data but reshuffled
in a different random order).
* Optional parameters: Finally, please specify which parameters are optional
and their defaults if not specified.
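
To make the four data distribution schemes above concrete, here is a minimal
sketch in Python rather than DML. The helper name partition_rows, its
arguments, and the seed handling are assumptions for illustration only, not
part of any existing SystemML API. It returns, for each of the k workers, the
row indices of X and y that the worker would receive.

```python
import random


def partition_rows(n_rows, k, scheme, seed=0):
    """Return one list of row indices per worker (hypothetical helper)."""
    rows = list(range(n_rows))
    rng = random.Random(seed)
    if scheme == "disjoint_contiguous":
        # Each worker gets one contiguous block of rows;
        # block sizes differ by at most one row.
        bounds = [w * n_rows // k for w in range(k + 1)]
        return [rows[bounds[w]:bounds[w + 1]] for w in range(k)]
    if scheme == "disjoint_round_robin":
        # Row i goes to worker i mod k.
        return [rows[w::k] for w in range(k)]
    if scheme == "disjoint_random":
        # Shuffle once, then deal the shuffled rows out to the workers,
        # so the partitions stay disjoint.
        rng.shuffle(rows)
        return [rows[w::k] for w in range(k)]
    if scheme == "overlap_reshuffle":
        # Every worker sees all rows, each in an independently shuffled order.
        return [rng.sample(rows, n_rows) for _ in range(k)]
    raise ValueError("unknown scheme: " + scheme)


if __name__ == "__main__":
    for name in ("disjoint_contiguous", "disjoint_round_robin",
                 "disjoint_random", "overlap_reshuffle"):
        print(name, partition_rows(10, 3, name))
```

The three disjoint schemes assign every row to exactly one worker, whereas
overlap_reshuffle replicates the full data set k times, which matters for
memory and for how an epoch is counted.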
> API design of the paramserv function
> ------------------------------------
>
> Key: SYSTEMML-2299
> URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
> Project: SystemML
> Issue Type: Sub-task
> Reporter: LI Guobao
> Assignee: LI Guobao
> Priority: Major
>
> The objective of the “paramserv” built-in function is to update an initial or
> existing model with a given configuration. An initial function signature would be
> _model'=paramserv(model, X, y, X_val, y_val, upd=fun1, mode=SYNC, freq=EPOCH,
> agg=fun2, epochs=100, batchsize=64, k=7, checkpointing=rollback)_. We are
> interested in providing the model (which will be a struct-like data structure
> consisting of the weights, the biases and the hyperparameters), the training
> features and labels, the validation features and labels, the batch update
> function (i.e., gradient calculation function), the update strategy (e.g. sync,
> async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or
> mini-batch), the gradient aggregation function, the number of epochs, the
> batch size, the degree of parallelism as well as the checkpointing strategy
> (e.g. rollback recovery). The function will return a trained model in
> struct format.
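
As one way to pin down the types and defaults of the signature quoted above,
the configuration could be sketched as follows (in Python rather than DML). The
class names, the hyperparams field, and all default values are assumptions for
illustration: the defaults simply reuse the example values from the signature,
and the checkpoint alternatives are the NONE, EPOCH, and EPOCH10 types
suggested in the comment.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    """Update strategies named in the issue description."""
    SYNC = "sync"
    ASYNC = "async"
    HOGWILD = "hogwild"
    SSP = "stale-synchronous"


class Freq(Enum):
    """Update frequencies named in the issue description."""
    EPOCH = "epoch"
    BATCH = "mini-batch"


class Checkpoint(Enum):
    """Checkpoint types; the value is the interval in epochs (0 = never)."""
    NONE = 0
    EPOCH = 1
    EPOCH10 = 10

    def due(self, epoch):
        """True if a checkpoint should be written after this 1-based epoch."""
        return self is not Checkpoint.NONE and epoch % self.value == 0


@dataclass
class ParamservConfig:
    """Hypothetical container for the optional paramserv arguments."""
    mode: Mode = Mode.SYNC
    freq: Freq = Freq.EPOCH
    epochs: int = 100
    batchsize: int = 64
    k: int = 7  # degree of parallelism
    checkpoint: Checkpoint = Checkpoint.NONE
    # Optimizer hyperparameters (learning rate, momentum, ...) are kept in a
    # separate named list, apart from the model's weights and biases.
    hyperparams: dict = field(default_factory=dict)


if __name__ == "__main__":
    cfg = ParamservConfig(checkpoint=Checkpoint.EPOCH10,
                          hyperparams={"lr": 0.01, "momentum": 0.9})
    print(cfg)
    print([e for e in range(1, 31) if cfg.checkpoint.due(e)])
```

Keeping the hyperparameters in their own field leaves the model as a plain
list of weight and bias matrices, so the update and aggregation functions can
read the optimizer settings without unpacking the model itself.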