[ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473311#comment-16473311
 ] 

Matthias Boehm commented on SYSTEMML-2299:
------------------------------------------

[~Guobao] in general, this is a very good start. I would recommend making the 
description a little more explicit, though:
* Hyper Parameters: Separate the model (weight and bias matrices) from the 
hyper parameters. Some hyper parameters such as the batchsize and architecture 
(in form of the given fun1) are already explicit inputs. Maybe we could pass 
the other hyper parameters such as learning rate, momentum, regularization 
(which mostly affect the optimizer and thus, fun2) via a separate named list?
* Formatting: Please use the code tag to highlight the function signature and 
individual input types. You already give examples, but in order to make it 
explicit, it would be good to define the types. For example, add the 
alternatives for mode, freq, and checkpoint.
* Checkpoint: I don't understand what you mean by rollback recovery here. Maybe 
we should start simple and use types such as NONE, EPOCH, and EPOCH10 to 
indicate the frequency at which we perform model checkpointing.
* Data Distribution: Another aspect that is currently unspecified is how the 
data is distributed to the individual workers. How about adding an additional 
parameter for that? Example schemes are disjoint_contiguous (contiguous splits 
of X and y), disjoint_round_robin (X and y distributed row-wise in round-robin 
order), disjoint_random, and overlap_reshuffle (every worker gets all data but 
reshuffled in a different random order).
* Optional parameters: Finally, please specify which parameters are optional 
and their defaults if not specified.     
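
To make the four data distribution schemes above concrete, here is a minimal 
sketch in plain Python (not DML) of how row indices of X and y could be 
assigned to k workers. The function name partition_rows, the seed parameter, 
and the choice to give leftover rows to the first workers are assumptions made 
for illustration only; they are not part of the proposed paramserv API.

```python
import random


def partition_rows(n_rows, k, scheme, seed=0):
    """Return k lists of row indices, one list per worker (illustrative only)."""
    rows = list(range(n_rows))
    rng = random.Random(seed)  # seeded so the sketch is reproducible

    if scheme == "disjoint_contiguous":
        # Contiguous splits; the first (n_rows % k) workers get one extra row.
        base, extra = divmod(n_rows, k)
        parts, start = [], 0
        for w in range(k):
            size = base + (1 if w < extra else 0)
            parts.append(rows[start:start + size])
            start += size
        return parts

    if scheme == "disjoint_round_robin":
        # Row i goes to worker i % k.
        return [rows[w::k] for w in range(k)]

    if scheme == "disjoint_random":
        # Shuffle once, then deal the shuffled rows out in disjoint parts.
        rng.shuffle(rows)
        return [rows[w::k] for w in range(k)]

    if scheme == "overlap_reshuffle":
        # Every worker sees all rows, each in its own random order.
        parts = []
        for _ in range(k):
            perm = rows[:]
            rng.shuffle(perm)
            parts.append(perm)
        return parts

    raise ValueError("unknown scheme: " + scheme)


if __name__ == "__main__":
    for s in ("disjoint_contiguous", "disjoint_round_robin",
              "disjoint_random", "overlap_reshuffle"):
        print(s, partition_rows(10, 3, s))
```

The same index lists would be applied to both X and y so that features and 
labels stay aligned on each worker. The three disjoint schemes cover every row 
exactly once across workers, whereas overlap_reshuffle replicates the full 
data set k times.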

> API design of the paramserv function
> ------------------------------------
>
>                 Key: SYSTEMML-2299
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>
> The objective of the “paramserv” built-in function is to update an initial or 
> existing model with configuration. An initial function signature would be 
> _model'=paramserv(model, X, y, X_val, y_val, upd=fun1, mode=SYNC, freq=EPOCH, 
> agg=fun2, epochs=100, batchsize=64, k=7, checkpointing=rollback)_. We are 
> interested in providing the model (which will be a struct-like data structure 
> consisting of the weights, the biases and the hyperparameters), the training 
> features and labels, the validation features and labels, the batch update 
> function (i.e., gradient calculation func), the update strategy (e.g. sync, 
> async, hogwild!, stale-synchronous), the update frequency (e.g. epoch or 
> mini-batch), the gradient aggregation function, the number of epochs, the 
> batch size, the degree of parallelism as well as the checkpointing strategy 
> (e.g. rollback recovery). And the function will return a trained model in 
> struct format.



