[ 
https://issues.apache.org/jira/browse/SYSTEMML-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474602#comment-16474602
 ] 

Matthias Boehm commented on SYSTEMML-2299:
------------------------------------------

[~Guobao] initially I would leave it up to the user by allowing the 
specification via a parameter such as {{mode}} (maybe rename the existing mode 
to utype?). This is similar to parfor, where a user can specify the execution 
mode as {{LOCAL}}, {{REMOTE_MR}}, or {{REMOTE_SPARK}}. While building the 
runtime, let's make this parameter mandatory. Later we can generalize that and 
automatically decide the execution mode if not provided by the user: for 
example, we could compute a cost estimate based on the number of floating point 
operations per batch, scaled by the number of epochs and the data size. If a certain 
minimum cost threshold is exceeded and all memory constraints are met, we could 
automatically route it to distributed operations. 
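
To make the later generalization concrete, here is a rough sketch of such a cost-based decision. It is only an illustration: the function name, the threshold value, the single memory estimate/budget pair, and the choice of {{REMOTE_SPARK}} as the distributed target are all assumptions, not part of any existing SystemML API.

```python
import math

def choose_exec_mode(flops_per_batch, num_rows, batchsize, epochs,
                     mem_estimate, mem_budget,
                     min_cost=1e12, user_mode=None):
    """Pick an execution mode; all names and defaults here are hypothetical."""
    # A mode given by the user always wins.
    if user_mode is not None:
        return user_mode
    # Cost estimate: FLOPs per batch, scaled by the number of batches
    # (derived from the data size) and the number of epochs.
    batches_per_epoch = math.ceil(num_rows / batchsize)
    total_cost = flops_per_batch * batches_per_epoch * epochs
    # Assumption: "memory constraints are met" means the estimated
    # memory requirement fits into the available budget.
    mem_ok = mem_estimate <= mem_budget
    if total_cost > min_cost and mem_ok:
        return "REMOTE_SPARK"
    return "LOCAL"
```

For example, 64,000 rows with batchsize=64 give 1,000 batches per epoch, so 1e9 FLOPs per batch over 100 epochs is an estimated 1e14 FLOPs, which exceeds the assumed 1e12 threshold.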

> API design of the paramserv function
> ------------------------------------
>
>                 Key: SYSTEMML-2299
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2299
>             Project: SystemML
>          Issue Type: Sub-task
>            Reporter: LI Guobao
>            Assignee: LI Guobao
>            Priority: Major
>
> The objective of the “paramserv” built-in function is to update an initial or 
> existing model with a given configuration. An initial function signature would be: 
> {code:java}
> model'=paramserv(model, features=X, labels=Y, val_features=X_val, 
> val_labels=Y_val, upd="fun1", agg="fun2", mode="BSP", freq="BATCH", 
> epochs=100, batchsize=64, k=7, scheme="disjoint_contiguous", 
> hyperparams=params, checkpointing="NONE"){code}
> We are interested in providing the model (which will be a struct-like data 
> structure consisting of the weights, the biases and the hyperparameters), the 
> training features and labels, the validation features and labels, the batch 
> update function (i.e., gradient calculation func), the update strategy (e.g. 
> sync, async, hogwild!, stale-synchronous), the update frequency (e.g. epoch 
> or mini-batch), the gradient aggregation function, the number of epochs, the 
> batch size, the degree of parallelism, the data partition scheme, a list of 
> additional hyperparameters, as well as the checkpointing strategy. The 
> function will return a trained model in struct format.
> *Inputs*:
>  * model <list>: a list consisting of the weight and bias matrices
>  * features <matrix>: training features matrix
>  * labels <matrix>: training label matrix
>  * val_features <matrix>: validation features matrix
>  * val_labels <matrix>: validation label matrix
>  * upd <string>: the name of the gradient calculation function
>  * agg <string>: the name of the gradient aggregation function
>  * mode <string> (options: BSP, ASP, SSP): the updating mode
>  * freq <string> (options: EPOCH, BATCH): the frequency of updates
>  * epochs <integer>: the number of epochs
>  * batchsize <integer> [optional]: the batch size; if the update frequency 
> is "EPOCH", this argument will be ignored
>  * k <integer>: the degree of parallelism
>  * scheme <string> (options: disjoint_contiguous, disjoint_round_robin, 
> disjoint_random, overlap_reshuffle): the scheme of data partition, i.e., how 
> the data is distributed across workers
>  * hyperparams <list> [optional]: a list consisting of the additional 
> hyperparameters, e.g., learning rate, momentum
>  * checkpointing <string> (options: NONE(default), EPOCH, EPOCH10) 
> [optional]: the checkpoint strategy; we could set a checkpoint for each epoch 
> or every 10 epochs 
> *Output*:
>  * model' <list>: a list consisting of the updated weight and bias matrices
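
As an aside on the {{scheme}} options quoted above, the following sketch shows one possible reading of the four partition schemes in terms of row indices. The semantics are an assumption inferred from the option names only (in particular, that overlap_reshuffle gives every worker its own reshuffled copy of all rows); the function name and the seed parameter are hypothetical.

```python
import random

def partition_rows(n_rows, k, scheme, seed=0):
    """Return k lists of row indices, one per worker (illustrative only)."""
    rows = list(range(n_rows))
    rng = random.Random(seed)
    chunk = -(-n_rows // k)  # ceiling division: rows per contiguous block
    if scheme == "disjoint_contiguous":
        # Worker i gets the i-th contiguous block of rows.
        return [rows[i * chunk:(i + 1) * chunk] for i in range(k)]
    if scheme == "disjoint_round_robin":
        # Row r goes to worker r % k.
        return [rows[i::k] for i in range(k)]
    if scheme == "disjoint_random":
        # Shuffle once, then split into contiguous blocks.
        rng.shuffle(rows)
        return [rows[i * chunk:(i + 1) * chunk] for i in range(k)]
    if scheme == "overlap_reshuffle":
        # Assumed meaning: every worker sees all rows, each in its own order.
        return [rng.sample(rows, n_rows) for _ in range(k)]
    raise ValueError("unknown scheme: " + scheme)
```

With 10 rows and k=3, disjoint_contiguous yields blocks of 4, 4, and 2 rows, while disjoint_round_robin assigns rows 0, 3, 6, 9 to the first worker.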



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
