[SYSTEMML-2090] Language documentation for paramserv builtin function

Closes #816.


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/0284f593
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/0284f593
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/0284f593

Branch: refs/heads/gh-pages
Commit: 0284f593fa660ea81a98c8735cef346164391c98
Parents: 9db274f
Author: EdgarLGB <guobao...@atos.net>
Authored: Sun Aug 5 14:37:59 2018 -0700
Committer: Matthias Boehm <mboe...@gmail.com>
Committed: Sun Aug 5 14:53:50 2018 -0700

----------------------------------------------------------------------
 dml-language-reference.md | 77 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/0284f593/dml-language-reference.md
----------------------------------------------------------------------
diff --git a/dml-language-reference.md b/dml-language-reference.md
index 5bf9099..8a89d99 100644
--- a/dml-language-reference.md
+++ b/dml-language-reference.md
@@ -53,6 +53,7 @@ limitations under the License.
    * [Read/Write Built-In Functions](dml-language-reference.html#readwrite-built-in-functions)
    * [Data Pre-Processing Built-In Functions](dml-language-reference.html#data-pre-processing-built-in-functions)
    * [Deep Learning Built-In Functions](dml-language-reference.html#deep-learning-built-in-functions)
+    * [Parameter Server Built-In Function](dml-language-reference.html#parameter-server-built-in-function)
    * [Other Built-In Functions](dml-language-reference.html#other-built-in-functions)
   * [Frames](dml-language-reference.html#frames)
     * [Creating Frames](dml-language-reference.html#creating-frames)
@@ -1536,6 +1537,82 @@ Examples:
 | bias_add             |                             | `ones = matrix(1, rows=1, cols=height*width); output = input + matrix(bias %*% ones, rows=1, cols=numChannels*height*width)`                                |
 | bias_multiply        |                             | `ones = matrix(1, rows=1, cols=height*width); output = input * matrix(bias %*% ones, rows=1, cols=numChannels*height*width)`                                |
 
+### Parameter Server Built-in Function
+Apart from data-parallel operations and task-parallel parfor loops, SystemML also supports a **data-parallel parameter server** via the built-in function **paramserv**. Currently, both a local multi-threaded and a distributed Spark backend are supported for executing the **paramserv** function. So far, we only support a single parameter server with N workers, as well as synchronous and asynchronous model updates per batch or epoch. For example, in order to train a model on the local backend with the BSP update strategy, 10 epochs, a batch size of 64, and 10 workers, the **paramserv** call should look like this:
+
+
+    resultModel = paramserv(model=initModel, features=X, labels=Y,
+                            upd="fun1", agg="fun2", utype="BSP", epochs=10,
+                            batchsize=64, k=10, hyperparams=hParams)
+
+
+**Table**: Inputs of paramserv function
+
+Parameters | Description | Type | Mandatory | Options
+-------- | ----------- | ---------- | ---------- | -------
+model | All the parameters (e.g., the weight and bias matrices) | list | yes |
+features | Training features | matrix | yes |
+labels | Training labels | matrix | yes |
+val_features | Validation features | matrix | no |
+val_labels | Validation labels | matrix | no |
+upd | Physical name of the gradient calculation function. The format should be "relative path::function name", e.g., "./mnist_lenet_paramserv_sgd.dml::gradients". | string | yes |
+agg | Physical name of the gradient aggregation function. The format should be "relative path::function name", e.g., "./mnist_lenet_paramserv_sgd.dml::aggregation". | string | yes |
+mode | Execution backend for data partitioning and worker execution | string | no | "LOCAL" (default), "REMOTE_SPARK"
+utype | Update strategy | string | no | "ASP" (default), "BSP"
+freq | Frequency of model updates | string | no | "EPOCH" (default), "BATCH"
+epochs | Number of epochs, where an epoch is a full scan over the data | integer | yes |
+batchsize | Size of a mini-batch (number of rows) | integer | no | 64 (default)
+k | Number of workers | integer | no | Number of vcores (default)
+scheme | Data partitioning scheme, i.e., how the data is distributed across workers | string | no | "DISJOINT_CONTIGUOUS" (default), "DISJOINT_ROUND_ROBIN", "DISJOINT_RANDOM", "OVERLAP_RESHUFFLE"
+hyperparams | Additional hyperparameters, e.g., learning rate, momentum | list | yes |
+checkpointing | Checkpointing strategy, currently not supported | string | no |
+
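+As a minimal sketch, the model and hyperparameters are passed as named lists, which can be constructed with the list built-in function; the entry names used here (W, b, lr, mu) are hypothetical and only need to match what the user-defined update and aggregate functions look up:
+
+```sh
+# hypothetical model: one weight matrix W and one bias vector b
+initModel = list(W=matrix(0, rows=784, cols=10), b=matrix(0, rows=1, cols=10))
+# hypothetical hyperparameters: learning rate and momentum
+hParams = list(lr=0.01, mu=0.9)
+```
+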
+**Table**: Output of paramserv function
+
+Output | Description | Type
+-------- | ----------- | ----------
+model | Trained model | list
+
+**Update function:**
+
+The update function calculates the gradients for a single mini-batch and the given model (e.g., via a forward and backward pass through a neural network). The implementation of this function should follow the signature below (i.e., **the input parameters, including both types and names, must match exactly; only the name of the output may differ**):
+
+```sh
+gradients = function(list[unknown] model, list[unknown] hyperparams,
+                     matrix[double] features, matrix[double] labels)
+          return (list[unknown] gradients)
+          # the output may be named differently than "gradients",
+          # but the function must always return a list
+```
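+
+For illustration, a minimal sketch of an update function for a hypothetical single-layer sigmoid model (a weight matrix W and a bias vector b, matching the entry names used in the model list above) could look as follows:
+
+```sh
+gradients = function(list[unknown] model, list[unknown] hyperparams,
+                     matrix[double] features, matrix[double] labels)
+          return (list[unknown] gradients)
+{
+  W = as.matrix(model["W"]);
+  b = as.matrix(model["b"]);
+  # forward pass: linear scores and sigmoid activation
+  probs = 1 / (1 + exp(-(features %*% W + b)));
+  # backward pass: gradients of the cross-entropy loss
+  dScores = probs - labels;
+  dW = t(features) %*% dScores;
+  db = colSums(dScores);
+  gradients = list(W=dW, b=db);
+}
+```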
+
+**Aggregate function:**
+
+The aggregate function then takes the computed or accrued gradients and updates the model via some optimizer, such as Adagrad or Adam. The implementation of this function should follow the signature below (i.e., **the input parameters, including both types and names, must match exactly; only the name of the output may differ**):
+
+```sh
+aggregation = function(list[unknown] model, list[unknown] hyperparams,
+                       list[unknown] gradients)
+         return (list[unknown] modelResult)
+         # the output may be named differently than "modelResult",
+         # but the function must always return a list
+```
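+
+For illustration, a minimal sketch of an aggregate function implementing plain SGD (assuming the hypothetical hyperparameter entry lr from above) could look as follows:
+
+```sh
+aggregation = function(list[unknown] model, list[unknown] hyperparams,
+                       list[unknown] gradients)
+         return (list[unknown] modelResult)
+{
+  lr = as.scalar(hyperparams["lr"]);
+  # vanilla SGD step: subtract the scaled gradients from each parameter
+  W = as.matrix(model["W"]) - lr * as.matrix(gradients["W"]);
+  b = as.matrix(model["b"]) - lr * as.matrix(gradients["b"]);
+  modelResult = list(W=W, b=b);
+}
+```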
+
+**Update strategy:**
+
+Currently, two update strategies, **ASP** and **BSP**, are supported. **ASP**, a.k.a. _Asynchronous Parallel_, means that model updates are applied asynchronously: the parameter server updates the model with the fresh gradients pushed by a worker and immediately broadcasts the updated model, so that the worker can pull it. This push-and-pull process happens asynchronously across workers. With **BSP**, a.k.a. _Bulk Synchronous Parallel_, the server updates the global model only after having received the gradients from all workers in an iteration, and only then can the workers move on to the next iteration. Hence, the overall performance is affected by stragglers (i.e., the slowest worker).
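+
+For example, switching the earlier call to asynchronous updates only requires setting utype explicitly (all other arguments as before):
+
+```sh
+resultModel = paramserv(model=initModel, features=X, labels=Y,
+                        upd="fun1", agg="fun2", utype="ASP",
+                        epochs=10, k=10, hyperparams=hParams)
+```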
+
+**Update frequency:**
+
+When pushing the gradients from the workers to the server, we can control how often this push-and-pull process occurs. Currently, two update frequencies, **EPOCH** and **BATCH**, are supported. When set to **EPOCH**, the gradients generated for each mini-batch are accumulated locally in each worker, and the accrued gradients are pushed to the server whenever a worker finishes an epoch. When set to **BATCH**, the gradients generated for each mini-batch are pushed to the server immediately, launching the push-and-pull process per mini-batch.
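+
+For example, to push the gradients after every mini-batch instead of once per epoch, the freq argument is set accordingly:
+
+```sh
+resultModel = paramserv(model=initModel, features=X, labels=Y,
+                        upd="fun1", agg="fun2", utype="BSP", freq="BATCH",
+                        epochs=10, k=10, hyperparams=hParams)
+```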
+
+**Data partition schemes:**
+
+Before launching the data-parallel parameter server, the original data is partitioned across the workers according to a given scheme. Currently, four schemes are supported: Disjoint_Contiguous, Disjoint_Round_Robin, Disjoint_Random, and Overlap_Reshuffle.
+
+Scheme | Definition
+-------- | -----------
+Disjoint_Contiguous | For each worker, use a right indexing operation X[beg:end,] to obtain contiguous, non-overlapping partitions of rows
+Disjoint_Round_Robin | For each worker, use a permutation multiply or, simpler, a removeEmpty operation such as removeEmpty(target=X, margin="rows", select=(seq(1,nrow(X))%%k)==id)
+Disjoint_Random | For each worker, use a permutation multiply P[beg:end,] %*% X, where P is constructed, for example, with P=table(seq(1,nrow(X)), sample(nrow(X),nrow(X))), i.e., sampling without replacement to ensure disjointness
+Overlap_Reshuffle | Similar to the above, except that a new permutation matrix is created for each worker and no indexing is applied to P
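+
+As a conceptual sketch, Disjoint_Contiguous corresponds to the following right indexing per worker, which paramserv performs internally (id and k denote the 1-based worker id and the number of workers):
+
+```sh
+blockSize = ceil(nrow(X) / k);
+beg = (id - 1) * blockSize + 1;
+end = min(id * blockSize, nrow(X));
+X_part = X[beg:end,];
+```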
+
 ### Other Built-In Functions
 
 **Table 16**: Other Built-In Functions
