Added: incubator/singa/site/trunk/content/markdown/v0.3.0/mlp.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/mlp.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/mlp.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/mlp.md Wed Apr 20 
05:09:06 2016
@@ -0,0 +1,215 @@
+# MLP Example
+
+---
+
+Multilayer perceptron (MLP) is a subclass of feed-forward neural networks.
+An MLP typically consists of multiple directly connected layers, with each layer fully
+connected to the next one. In this example, we will use SINGA to train a
+[simple MLP model proposed by Ciresan](http://arxiv.org/abs/1003.0358)
+for classifying handwritten digits from the [MNIST 
dataset](http://yann.lecun.com/exdb/mnist/).
+
+## Running instructions
+
+Please refer to the [installation](installation.html) page for
+instructions on building SINGA, and the [quick start](quick-start.html)
+for instructions on starting zookeeper.
+
+We have provided scripts for preparing the training and test dataset in *examples/mnist/*.
+
+    # in examples/mnist
+    $ cp Makefile.example Makefile
+    $ make download
+    $ make create
+
+### Training on CPU
+
+After the datasets are prepared, we start the training by
+
+    ./bin/singa-run.sh -conf examples/mnist/job.conf
+
+After it is started, you should see output like
+
+    Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
+    Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf 
-singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
+    E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 
(pid = 34073)
+    E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
+    E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
+    E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, 
accuracy : 0.109100
+    E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, 
accuracy : 0.099000
+    E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 
2.222740, accuracy : 0.201800
+    E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 
2.091030, accuracy : 0.327200
+    E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 
1.969412, accuracy : 0.442100
+    E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 
1.865466, accuracy : 0.514800
+    E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 
1.773849, accuracy : 0.569100
+    E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, 
accuracy : 0.662100
+    E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 
1.659150, accuracy : 0.652600
+    E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 
1.574024, accuracy : 0.666000
+    E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 
1.529380, accuracy : 0.670500
+    E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 
1.443911, accuracy : 0.703500
+    E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 
1.387759, accuracy : 0.721000
+    E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 
1.335246, accuracy : 0.736500
+    E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 
1.216652, accuracy : 0.769900
+
+After training for a number of steps (depending on the configuration) or when the job
+finishes, SINGA will [checkpoint](checkpoint.html) the model parameters.
+
+### Training on GPU
+
+To train this example model on GPU, just add a field to the configuration file
+specifying the GPU device,
+
+    # job.conf
+    gpu: 0
+
+### Training using Python script
+
+The Python helpers that come with SINGA 0.2 make it easy to configure the job. For example,
+the job.conf can be replaced with a simple Python script, mnist_mlp.py,
+which has about 30 lines of code following the [Keras API](http://keras.io/).
+
+    ./bin/singa-run.sh -exec tool/python/examples/mnist_mlp.py
+
+
+
+## Details
+
+To train a model in SINGA, you need to prepare the datasets
+and a job configuration that specifies the neural net structure, the training
+algorithm (BP or CD), the SGD update algorithm (e.g., AdaGrad),
+the number of training/test steps, etc.
+
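+The sketch below shows how these pieces fit together in one job configuration
+file. The field names follow the snippets in the sections below; the top-level
+`neuralnet` wrapper and the concrete values are illustrative assumptions, so
+please consult *examples/mnist/job.conf* for the authoritative version.
+
+    # job.conf (sketch only)
+    name: "mnist-mlp"                # job name, illustrative
+    train_steps: 1000                # total number of training iterations, illustrative
+    test_steps: 100                  # e.g., 100 steps x batchsize 100 covers the 10,000 test images
+    test_freq: 60                    # run test every 60 training steps
+    train_one_batch { alg: kBP }     # BP for this feed-forward model
+    updater { ... }                  # see the Updater section below
+    neuralnet { layer { ... } ... }  # layers from the Neural net section below
+    cluster { nworker_groups: 1 nserver_groups: 1 }
+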
+### Data preparation
+
+Before using SINGA, you need to write a program to pre-process the dataset you
+use to a format that SINGA can read. Please refer to the
+[Data Preparation](data.html) to get details about preparing
+this MNIST dataset.
+
+
+### Neural net
+
+<div style = "text-align: center">
+<img src = "../images/example-mlp.png" style = "width: 230px">
+<br/><strong>Figure 1 - Net structure of the MLP example. </strong></img>
+</div>
+
+
+Figure 1 shows the structure of the simple MLP model, which is constructed following
+[Ciresan's paper](http://arxiv.org/abs/1003.0358). The dashed circle contains
+two layers which represent one feature transformation stage. There are 6 such
+stages in total. The sizes of the [InnerProductLayer](layer.html#innerproductlayer)s
+in these circles decrease as 2500->2000->1500->1000->500->10.
+
+Next we follow the guide in [neural net page](neural-net.html)
+and [layer page](layer.html) to write the neural net configuration.
+
+* We configure an input layer to read the training/testing records from a disk 
file.
+
+        layer {
+            name: "data"
+            type: kRecordInput
+            store_conf {
+              backend: "kvfile"
+              path: "examples/mnist/train_data.bin"
+              random_skip: 5000
+              batchsize: 64
+              shape: 784
+              std_value: 127.5
+              mean_value: 127.5
+             }
+             exclude: kTest
+          }
+
+        layer {
+            name: "data"
+            type: kRecordInput
+            store_conf {
+              backend: "kvfile"
+              path: "examples/mnist/test_data.bin"
+              batchsize: 100
+              shape: 784
+              std_value: 127.5
+              mean_value: 127.5
+             }
+             exclude: kTrain
+          }
+
+
+* All [InnerProductLayer](layer.html#innerproductlayer)s are configured similarly, e.g.,
+
+        layer{
+          name: "fc1"
+          type: kInnerProduct
+          srclayers:"data"
+          innerproduct_conf{
+            num_output: 2500
+          }
+          param{
+            name: "w1"
+            ...
+          }
+          param{
+            name: "b1"
+            ...
+          }
+        }
+
+    with the `num_output` decreasing from 2500 to 10.
+
+* A [STanhLayer](layer.html#stanhlayer) is connected to every InnerProductLayer
+except the last one. It transforms the feature via scaled tanh function.
+
+        layer{
+          name: "tanh1"
+          type: kSTanh
+          srclayers:"fc1"
+        }
+
+* The final [Softmax loss layer](layer.html#softmaxloss) connects
+to the last InnerProductLayer (fc6) and the data layer, which provides the ground truth labels.
+
+        layer{
+          name: "loss"
+          type:kSoftmaxLoss
+          softmaxloss_conf{ topk:1 }
+          srclayers:"fc6"
+          srclayers:"data"
+        }
+
+### Updater
+
+The [normal SGD updater](updater.html#updater) is selected.
+The learning rate decays by a factor of 0.997 every 60 steps (i.e., one epoch).
+
+    updater{
+      type: kSGD
+      learning_rate{
+        base_lr: 0.001
+        type : kStep
+        step_conf{
+          change_freq: 60
+          gamma: 0.997
+        }
+      }
+    }
+
+### TrainOneBatch algorithm
+
+The MLP model is a feed-forward model, hence
+[Back-propagation algorithm](train-one-batch#back-propagation)
+is selected.
+
+    train_one_batch {
+      alg: kBP
+    }
+
+### Cluster setting
+
+The following configuration sets a single worker and server for training.
+The [Training frameworks](frameworks.html) page introduces configurations for
+several distributed training frameworks.
+
+    cluster {
+      nworker_groups: 1
+      nserver_groups: 1
+    }

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/model-config.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/model-config.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/model-config.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/model-config.md Wed Apr 
20 05:09:06 2016
@@ -0,0 +1,294 @@
+# Model Configuration
+
+---
+
+SINGA uses the stochastic gradient descent (SGD) algorithm to train parameters
+of deep learning models.  For each SGD iteration, there is a
+[Worker](architecture.html) computing
+gradients of parameters from the NeuralNet and an [Updater]() updating parameter
+values based on gradients. Hence the model configuration mainly consists of these
+three parts. We will introduce the NeuralNet, Worker and Updater in the
+following paragraphs and describe the configurations for them. All model
+configuration is specified in the model.conf file in the user-provided
+workspace folder. E.g., the [cifar10 example folder](https://github.com/apache/incubator-singa/tree/master/examples/cifar10)
+has a model.conf file.
+
+
+## NeuralNet
+
+### Uniform model (neuralnet) representation
+
+<img src = "../images/model-categorization.png" style = "width: 400px"> Fig. 1:
+Deep learning model categorization</img>
+
+Many deep learning models have been proposed. Fig. 1 is a categorization of
+popular deep learning models based on the layer connections. The
+[NeuralNet](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h)
+abstraction of SINGA consists of multiple directly connected layers. This
+abstraction is able to represent models from all three categories.
+
+  * For the feed-forward models, their connections are already directed.
+
+  * For the RNN models, we unroll them into directed connections, as shown in
+  Fig. 3.
+
+  * For the undirected connections in RBM, DBM, etc., we replace each undirected
+  connection with two directed connections, as shown in Fig. 2.
+
+<div style = "height: 200px">
+<div style = "float:left; text-align: center">
+<img src = "../images/unroll-rbm.png" style = "width: 280px"> <br/>Fig. 2: 
Unroll RBM </img>
+</div>
+<div style = "float:left; text-align: center; margin-left: 40px">
+<img src = "../images/unroll-rnn.png" style = "width: 550px"> <br/>Fig. 3: 
Unroll RNN </img>
+</div>
+</div>
+
+Specifically, the NeuralNet class is defined in
+[neuralnet.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h):
+
+    ...
+    vector<Layer*> layers_;
+    ...
+
+The Layer class is defined in
+[base_layer.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/base_layer.h):
+
+    vector<Layer*> srclayers_, dstlayers_;
+    LayerProto layer_proto_;  // layer configuration, including meta info, 
e.g., name
+    ...
+
+
+The connections with other layers are kept in `srclayers_` and `dstlayers_`.
+Since there are many different feature transformations, there are
+correspondingly many different Layer implementations. Layers that have
+parameters in their feature transformation functions hold Param
+instances in the layer class, e.g.,
+
+    Param weight;
+
+
+### Configure the structure of a NeuralNet instance
+
+To train a deep learning model, the first step is to write the configurations
+for the model structure, i.e., the layers and connections for the NeuralNet.
+Like [Caffe](http://caffe.berkeleyvision.org/), we use the [Google Protocol
+Buffer](https://developers.google.com/protocol-buffers/) to define the
+configuration protocol. The
+[NetProto](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto)
+specifies the configuration fields for a NeuralNet instance,
+
+    message NetProto {
+      repeated LayerProto layer = 1;
+      ...
+    }
+
+The configuration is then
+
+    layer {
+      // layer configuration
+    }
+    layer {
+      // layer configuration
+    }
+    ...
+
+To configure the model structure, we just configure each layer involved in the 
model.
+
+    message LayerProto {
+      // the layer name used for identification
+      required string name = 1;
+      // source layer names
+      repeated string srclayers = 3;
+      // parameters, e.g., weight matrix or bias vector
+      repeated ParamProto param = 12;
+      // the layer type from the enum above
+      required LayerType type = 20;
+      // configuration for convolution layer
+      optional ConvolutionProto convolution_conf = 30;
+      // configuration for concatenation layer
+      optional ConcateProto concate_conf = 31;
+      // configuration for dropout layer
+      optional DropoutProto dropout_conf = 33;
+      ...
+    }
+
+A sample configuration for a feed-forward model is like
+
+    layer {
+      name : "input"
+      type : kRecordInput
+    }
+    layer {
+      name : "conv"
+      type : kInnerProduct
+      srclayers : "input"
+      param {
+        // configuration for parameter
+      }
+      innerproduct_conf {
+        // configuration for this specific layer
+      }
+      ...
+    }
+
+The layer type list is defined in
+[LayerType](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto).
+One type (kFoo) corresponds to one child class of Layer (FooLayer) and one
+configuration field (foo_conf). All built-in layers are introduced in the 
[layer page](layer.html).
+
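+For example, following this convention, the `kDropout` type would correspond to a
+DropoutLayer class and be configured through the `dropout_conf` field listed in the
+LayerProto above. The snippet below is only a sketch of that mapping; the layer and
+source names are illustrative and the dropout-specific sub-fields are omitted,
+
+    layer {
+      name : "drop1"          # illustrative name
+      type : kDropout
+      srclayers : "fc1"       # illustrative source layer
+      dropout_conf {
+        # dropout-specific fields, see DropoutProto and the layer page
+      }
+    }
+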
+## Worker
+
+At the beginning, the Worker initializes the values of the Param instances of
+each layer, either randomly (according to the user-configured distribution) or by
+loading them from a [checkpoint file]().  For each training iteration, the worker
+visits the layers of the neural network to compute gradients of the Param instances of
+each layer. Corresponding to the three categories of models, there are three
+different algorithms to compute the gradients of a neural network.
+
+  1. Back-propagation (BP) for feed-forward models
+  2. Back-propagation through time (BPTT) for recurrent neural networks
+  3. Contrastive divergence (CD) for RBM, DBM, etc.
+
+SINGA provides these three algorithms as three Worker implementations.
+Users only need to specify in the model.conf file which algorithm
+should be used. The configuration protocol is
+
+    message ModelProto {
+      ...
+      enum GradCalcAlg {
+      // BP algorithm for feed-forward models, e.g., CNN, MLP, RNN
+      kBP = 1;
+      // BPTT for recurrent neural networks
+      kBPTT = 2;
+      // CD algorithm for RBM, DBM etc., models
+      kCd = 3;
+      }
+      // gradient calculation algorithm
+      required GradCalcAlg alg = 8 [default = kBackPropagation];
+      ...
+    }
+
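+For instance, based on the `ModelProto` above, choosing BP for a feed-forward
+model amounts to one field in model.conf (a sketch; the MLP example elsewhere in
+this release expresses the same choice as `train_one_batch { alg: kBP }`),
+
+    # in model.conf
+    alg: kBP
+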
+These algorithms override the TrainOneBatch function of the Worker. E.g., the
+BPWorker implements it as
+
+    void BPWorker::TrainOneBatch(int step, Metric* perf) {
+      Forward(step, kTrain, train_net_, perf);
+      Backward(step, train_net_);
+    }
+
+The Forward function passes the raw input features of one mini-batch through
+all layers, and the Backward function visits the layers in reverse order to
+compute the gradients of the loss w.r.t each layer's feature and each layer's
+Param objects. Different algorithms would visit the layers in different orders.
+Some may traverse the neural network multiple times, e.g., the CDWorker's
+TrainOneBatch function is:
+
+    void CDWorker::TrainOneBatch(int step, Metric* perf) {
+      PositivePhase(step, kTrain, train_net_, perf);
+      NegativePhase(step, kTrain, train_net_, perf);
+      GradientPhase(step, train_net_);
+    }
+
+Each `*Phase` function would visit all layers one or multiple times.
+All algorithms will finally call two functions of the Layer class:
+
+     /**
+      * Transform features from connected layers into features of this layer.
+      *
+      * @param phase kTrain, kTest, kPositive, etc.
+      */
+     virtual void ComputeFeature(Phase phase, Metric* perf) = 0;
+     /**
+      * Compute gradients for parameters (and connected layers).
+      *
+      * @param phase kTrain, kTest, kPositive, etc.
+      */
+     virtual void ComputeGradient(Phase phase) = 0;
+
+All [Layer implementations]() must implement the above two functions.
+
+
+## Updater
+
+Once the gradients of parameters are computed, the Updater will update
+parameter values.  There are many SGD variants for updating parameters, like
+[AdaDelta](http://arxiv.org/pdf/1212.5701v1.pdf),
+[AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf),
+[RMSProp](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf),
+[Nesterov](http://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=DJ8Ep8YAAAAJ&amp;citation_for_view=DJ8Ep8YAAAAJ:hkOj_22Ku90C)
+and SGD with momentum. The core functions of the Updater are
+
+    /**
+     * Update parameter values based on gradients
+     * @param step training step
+     * @param param pointer to the Param object
+     * @param grad_scale scaling factor for the gradients
+     */
+    void Update(int step, Param* param, float grad_scale=1.0f);
+    /**
+     * @param step training step
+     * @return the learning rate for this step
+     */
+    float GetLearningRate(int step);
+
+SINGA provides several built-in updaters and learning rate change methods.
+Users can configure them according to the UpdaterProto
+
+    message UpdaterProto {
+      enum UpdaterType{
+        // normal SGD with momentum and weight decay
+        kSGD = 1;
+        // adaptive subgradient, 
http://www.magicbroom.info/Papers/DuchiHaSi10.pdf
+        kAdaGrad = 2;
+        // 
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
+        kRMSProp = 3;
+        // Nesterov first optimal gradient method
+        kNesterov = 4;
+      }
+      // updater type
+      required UpdaterType type = 1 [default=kSGD];
+      // configuration for RMSProp algorithm
+      optional RMSPropProto rmsprop_conf = 50;
+
+      enum ChangeMethod {
+        kFixed = 0;
+        kInverseT = 1;
+        kInverse = 2;
+        kExponential = 3;
+        kLinear = 4;
+        kStep = 5;
+        kFixedStep = 6;
+      }
+      // change method for learning rate
+      required ChangeMethod lr_change = 2 [default = kFixed];
+
+      optional FixedStepProto fixedstep_conf=40;
+      ...
+      optional float momentum = 31 [default = 0];
+      optional float weight_decay = 32 [default = 0];
+      // base learning rate
+      optional float base_lr = 34 [default = 0];
+    }
+
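+A configuration sketch consistent with the `UpdaterProto` above (the values are
+illustrative and not tuned for any particular model) is
+
+    updater {
+      type: kSGD
+      lr_change: kFixed
+      base_lr: 0.001
+      momentum: 0.9
+      weight_decay: 0.0005
+    }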
+
+## Other model configuration fields
+
+Some other important configuration fields for training a deep learning model are
+listed below:
+
+    // model name, e.g., "cifar10-dcnn", "mnist-mlp"
+    string name;
+    // displaying training info for every this number of iterations, default 
is 0
+    int32 display_freq;
+    // total num of steps/iterations for training
+    int32 train_steps;
+    // do test for every this number of training iterations, default is 0
+    int32 test_freq;
+    // run test for this number of steps/iterations, default is 0.
+    // The test dataset has test_steps * batchsize instances.
+    int32 test_steps;
+    // do checkpoint for every this number of training steps, default is 0
+    int32 checkpoint_freq;
+
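+As a sketch, a model.conf fragment using these fields (all values are
+illustrative) could be
+
+    name: "cifar10-dcnn"
+    display_freq: 30
+    train_steps: 70000
+    test_freq: 300
+    test_steps: 100
+    checkpoint_freq: 1000
+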
+The [checkpoint and restore](checkpoint.html) page has details on checkpoint-related fields.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/neural-net.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/neural-net.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/neural-net.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/neural-net.md Wed Apr 20 
05:09:06 2016
@@ -0,0 +1,327 @@
+# Neural Net
+
+---
+
+`NeuralNet` in SINGA represents an instance of a user's neural net model. As the
+neural net typically consists of a set of layers, `NeuralNet` comprises
+a set of unidirectionally connected [Layer](layer.html)s.
+This page describes how to convert a user's neural net into
+the configuration of `NeuralNet`.
+
+<img src="../images/model-category.png" align="center" width="200px"/>
+<span><strong>Figure 1 - Categorization of popular deep learning 
models.</strong></span>
+
+## Net structure configuration
+
+Users configure the `NeuralNet` by listing all layers of the neural net and
+specifying each layer's source layer names. Popular deep learning models can be
+categorized as Figure 1. The subsequent sections give details for each
+category.
+
+### Feed-forward models
+
+<div align = "left">
+<img src="../images/mlp-net.png" align="center" width="200px"/>
+<span><strong>Figure 2 - Net structure of a MLP model.</strong></span>
+</div>
+
+Feed-forward models, e.g., CNN and MLP, can easily get configured as their layer
+connections are directed without cycles. The
+configuration for the MLP model shown in Figure 2 is as follows,
+
+    net {
+      layer {
+        name : "data"
+        type : kData
+      }
+      layer {
+        name : "image"
+        type : kImage
+        srclayer: "data"
+      }
+      layer {
+        name : "label"
+        type : kLabel
+        srclayer: "data"
+      }
+      layer {
+        name : "hidden"
+        type : kHidden
+        srclayer: "image"
+      }
+      layer {
+        name : "softmax"
+        type : kSoftmaxLoss
+        srclayer: "hidden"
+        srclayer: "label"
+      }
+    }
+
+### Energy models
+
+<img src="../images/rbm-rnn.png" align="center" width="500px"/>
+<span><strong>Figure 3 - Convert connections in RBM and RNN.</strong></span>
+
+
+For energy models including RBM, DBM,
+etc., their connections are undirected (i.e., Category B). To represent these 
models using
+`NeuralNet`, users can simply replace each connection with two directed
+connections, as shown in Figure 3a. In other words, for each pair of connected 
layers, their source
+layer field should include each other's name.
+The full [RBM example](rbm.html) has a
+detailed neural net configuration for an RBM model, which looks like
+
+    net {
+      layer {
+        name : "vis"
+        type : kVisLayer
+        param {
+          name : "w1"
+        }
+        srclayer: "hid"
+      }
+      layer {
+        name : "hid"
+        type : kHidLayer
+        param {
+          name : "w2"
+          share_from: "w1"
+        }
+        srclayer: "vis"
+      }
+    }
+
+### RNN models
+
+For recurrent neural networks (RNN), users can remove the recurrent connections
+by unrolling the recurrent layer.  For example, in Figure 3b, the original
+layer is unrolled into a new layer with 4 internal layers. In this way, the
+model is like a normal feed-forward model, thus can be configured similarly.
+The [RNN example](rnn.html) has a full neural net
+configuration for a RNN model.
+
+
+## Configuration for multiple nets
+
+Typically, a training job includes three neural nets for
+training, validation and test phase respectively. The three neural nets share 
most
+layers except the data layer, loss layer or output layer, etc..  To avoid
+redundant configurations for the shared layers, users can uses the `exclude`
+filed to filter a layer in the neural net, e.g., the following layer will be
+filtered when creating the testing `NeuralNet`.
+
+
+    layer {
+      ...
+      exclude : kTest # filter this layer for creating test net
+    }
+
+
+
+## Neural net partitioning
+
+A neural net can be partitioned in different ways to distribute the training
+over multiple workers.
+
+### Batch and feature dimension
+
+<img src="../images/partition_fc.png" align="center" width="400px"/>
+<span><strong>Figure 4 - Partitioning of a fully connected 
layer.</strong></span>
+
+
+Every layer's feature blob is considered a matrix whose rows are feature
+vectors. Thus, one layer can be split on two dimensions. Partitioning on
+dimension 0 (also called batch dimension) slices the feature matrix by rows.
+For instance, if the mini-batch size is 256 and the layer is partitioned into 2
+sub-layers, each sub-layer would have 128 feature vectors in its feature blob.
+Partitioning on this dimension has no effect on the parameters, as every
+[Param](param.html) object is replicated in the sub-layers. Partitioning on 
dimension
+1 (also called feature dimension) slices the feature matrix by columns. For
+example, suppose the original feature vector has 50 units, after partitioning
+into 2 sub-layers, each sub-layer would have 25 units. This partitioning may
+result in [Param](param.html) object being split, as shown in
+Figure 4. Both the bias vector and weight matrix are
+partitioned into two sub-layers.
+
+
+### Partitioning configuration
+
+There are 4 partitioning schemes, whose configurations are given below,
+
+  1. Partitioning each single layer into sub-layers on batch dimension (see
+  below). It is enabled by configuring the partition dimension of the layer to
+  0, e.g.,
+
+          # with other fields omitted
+          layer {
+            partition_dim: 0
+          }
+
+  2. Partitioning each single layer into sub-layers on feature dimension (see
+  below).  It is enabled by configuring the partition dimension of the layer to
+  1, e.g.,
+
+          # with other fields omitted
+          layer {
+            partition_dim: 1
+          }
+
+  3. Partitioning all layers into different subsets. It is enabled by
+  configuring the location ID of a layer, e.g.,
+
+          # with other fields omitted
+          layer {
+            location: 1
+          }
+          layer {
+            location: 0
+          }
+
+
+  4. Hybrid partitioning of strategy 1, 2 and 3. The hybrid partitioning is
+  useful for large models. An example application is to implement the
+  [idea proposed by Alex](http://arxiv.org/abs/1404.5997).
+  Hybrid partitioning is configured like,
+
+          # with other fields omitted
+          layer {
+            location: 1
+          }
+          layer {
+            location: 0
+          }
+          layer {
+            partition_dim: 0
+            location: 0
+          }
+          layer {
+            partition_dim: 1
+            location: 0
+          }
+
+Currently SINGA supports strategy-2 well. Other partitioning strategies
+are under test and will be released in a later version.
+
+## Parameter sharing
+
+Parameters can be shared in two cases,
+
+  * sharing parameters among layers via user configuration. For example, the
+  visible layer and hidden layer of an RBM share the weight matrix, which is configured through
+  the `share_from` field as shown in the above RBM configuration. The
+  configurations must be the same (except name) for shared parameters.
+
+  * due to neural net partitioning, some `Param` objects are replicated into
+  different workers, e.g., partitioning one layer on batch dimension. These
+  workers share parameter values. SINGA controls this kind of parameter
+  sharing automatically, users do not need to do any configuration.
+
+  * the `NeuralNet` for training and testing (and validation) share most layers,
+  and thus share `Param` values.
+
+If the shared `Param` instances reside in the same process (possibly in different
+threads), they use the same chunk of memory space for their values. But they
+have different memory spaces for their gradients. In fact, their
+gradients will be averaged by the stub or server.
+
+## Advanced user guide
+
+### Creation
+
+    static NeuralNet* NeuralNet::Create(const NetProto& np, Phase phase, int 
num);
+
+The above function creates a `NeuralNet` for a given phase, and returns a
+pointer to the `NeuralNet` instance. The phase is in {kTrain,
+kValidation, kTest}. `num` is used for net partitioning which indicates the
+number of partitions.  Typically, a training job includes three neural nets for
+training, validation and test phase respectively. The three neural nets share 
most
+layers except the data layer, loss layer or output layer, etc. The `Create`
+function takes in the full net configuration including layers for training,
+validation and test.  It removes layers for phases other than the specified
+phase based on the `exclude` field in
+[layer configuration](layer.html):
+
+    layer {
+      ...
+      exclude : kTest # filter this layer for creating test net
+    }
+
+The filtered net configuration is passed to the constructor of `NeuralNet`:
+
+    NeuralNet::NeuralNet(NetProto netproto, int npartitions);
+
+The constructor creates a graph representing the net structure firstly in
+
+    Graph* NeuralNet::CreateGraph(const NetProto& netproto, int npartitions);
+
+Next, it creates a layer for each node and connects layers if their nodes are
+connected.
+
+    void NeuralNet::CreateNetFromGraph(Graph* graph, int npartitions);
+
+Since the `NeuralNet` instance may be shared among multiple workers, the
+`Create` function returns a pointer to the `NeuralNet` instance.
+
+### Parameter sharing
+
+ `Param` sharing
+is enabled by first sharing the Param configuration (in `NeuralNet::Create`)
+to create two similar (e.g., the same shape) Param objects, and then calling
+(in `NeuralNet::CreateNetFromGraph`),
+
+    void Param::ShareFrom(const Param& from);
+
+It is also possible to share `Param`s of two nets, e.g., sharing parameters of
+the training net and the test net,
+
+    void NeuralNet::ShareParamsFrom(NeuralNet* other);
+
+It will call `Param::ShareFrom` for each Param object.
+
+### Access functions
+`NeuralNet` provides a couple of access functions to get the layers and params
+of the net:
+
+    const std::vector<Layer*>& layers() const;
+    const std::vector<Param*>& params() const ;
+    Layer* name2layer(string name) const;
+    Param* paramid2param(int id) const;
+
+
+### Partitioning
+
+
+#### Implementation
+
+SINGA partitions the neural net in the `CreateGraph` function, which creates one
+node for each (partitioned) layer. For example, if one layer's partition
+dimension is 0 or 1, then it creates `npartitions` nodes for it; if the
+partition dimension is -1, a single node is created, i.e., no partitioning.
+Each node is assigned a partition (or location) ID. If the original layer is
+configured with a location ID, then the ID is assigned to each newly created 
node.
+These nodes are connected according to the connections of the original layers.
+Some connection layers will be added automatically.
+For instance, if two connected sub-layers are located at two
+different workers, then a pair of bridge layers is inserted to transfer the
+feature (and gradient) blob between them. When two layers are partitioned on
+different dimensions, a concatenation layer which concatenates feature rows (or
+columns) and a slice layer which slices feature rows (or columns) would be
+inserted. These connection layers help make the network communication and
+synchronization transparent to the users.
+
+#### Dispatching partitions to workers
+
+Each (partitioned) layer is assigned a location ID, based on which it is 
dispatched to one
+worker. Particularly, the pointer to the `NeuralNet` instance is passed
+to every worker within the same group, but each worker only computes over the
+layers that have the same partition (or location) ID as the worker's ID.  When
+every worker computes the gradients of the entire set of model parameters
+(strategy-1), we refer to this process as data parallelism.  When different
+workers compute the gradients of different parameters (strategy-2 or
+strategy-3), we call this process model parallelism.  The hybrid partitioning
+leads to hybrid parallelism where some workers compute the gradients of the
+same subset of model parameters while other workers compute on different model
+parameters.  For example, to implement the hybrid parallelism for the
+[DCNN model](http://arxiv.org/abs/1404.5997), we set `partition_dim = 0` for
+lower layers and `partition_dim = 1` for higher layers.
+

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/neuralnet-partition.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/neuralnet-partition.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/neuralnet-partition.md 
(added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/neuralnet-partition.md 
Wed Apr 20 05:09:06 2016
@@ -0,0 +1,54 @@
+# Neural Net Partition
+
+---
+
+The purpose of partitioning a neural network is to distribute the partitions onto
+different working units (e.g., threads or nodes, called workers in this article)
+and parallelize the processing.
+Another reason for partitioning is to handle a large neural network which cannot be
+held in a single node. For instance, to train models against images with high
+resolution we need large neural networks (in terms of training parameters).
+
+Since *Layer* is the first-class citizen in SINGA, we do the partitioning against
+layers. Specifically, we support partitioning at two levels. First, users can configure
+the location (i.e., worker ID) of each layer. In this way, users assign one worker
+to each layer. Second, for one layer, we can partition its neurons or partition
+the instances (e.g., images). They are called layer partition and data partition
+respectively. We illustrate the two types of partitions using a simple convolutional neural network.
+
+<img src="../images/conv-mnist.png" style="width: 220px"/>
+
+The above figure shows a convolutional neural network without any partition. It
+has 8 layers in total (one rectangle represents one layer). The first layer is
+DataLayer (data) which reads data from local disk files/databases (or HDFS). 
The second layer
+is a MnistLayer which parses the records from MNIST data to get the pixels of 
a batch
+of 8 images (each image is of size 28x28). The LabelLayer (label) parses the 
records to get the label
+of each image in the batch. The ConvolutionalLayer (conv1) transforms the 
input image to the
+shape of 8x27x27. The ReLULayer (relu1) conducts elementwise transformations. 
The PoolingLayer (pool1)
+sub-samples the images. The fc1 layer is fully connected with the pool1 layer. It
+multiplies each image with a weight matrix to generate a 10-dimensional hidden feature which
+is then normalized by a SoftmaxLossLayer to get the prediction.
+
+<img src="../images/conv-mnist-datap.png" style="width: 1000px"/>
+
+The above figure shows the convolutional neural network after partitioning all 
layers
+except the DataLayer and ParserLayers, into 3 partitions using data partition.
+The red layers process 4 images of the batch; the black and blue layers process 2 images
+each. Some helper layers, i.e., SliceLayer, ConcateLayer,
BridgeSrcLayer,
+BridgeDstLayer and SplitLayer, are added automatically by our partition 
algorithm.
+Layers of the same color reside in the same worker. There is data transfer
+across different workers at the boundary layers (i.e., BridgeSrcLayer and 
BridgeDstLayer),
+e.g., between s-slice-mnist-conv1 and d-slice-mnist-conv1.
+
+<img src="../images/conv-mnist-layerp.png" style="width: 1000px"/>
+
+The above figure shows the convolutional neural network after partitioning all 
layers
+except the DataLayer and ParserLayers, into 2 partitions using layer 
partition. We can
+see that each layer processes all 8 images from the batch. But different partitions process
+different parts of one image. For instance, the layer conv1-00 processes only 4 channels. The other
+4 channels are processed by conv1-01, which resides in another worker.
+
+
+Since the partition is done at the layer level, we can apply different 
partitions for
+different layers to get a hybrid partition for the whole neural network. 
Moreover,
+we can also specify the layer locations to place different layers on different workers.
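+
+As a configuration sketch (reusing the `partition_dim` and `location` layer fields
+described on the [neural net](neural-net.html) page; layer names and other fields
+are omitted or illustrative), such a hybrid setup could look like
+
+    layer {
+      # data partition: split the batch of instances
+      partition_dim: 0
+    }
+    layer {
+      # layer partition: split the neurons/features
+      partition_dim: 1
+      location: 0      # optionally pin this layer to worker 0
+    }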

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/overview.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/overview.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/overview.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/overview.md Wed Apr 20 
05:09:06 2016
@@ -0,0 +1,93 @@
+# Introduction
+
+---
+
+SINGA is a general distributed deep learning platform for training big deep
+learning models over large datasets. It is designed with an intuitive
+programming model based on the layer abstraction. A variety
+of popular deep learning models are supported, namely feed-forward models 
including
+convolutional neural networks (CNN), energy models like restricted Boltzmann
+machine (RBM), and recurrent neural networks (RNN). Many built-in layers are
+provided for users. The SINGA architecture is
+sufficiently flexible to run synchronous, asynchronous and hybrid training
+frameworks.  SINGA
+also supports different neural net partitioning schemes to parallelize the
+training of large models, namely partitioning on batch dimension, feature
+dimension or hybrid partitioning.
+
+
+## Goals
+
+As a distributed system, the first goal of SINGA is to have good scalability. 
In other
+words, SINGA is expected to reduce the total training time to achieve a certain
+accuracy with more computing resources (i.e., machines).
+
+
+The second goal is to make SINGA easy to use.
+It is non-trivial for programmers to develop and train models with deep and
+complex model structures.  Distributed training further increases the burden of
+programmers, e.g., data and model partitioning, and network communication.  
Hence it is essential to
+provide an easy to use programming model so that users can implement their deep
+learning models/algorithms without much awareness of the underlying distributed
+platform.
+
+## Principles
+
+Scalability is a challenging research problem for distributed deep learning
+training. SINGA provides a general architecture to exploit the scalability of
+different training frameworks. Synchronous training frameworks improve the
+efficiency of one training iteration, and
+asynchronous training frameworks improve the convergence rate. Given a fixed 
budget
+(e.g., cluster size), users can run a hybrid framework that maximizes the
+scalability by trading off between efficiency and convergence rate.
+
+SINGA comes with a programming model designed based on the layer abstraction, 
which
+is intuitive for deep learning models.  A variety of
+popular deep learning models can be expressed and trained using this 
programming model.
+
+## System overview
+
+<img src="../images/sgd.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SGD flow.</strong></span>
+
+Training a deep learning model means finding the optimal parameters involved in
+the transformation functions that generate good features for specific tasks.
+The goodness of a set of parameters is measured by a loss function, e.g.,
+[Cross-Entropy Loss](https://en.wikipedia.org/wiki/Cross_entropy). Since the
+loss functions are usually non-linear and non-convex, it is difficult to get a
+closed form solution. Typically, people use the stochastic gradient descent
+(SGD) algorithm, which randomly
+initializes the parameters and then iteratively updates them to reduce the loss
+as shown in Figure 1.
+
+<img src="../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 2 - SINGA overview.</strong></span>
+
+SGD is used in SINGA to train
+parameters of deep learning models. The training workload is distributed over
+worker and server units as shown in Figure 2. In each
+iteration, every worker calls *TrainOneBatch* function to compute
+parameter gradients. *TrainOneBatch* takes a *NeuralNet* object
+representing the neural net, and visits layers of the *NeuralNet* in
+certain order. The resultant gradients are sent to the local stub that
+aggregates the requests and forwards them to corresponding servers for
+updating. Servers reply to workers with the updated parameters for the next
+iteration.
+
+
+## Job submission
+
+To submit a job in SINGA (i.e., training a deep learning model),
+users pass the job configuration to the SINGA driver in the
+[main function](programming-guide.html). The job configuration
+specifies the four major components in Figure 2,
+
+  * a [NeuralNet](neural-net.html) describing the neural net structure with 
the detailed layer setting and their connections;
+  * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for 
different model categories;
+  * an [Updater](updater.html) defining the protocol for updating parameters 
at the server side;
+  * a [Cluster Topology](distributed-training.html) specifying the distributed 
architecture of workers and servers.
+
+This process is like the job submission in Hadoop, where users configure their
+jobs in the main function to set the mapper, reducer, etc.
+In Hadoop, users can configure their jobs with their own (or built-in) mapper 
and reducer; in SINGA, users
+can configure their jobs with their own (or built-in) layer, updater, etc.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/param.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/param.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/param.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/param.md Wed Apr 20 
05:09:06 2016
@@ -0,0 +1,226 @@
+# Parameters
+
+---
+
+A `Param` object in SINGA represents a set of parameters, e.g., a weight matrix
+or a bias vector. *Basic user guide* describes how to configure for a `Param`
+object, and *Advanced user guide* provides details on implementing users'
+parameter initialization methods.
+
+## Basic user guide
+
+The configuration of a Param object is inside a layer configuration, as
+`Param`s are associated with layers. An example configuration is like
+
+    layer {
+      ...
+      param {
+        name : "p1"
+        init {
+          type : kConstant
+          value: 1
+        }
+      }
+    }
+
+The [SGD algorithm](overview.html) starts with initializing all
+parameters according to the user-specified initialization method (the `init` field).
+For the above example,
+all parameters in `Param` "p1" will be initialized to the constant value 1. The
+configuration fields of a Param object are defined in
[ParamProto](../api/classsinga_1_1ParamProto.html):
+
+  * name, an identifier string. It is an optional field. If not provided, SINGA
+  will generate one based on layer name and its order in the layer.
+  * init, field for setting initialization methods.
+  * share_from, name of another `Param` object, from which this `Param` will 
share
+  configurations and values.
+  * lr_scale, float value to be multiplied with the learning rate when
+  [updating the parameters](updater.html)
+  * wd_scale, float value to be multiplied with the weight decay when
+  [updating the parameters](updater.html)
+
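+For example, a `param` configuration combining these fields might look like the
+following sketch (the name and values are illustrative),
+
+    param {
+      name : "w1"
+      lr_scale : 1.0
+      wd_scale : 1.0
+      init {
+        type : kGaussian
+        std : 0.01
+      }
+    }
+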
+There are some other fields that are specific to initialization methods.
+
+### Initialization methods
+
+Users can set the `type` of `init` using one of the following built-in initialization
+methods,
+
+  * `kConst`, set all parameters of the Param object to a constant value
+
+        type: kConst
+        value: float  # default is 1
+
+  * `kGaussian`, initialize the parameters following a Gaussian distribution.
+
+        type: kGaussian
+        mean: float # mean of the Gaussian distribution, default is 0
+        std: float # standard deviation, default is 1
+        value: float # default 0
+
+  * `kUniform`, initialize the parameters following a uniform distribution
+
+        type: kUniform
+        low: float # lower boundary, default is -1
+        high: float # upper boundary, default is 1
+        value: float # default 0
+
+  * `kGaussianSqrtFanIn`, initialize `Param` objects with two dimensions (i.e.,
+  matrices) using `kGaussian` and then
+  multiply each parameter by `1/sqrt(fan_in)`, where `fan_in` is the number of
+  columns of the matrix.
+
+  * `kUniformSqrtFanIn`, the same as `kGaussianSqrtFanIn` except that the
+  distribution is a uniform distribution.
+
+  * `kUniformFanInOut`, initialize matrix `Param` objects using `kUniform` and 
then
+  multiply each parameter by `sqrt(6/(fan_in + fan_out))`, where `fan_in +
+  fan_out` is the sum of the number of columns and rows of the matrix.
+
+For all above initialization methods except `kConst`, if their `value` is not
+1, every parameter will be multiplied with `value`. Users can also implement
+their own initialization method following the *Advanced user guide*.
+
+
+## Advanced user guide
+
+This section describes the details of implementing new parameter
+initialization methods.
+
+### Base ParamGenerator
+All initialization methods are implemented as
+subclasses of the base `ParamGenerator` class.
+
+    class ParamGenerator {
+     public:
+      virtual void Init(const ParamGenProto&);
+      void Fill(Param*);
+
+     protected:
+      ParamGenProto proto_;
+    };
+
+The configuration of the initialization method is in `ParamGenProto`. The `Fill`
+function fills the `Param` object (passed in as an argument).
+
+### New ParamGenerator subclass
+
+Similar to implementing a new Layer subclass, users can define a configuration
+protocol message,
+
+    # in user.proto
+    message FooParamProto {
+      optional int32 x = 1;
+    }
+    extend ParamGenProto {
+      optional FooParamProto fooparam_conf =101;
+    }
+
+The configuration of `Param` would be
+
+    param {
+      ...
+      init {
+        user_type: 'FooParam' # must use user_type for user defined methods
+        [fooparam_conf] { # must use brackets for configuring user defined 
messages
+          x: 10
+        }
+      }
+    }
+
+The subclass could be declared as,
+
+    class FooParamGen : public ParamGenerator {
+     public:
+      void Fill(Param*) override;
+    };
+
+Users can access the configuration fields in `Fill` by
+
+    int x = proto_.GetExtension(fooparam_conf).x();
+
+To use the new initialization method, users need to register it in the
+[main function](programming-guide.html).
+
+    driver.RegisterParamGenerator<FooParamGen>("FooParam");  // must be consistent with the user_type in configuration
+
+{% comment %}
+### Base Param class
+
+### Members
+
+    int local_version_;
+    int slice_start_;
+    vector<int> slice_offset_, slice_size_;
+
+    shared_ptr<Blob<float>> data_;
+    Blob<float> grad_;
+    ParamProto proto_;
+
+Each Param object has a local version and a global version (inside the data
+Blob). These two versions are used for synchronization. If multiple Param
+objects share the same values, they would have the same `data_` field.
+Consequently, their global version is the same. The global version is updated
+by [the stub thread](communication.html). The local version is
+updated in `Worker::Update` function which assigns the global version to the
+local version. The `Worker::Collect` function is blocked until the global
+version is larger than the local version, i.e., when `data_` is updated. In
+this way, we synchronize workers sharing parameters.
+
+In Deep learning models, some Param objects are 100 times larger than others.
+To ensure the load-balance among servers, SINGA slices large Param objects. The
+slicing information is recorded by `slice_*`. Each slice is assigned a unique
+ID starting from 0. `slice_start_` is the ID of the first slice of this Param
+object. `slice_offset_[i]` is the offset of the i-th slice in this Param
+object. `slice_size_[i]` is the size of the i-th slice. This slice information
+is used to create messages for transferring parameter values or gradients to
+different servers.
+
+Each Param object has a `grad_` field for gradients. Param objects do not share
+this Blob although they may share `data_`.  Because each layer containing a
+Param object would contribute gradients. E.g., in RNN, the recurrent layers
+share parameters values, and the gradients used for updating are averaged from 
all recurrent
+these recurrent layers. In SINGA, the stub thread will aggregate local
+gradients for the same Param object. The server will do a global aggregation
+of gradients for the same Param object.
+
+The `proto_` field has some meta information, e.g., name and ID. It also has a
+field called `owner` which is the ID of the Param object that shares parameter
+values with others.
+
+### Functions
+The base Param class implements two sets of functions,
+
+    virtual void InitValues(int version = 0);  // initialize values according 
to `init_method`
+    void ShareFrom(const Param& other);  // share `data_` from `other` Param
+    --------------
+    virtual Msg* GenGetMsg(bool copy, int slice_idx);
+    virtual Msg* GenPutMsg(bool copy, int slice_idx);
+    ... // other message related functions.
+
+Besides the functions for processing the parameter values, there is a set of
+functions for generating and parsing messages. These messages are for
+transferring parameter values or gradients between workers and servers. Each
+message corresponds to one Param slice. If `copy` is false, it means the
+receiver of this message is in the same process as the sender. In such case,
+only pointers to the memory of parameter value (or gradient) are wrapped in
+the message; otherwise, the parameter values (or gradients) should be copied
+into the message.
+
+
+## Implementing Param subclass
+Users can extend the base Param class to implement their own parameter
+initialization methods and message transferring protocols. Similar to
+implementing a new Layer subclass, users can create google protocol buffer
+messages for configuring the Param subclass. The subclass, denoted as FooParam,
+should be registered in main.cc,
+
+    driver.RegisterParam<FooParam>(kFooParam);  // kFooParam should be different from 0, which is for the base Param type
+
+
+  * type, an integer representing the `Param` type. Currently SINGA provides 
one
+    `Param` implementation with type 0 (the default type). If users want
+    to use their own Param implementation, they should extend the base Param
+    class and configure this field with `kUserParam`
+
+{% endcomment %}

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/programming-guide.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/programming-guide.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/programming-guide.md 
(added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/programming-guide.md Wed 
Apr 20 05:09:06 2016
@@ -0,0 +1,95 @@
+# Programming Guide
+
+---
+
+To submit a training job, users must provide the configuration of the
+four components shown in Figure 1:
+
+  * a [NeuralNet](neural-net.html) describing the neural net structure with 
the detailed layer setting and their connections;
+  * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for 
different model categories;
+  * an [Updater](updater.html) defining the protocol for updating parameters 
at the server side;
+  * a [Cluster Topology](distributed-training.html) specifying the distributed 
architecture of workers and servers.
+
+The *Basic user guide* section describes how to submit a training job using
+built-in components; while the *Advanced user guide* section presents details
+on writing user's own main function to register components implemented by
+themselves. In addition, the training data must be prepared, which has the same
+[process](data.html) for both advanced users and basic users.
+
+<img src="../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SINGA overview.</strong></span>
+
+
+
+## Basic user guide
+
+Users can use the default main function provided by SINGA to submit the training
+job. For this case, a job configuration file written as a google protocol
+buffer message for the [JobProto](../api/classsinga_1_1JobProto.html) must be 
provided in the command line,
+
+    ./bin/singa-run.sh -conf <path to job conf> [-resume]
+
+`-resume` is for continuing the training from last
+[checkpoint](checkpoint.html).
+The [MLP](mlp.html) and [CNN](cnn.html)
+examples use built-in components. Please read the corresponding pages for their
+job configuration files. The subsequent pages will illustrate the details on
+each component of the configuration.
+
+## Advanced user guide
+
+If a user's model contains some user-defined components, e.g.,
+[Updater](updater.html), the user has to write a main function to
+register these components. It is similar to Hadoop's main function. Generally,
+the main function should
+
+  * initialize SINGA, e.g., setup logging.
+
+  * register user-defined components.
+
+  * create and pass the job configuration to SINGA driver
+
+
+An example main function is like
+
+    #include "singa.h"
+    #include "user.h"  // header for user code
+
+    int main(int argc, char** argv) {
+      singa::Driver driver;
+      driver.Init(argc, argv);
+      bool resume;
+      // parse resume option from argv.
+
+      // register user defined layers
+      driver.RegisterLayer<FooLayer>(kFooLayer);
+      // register user defined updater
+      driver.RegisterUpdater<FooUpdater>(kFooUpdater);
+      ...
+      auto jobConf = driver.job_conf();
+      //  update jobConf
+
+      driver.Train(resume, jobConf);
+      return 0;
+    }
+
+The Driver class' `Init` method will load a job configuration file provided by
+users as a command line argument (`-conf <job conf>`). The file contains at least the
+cluster topology, and `Init` returns the `jobConf` for users to update or fill in
+configurations of the neural net, updater, etc. If users define subclasses of
+Layer, Updater, Worker and Param, they should register them through the driver.
+Finally, the job configuration is submitted to the driver which starts the
+training.
+
+We will provide helper functions to make the configuration easier in the
+future, like [keras](https://github.com/fchollet/keras).
+
+Users need to compile and link their code (e.g., layer implementations and the 
main
+file) with the SINGA library (*.libs/libsinga.so*) to generate an
+executable file, e.g., with the name *mysinga*.  To launch the program, users just pass the
+path of *mysinga* and the base job configuration to *./bin/singa-run.sh*.
+
+    ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other 
arguments]
+
+The [RNN application](rnn.html) provides a full example of
+implementing the main function for training a specific RNN model.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/python.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/python.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/python.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/python.md Wed Apr 20 
05:09:06 2016
@@ -0,0 +1,374 @@
+# Python Binding
+
+---
+
+The Python binding provides APIs for configuring a training job following
+[keras](http://keras.io/), including the configuration of neural net, training
+algorithm, etc.  It replaces the configuration file (e.g., *job.conf*) in
+protobuf format, which is typically long and error-prone to prepare. We will 
add
+python functions to interact with the layer and neural net
+objects (see [here](python_interactive_training.html)), which would enable 
users to train and debug their models
+interactively.
+
+Here is the layout of python related code,
+
+    SINGAROOT/tool/python
+    |-- pb2 (has job_pb2.py)
+    |-- singa
+        |-- model.py
+        |-- layer.py
+        |-- parameter.py
+        |-- initialization.py
+        |-- utils
+            |-- utility.py
+            |-- message.py
+    |-- examples
+        |-- cifar10_cnn.py, mnist_mlp.py, mnist_rbm1.py, mnist_ae.py, etc.
+        |-- datasets
+            |-- cifar10.py
+            |-- mnist.py
+
+## Compiling and running instructions
+
+In order to use the Python APIs, users need to add the following arguments 
when compiling
+SINGA,
+
+    ./configure --enable-python --with-python=PYTHON_DIR
+    make
+
+where PYTHON_DIR is the directory that contains Python.h
+
+
+The training program is launched by
+
+    bin/singa-run.sh -exec <user_main.py>
+
+where user_main.py creates the JobProto object and passes it to Driver::Train to
+start the training.
+
+For example,
+
+    cd SINGAROOT
+    bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py
+
+
+
+## Examples
+
+
+### MLP Example
+
+This example uses Python APIs to configure and train an MLP model over the MNIST
+dataset. The configuration content is the same as that written in
+*SINGAROOT/examples/mnist/job.conf*.
+
+```
+X_train, X_test, workspace = mnist.load_data()
+
+m = Sequential('mlp', sys.argv)
+
+m.add(Dense(2500, init='uniform', activation='stanh'))
+m.add(Dense(2000, init='uniform', activation='stanh'))
+m.add(Dense(1500, init='uniform', activation='stanh'))
+m.add(Dense(1000, init='uniform', activation='stanh'))
+m.add(Dense(500,  init='uniform', activation='stanh'))
+m.add(Dense(10, init='uniform', activation='softmax'))
+
+sgd = SGD(lr=0.001, lr_type='step')
+topo = Cluster(workspace)
+m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)
+m.fit(X_train, nb_epoch=1000, with_test=True)
+result = m.evaluate(X_test, batch_size=100, test_steps=10, test_freq=60)
+```
+
+### CNN Example
+
+This example uses Python APIs to configure and train a CNN model over the
+CIFAR10 dataset. The configuration content is the same as that written in
+*SINGAROOT/examples/cifar10/job.conf*.
+
+
+```
+X_train, X_test, workspace = cifar10.load_data()
+
+m = Sequential('cnn', sys.argv)
+
+m.add(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2))
+m.add(MaxPooling2D(pool_size=(3,3), stride=2))
+m.add(Activation('relu'))
+m.add(LRN2D(3, alpha=0.00005, beta=0.75))
+
+m.add(Convolution2D(32, 5, 1, 2, b_lr=2))
+m.add(Activation('relu'))
+m.add(AvgPooling2D(pool_size=(3,3), stride=2))
+m.add(LRN2D(3, alpha=0.00005, beta=0.75))
+
+m.add(Convolution2D(64, 5, 1, 2))
+m.add(Activation('relu'))
+m.add(AvgPooling2D(pool_size=(3,3), stride=2))
+
+m.add(Dense(10, w_wd=250, b_lr=2, b_wd=0, activation='softmax'))
+
+sgd = SGD(decay=0.004, lr_type='manual', step=(0,60000,65000), step_lr=(0.001,0.0001,0.00001))
+topo = Cluster(workspace)
+m.compile(updater=sgd, cluster=topo)
+m.fit(X_train, nb_epoch=1000, with_test=True)
+result = m.evaluate(X_test, 1000, test_steps=30, test_freq=300)
+```
+
+
+### RBM Example
+
+This example uses Python APIs to configure and train an RBM model over the MNIST
+dataset. The configuration content is the same as that written in the
+*rbm\*.conf* files under *SINGAROOT/examples/rbm/*.
+
+```
+rbmid = 3
+X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid)
+m = Energy('rbm'+str(rbmid), sys.argv)
+
+out_dim = [1000, 500, 250]
+m.add(RBM(out_dim, w_std=0.1, b_wd=0))
+
+sgd = SGD(lr=0.1, decay=0.0002, momentum=0.8)
+topo = Cluster(workspace)
+m.compile(optimizer=sgd, cluster=topo)
+m.fit(X_train, alg='cd', nb_epoch=6000)
+```
+
+### AutoEncoder Example
+This example uses Python APIs to configure and train an autoencoder model over
+the MNIST dataset. The configuration content is the same as that written in
+*SINGAROOT/examples/autoencoder.conf*.
+
+
+```
+rbmid = 4
+X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid+1)
+m = Sequential('autoencoder', sys.argv)
+
+hid_dim = [1000, 500, 250, 30]
+m.add(Autoencoder(hid_dim, out_dim=784, activation='sigmoid', param_share=True))
+
+agd = AdaGrad(lr=0.01)
+topo = Cluster(workspace)
+m.compile(loss='mean_squared_error', optimizer=agd, cluster=topo)
+m.fit(X_train, alg='bp', nb_epoch=12200)
+```
+
+### To run SINGA on GPU
+
+Users need to assign a list of GPU IDs to the `device` field in fit() or
+evaluate(). The number of GPUs must equal the number of workers configured in
+the cluster topology.
+
+
+```
+gpu_id = [0]
+m.fit(X_train, nb_epoch=100, with_test=True, device=gpu_id)
+```
+
+### TIPS
+
+Hidden layers for an MLP can be configured as
+
+```
+for n in [2500, 2000, 1500, 1000, 500]:
+  m.add(Dense(n, init='uniform', activation='tanh'))
+m.add(Dense(10, init='uniform', activation='softmax'))
+```
+
+An Activation layer can be specified separately
+
+```
+m.add(Dense(2500, init='uniform'))
+m.add(Activation('tanh'))
+```
+
+Users can explicitly specify hyper-parameters of weight and bias
+
+```
+par = Parameter(init='uniform', scale=0.05)
+m.add(Dense(2500, w_param=par, b_param=par, activation='tanh'))
+m.add(Dense(2000, w_param=par, b_param=par, activation='tanh'))
+m.add(Dense(1500, w_param=par, b_param=par, activation='tanh'))
+m.add(Dense(1000, w_param=par, b_param=par, activation='tanh'))
+m.add(Dense(500, w_param=par, b_param=par, activation='tanh'))
+m.add(Dense(10, w_param=par, b_param=par, activation='softmax'))
+```
+
+
+```
+parw = Parameter(init='gauss', std=0.0001)
+parb = Parameter(init='const', value=0)
+m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb, b_lr=2))
+m.add(MaxPooling2D(pool_size=(3,3), stride=2))
+m.add(Activation('relu'))
+m.add(LRN2D(3, alpha=0.00005, beta=0.75))
+
+parw.update(std=0.01)
+m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb))
+m.add(Activation('relu'))
+m.add(AvgPooling2D(pool_size=(3,3), stride=2))
+m.add(LRN2D(3, alpha=0.00005, beta=0.75))
+
+m.add(Convolution2D(64, 5, 1, 2, w_param=parw, b_param=parb, b_lr=1))
+m.add(Activation('relu'))
+m.add(AvgPooling2D(pool_size=(3,3), stride=2))
+
+m.add(Dense(10, w_param=parw, w_wd=250, b_param=parb, b_lr=2, b_wd=0, activation='softmax'))
+```
+
+
+Data can be added in this way,
+
+```
+X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
+m.fit(X_train, ...)                  # Data layer for training is added
+m.evaluate(X_test, ...)              # Data layer for testing is added
+```
+or this way,
+
+```
+X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
+m.add(X_train)                       # explicitly add Data layer
+m.add(X_test)                        # explicitly add Data layer
+```
+
+or by configuring the Store explicitly,
+
+```
+store = Store(path='train.bin', batch_size=64, ...)        # parameter values are set explicitly
+m.add(Data(load='recordinput', phase='train', conf=store)) # Data layer is added
+store = Store(path='test.bin', batch_size=100, ...)        # parameter values are set explicitly
+m.add(Data(load='recordinput', phase='test', conf=store))  # Data layer is added
+```
+
+
+### Cases to run SINGA
+
+(1) Run SINGA for training
+
+```
+m.fit(X_train, nb_epoch=1000)
+```
+
+(2) Run SINGA for training and validation
+
+```
+m.fit(X_train, validate_data=X_valid, nb_epoch=1000)
+```
+
+(3) Run SINGA for test while training
+
+```
+m.fit(X_train, nb_epoch=1000, with_test=True)
+result = m.evaluate(X_test, batch_size=100, test_steps=100)
+```
+
+(4) Run SINGA for test only
+Assume a checkpoint exists after training.
+
+```
+result = m.evaluate(X_test, batch_size=100, checkpoint_path=workspace+'/checkpoint/step100-worker0')
+```
+
+
+## Implementation Details
+
+### Layer class (inherited)
+
+* Data
+* Dense
+* Activation
+* Convolution2D
+* MaxPooling2D
+* AvgPooling2D
+* LRN2D
+* Dropout
+* RBM
+* Autoencoder
+
+### Model class
+
+The Model class has `jobconf` (a JobProto) and `layers` (a list of layers).
+
+Methods in the Model class:
+
+* add
+       * adds a Layer to the Model
+       * there are 2 Model subclasses: Sequential model and Energy model
+
+* compile
+       * set Updater (i.e., optimizer) and Cluster (i.e., topology) components
+
+* fit
+       * set Training data and parameter values for the training
+               * (optional) set Validation data and parameter values
+       * set Train_one_batch component
+       * specify `with_test` field if a user wants to run SINGA with test data simultaneously.
+       * [TODO] receive train/validation results, e.g., accuracy, loss, ppl, etc.
+
+* evaluate
+       * set Testing data and parameter values for the testing
+       * specify `checkpoint_path` field if a user wants to run SINGA only for testing.
+       * [TODO] receive test results, e.g., accuracy, loss, ppl, etc.
+
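+Taken together, these methods are typically used in the following order (a
+minimal sketch distilled from the MLP example above; imports are omitted as in
+the examples, and the layer and updater arguments are illustrative):
+
+```
+X_train, X_test, workspace = mnist.load_data()
+
+m = Sequential('mlp', sys.argv)        # or Energy(...) for RBM-like models
+m.add(Dense(2500, init='uniform', activation='stanh'))
+m.add(Dense(10, init='uniform', activation='softmax'))
+
+sgd = SGD(lr=0.001, lr_type='step')    # updater
+topo = Cluster(workspace)              # cluster topology
+m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo)
+
+m.fit(X_train, nb_epoch=1000, with_test=True)
+result = m.evaluate(X_test, batch_size=100, test_steps=10)
+```
+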
+### Results
+
+fit() and evaluate() return train/test results, a dictionary containing
+
+* [key]: step number
+* [value]: a list of dictionaries, each containing
+       * 'acc' for accuracy
+       * 'loss' for loss
+       * 'ppl' for perplexity
+       * 'se' for squared error
+
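+As a sketch (assuming the dictionary structure described above), the returned
+results can be inspected as follows:
+
+```
+result = m.evaluate(X_test, batch_size=100, test_steps=10)
+for step, records in result.items():
+    for r in records:
+        # each record may carry 'acc', 'loss', 'ppl' or 'se'
+        print 'step %s: acc=%s, loss=%s' % (step, r.get('acc'), r.get('loss'))
+```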
+
+### Parameter class
+
+Users need to set the parameter fields and their initialization values. For example,
+
+* Parameter (fields in Param proto)
+       * lr = (float) // learning rate multiplier, used to scale the learning rate when updating parameters.
+       * wd = (float) // weight decay multiplier, used to scale the weight decay when updating parameters.
+
+* Parameter initialization (fields in ParamGen proto)
+       * init = (string) // one of the types, 'uniform', 'constant', 'gaussian'
+       * high = (float)  // for 'uniform'
+       * low = (float)   // for 'uniform'
+       * value = (float) // for 'constant'
+       * mean = (float)  // for 'gaussian'
+       * std = (float)   // for 'gaussian'
+
+* Weight (`w_param`) is 'gaussian' with mean=0 and std=0.01 by default
+
+* Bias (`b_param`) is 'constant' with value=0 by default
+
+* How to update the parameter fields
+       * to update a Weight field, prefix the field name with `w_`
+       * to update a Bias field, prefix the field name with `b_`
+
+Several ways to set Parameter values
+
+```
+parw = Parameter(lr=2, wd=10, init='gaussian', std=0.1)
+parb = Parameter(lr=1, wd=0, init='constant', value=0)
+m.add(Convolution2D(10, w_param=parw, b_param=parb, ...))
+```
+
+```
+m.add(Dense(10, w_mean=1, w_std=0.1, w_lr=2, w_wd=10, ...))
+```
+
+```
+parw = Parameter(init='constant', value=0)
+m.add(Dense(10, w_param=parw, w_lr=1, w_wd=1, b_value=1, ...))
+```
+
+### Other classes
+
+* Store
+* Algorithm
+* Updater
+* SGD
+* AdaGrad
+* Cluster

Added: 
incubator/singa/site/trunk/content/markdown/v0.3.0/python_interactive_training.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/python_interactive_training.md?rev=1740048&view=auto
==============================================================================
--- 
incubator/singa/site/trunk/content/markdown/v0.3.0/python_interactive_training.md
 (added)
+++ 
incubator/singa/site/trunk/content/markdown/v0.3.0/python_interactive_training.md
 Wed Apr 20 05:09:06 2016
@@ -0,0 +1,186 @@
+# Interactive Training using Python
+
+---
+
+The `Layer` class ([layer.py](layer.py)) has the following methods for interactive training.
+For the basic usage of the Python binding features, please refer to [python.md](python.md).
+
+**ComputeFeature(self, \*srclys)**
+
+* This method creates and sets up singa::Layer, maintains its source layers, and
+then calls singa::Layer::ComputeFeature(...) for data transformation.
+
+       * `*srclys`: (an arbitrary number of) source layers
+
+**ComputeGradient(self)**
+
+* This method calls singa::Layer::ComputeGradient(...) for gradient computation.
+
+**GetParams(self)**
+
+* This method calls singa::Layer::GetParam() to retrieve the parameter values of
+the layer. Currently, it returns the weight and bias, each as a 2D numpy array.
+
+**SetParams(self, \*params)**
+
+* This method sets parameter values of the layer.
+       * `*params`: (an arbitrary number of) parameters, each of which is a 2D numpy array. Typically, these are the weight and bias.
+
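+For instance, parameters can be read, manipulated with numpy, and written back.
+Below is a sketch assuming a fully-connected layer `fc` (a hypothetical name)
+that has already been set up through `ComputeFeature`:
+
+```
+import numpy as np
+
+w, b = fc.GetParams()    # weight and bias, each a 2D numpy array
+print w.shape, b.shape
+
+w = w * 0.5              # arbitrary numpy manipulation
+b = np.zeros(b.shape)
+fc.SetParams(w, b)       # write the modified values back to the layer
+```
+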
+* * *
+
+The `Dummy` class is a subclass of `Layer`, which is provided to fetch input data and/or label information.
+Specifically, it creates singa::DummyLayer.
+
+**Feed(self, shape, data, aux_data)**
+
+* This method sets input data and/or auxiliary data such as labels.
+
+       * `shape`: the shape (width and height) of the dataset
+       * `data`: input dataset
+       * `aux_data`: auxiliary dataset (e.g., labels)
+
+In addition, the `Dummy` class has two subclasses named `ImageInput` and `LabelInput`.
+
+* The `ImageInput` class takes three arguments as follows.
+
+       **\_\_init__(self, height=None, width=None, nb_channel=1)**
+
+* Both `ImageInput` and `LabelInput` have their own Feed method, which calls the Feed of the `Dummy` class.
+
+       **Feed(self, data)**
+
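+A sketch of feeding one mini-batch through these input layers (assuming `xb` and
+`yb` are 2D numpy arrays holding the images and labels of the batch, as in the
+training loop below):
+
+```
+input = ImageInput(28, 28)   # MNIST images; nb_channel defaults to 1
+label = LabelInput()
+
+input.Feed(xb)               # xb: [batchsize x 784]
+label.Feed(yb)               # yb: [batchsize x 1]
+```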
+
+<!--
+
+Users can save or load model parameter (e.g., weight and bias) at anytime 
during training.
+The following methods are provided in `model.py`.
+
+**save_model_parameter(step, fout, neuralnet)**
+
+* This method saves model parameters into the specified checkpoint (fout).
+
+       * `step`: the step id of training
+       * `fout`: the name of checkpoint (output filename)
+       * `neuralnet`: neural network model, i.e., a list of layers
+
+**load_model_parameter(fin, neuralnet, batchsize=1, data_shape=None)**
+
+* This method loads model parameters from the specified checkpoint (fin).
+
+       * `fin`: the name of checkpoint (input filename)
+       * `neuralnet`: neural network model, i.e., a list of layers
+       * `batchsize`:
+       * `data_shape`:
+-->
+
+* * *
+
+## Example scripts for the interactive training
+
+Two example scripts are provided, [`train_mnist.py`]() and [`train_cifar10.py`]();
+the former trains an MLP model on the MNIST dataset, and the latter trains a CNN
+model on the CIFAR10 dataset.
+
+* Assume that `nn` is a neural network model, i.e., a list of layers. Currently,
+these examples consider sequential models. Example MLP and CNN models are shown
+below.
+
+* The `load_dataset()` method loads input data and corresponding labels, each of
+which is a 2D numpy array. For example, loading the MNIST dataset returns
+x: [60000 x 784] and y: [60000 x 1]; loading the CIFAR10 dataset returns
+x: [10000 x 3072] and y: [10000 x 1].
+
+* `sgd` is an Updater instance. Please see [`python.md`](python.md) and [`model.py`]() for more details.
+
+#### Basic steps for the interactive training
+
+* Step 1: Prepare a batch of data and the corresponding label information, and
+then input the data using the `Feed()` method.
+
+* Step 2: (a) Transform the data according to the neural net (nn) structure using
+`ComputeFeature()`. Note that this example considers a sequential model, so it
+uses a simple loop. (b) Users need to provide the `label` information for the
+loss layer to compute the loss function. (c) Users can print out the training
+performance, e.g., loss and accuracy.
+
+* Step 3: Compute gradients in the reverse order of the neural net (nn) structure
+using `ComputeGradient()`.
+
+* Step 4: Update the parameters, e.g., weight and bias, of the layers using the
+`Update()` method of the updater.
+
+Here is an example script for the interactive training.
+```
+bsize = 64      # batchsize
+disp_freq = 10  # step to show the training accuracy
+
+x, y = load_dataset()
+
+for i in range(x.shape[0] / bsize):
+
+       # (Step1) Input data containing "bsize" samples
+       xb, yb = x[i*bsize:(i+1)*bsize, :], y[i*bsize:(i+1)*bsize, :]
+       nn[0].Feed(xb)
+       label.Feed(yb)
+
+       # (Step2-a) Transform data according to the neuralnet (nn) structure
+       for h in range(1, len(nn)):
+               nn[h].ComputeFeature(nn[h-1])
+
+       # (Step2-b) Provide label to compute loss function
+       loss.ComputeFeature(nn[-1], label)
+
+       # (Step2-c) Print out performance, e.g., loss and accuracy
+       if (i+1) % disp_freq == 0:
+               print '  Step {:>3}: '.format(i+1),
+               loss.display()
+
+       # (Step3) Compute gradient in a reverse order
+       loss.ComputeGradient()
+       for h in range(len(nn)-1, 0, -1):
+               nn[h].ComputeGradient()
+               # (Step 4) Update parameter
+               sgd.Update(i+1, nn[h])
+```        
+
+<a id="model"></a>
+### <a href="#model">Example MLP</a>  
+
+Here is an example MLP model with 5 fully-connected hidden layers.
+Please refer to [`python.md`](python.md) and [`layer.py`]() for more details about layer definition. `SGD()` is an updater defined in [`model.py`]().
+
+```
+input = ImageInput(28, 28) # image width and height
+label = LabelInput()
+
+nn = []
+nn.append(input)
+nn.append(Dense(2500, init='uniform'))
+nn.append(Activation('stanh'))
+nn.append(Dense(2000, init='uniform'))
+nn.append(Activation('stanh'))
+nn.append(Dense(1500, init='uniform'))
+nn.append(Activation('stanh'))
+nn.append(Dense(1000, init='uniform'))
+nn.append(Activation('stanh'))
+nn.append(Dense(500, init='uniform'))
+nn.append(Activation('stanh'))
+nn.append(Dense(10, init='uniform'))
+loss = Loss('softmaxloss')
+
+sgd = SGD(lr=0.001, lr_type='step')
+
+```
+
+### <a href="#model2">Example CNN</a>  
+
+Here is an example CNN model with 3 convolution and pooling layers.
+Please refer to [`python.md`](python.md) and [`layer.py`]() for more details about
+layer definition. `SGD()` is an updater defined in [`model.py`]().
+
+```
+input = ImageInput(32, 32, 3) # image width, height, channel
+label = LabelInput()
+
+nn = []
+nn.append(input)
+nn.append(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2))
+nn.append(MaxPooling2D(pool_size=(3,3), stride=2))
+nn.append(Activation('relu'))
+nn.append(LRN2D(3, alpha=0.00005, beta=0.75))
+nn.append(Convolution2D(32, 5, 1, 2, b_lr=2))
+nn.append(Activation('relu'))
+nn.append(AvgPooling2D(pool_size=(3,3), stride=2))
+nn.append(LRN2D(3, alpha=0.00005, beta=0.75))
+nn.append(Convolution2D(64, 5, 1, 2))
+nn.append(Activation('relu'))
+nn.append(AvgPooling2D(pool_size=(3,3), stride=2))
+nn.append(Dense(10, w_wd=250, b_lr=2, b_wd=0))
+loss = Loss('softmaxloss')
+
+sgd = SGD(decay=0.004, momentum=0.9, lr_type='manual', step=(0,60000,65000), step_lr=(0.001,0.0001,0.00001))
+```

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/quick-start.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/quick-start.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/quick-start.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/quick-start.md Wed Apr 
20 05:09:06 2016
@@ -0,0 +1,180 @@
+# Quick Start
+
+---
+
+## SINGA setup
+
+Please refer to the [installation](installation.html) page for guidance on installing SINGA.
+
+
+### Training on a single node
+
+For single node training, one process will be launched to run SINGA at
+local host. We train the
+[CNN model](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks)
+over the [CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html) dataset as an
+example. The hyper-parameters are set following
+[cuda-convnet](https://code.google.com/p/cuda-convnet/). More details are
+available at the [CNN example](cnn.html).
+
+
+#### Preparing data and job configuration
+
+Download the dataset and create the data shards for training and testing.
+
+    cd examples/cifar10/
+    cp Makefile.example Makefile
+    make download
+    make create
+
+A training dataset and a test dataset are created. An *image_mean.bin* file is
+also generated, which contains the feature mean of all images.
+
+Since all code used for training this CNN model is provided by SINGA as
+built-in implementations, there is no need to write any code. Instead, users
+just execute the running script with the job configuration file (*job.conf*).
+To write your own code in SINGA, please refer to the
+[programming guide](programming-guide.html).
+
+#### Training without parallelism
+
+By default, the cluster topology has a single worker and a single server.
+In other words, neither the training data nor the neural net is partitioned.
+
+The training is started by running:
+
+    # goto top level folder
+    cd ../../
+    ./singa -conf examples/cifar10/job.conf
+
+#### Asynchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworker_groups: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+In SINGA, [asynchronous training](architecture.html) is enabled by launching
+multiple worker groups. For example, we can change the original *job.conf* to
+have two worker groups as shown above. By default, each worker group has one
+worker. Since one process is configured to contain two workers, the two worker
+groups will run in the same process. Consequently, they run the in-memory
+[Downpour](frameworks.html) training framework. Users do not need to split the
+dataset explicitly for each worker (group); instead, they can assign each
+worker (group) a random offset into the dataset. The workers then run as if on
+different data partitions.
+
+    # job.conf
+    ...
+    neuralnet {
+      layer {
+        ...
+        store_conf {
+          random_skip: 5000
+        }
+      }
+      ...
+    }
+
+The running command is:
+
+    ./singa -conf examples/cifar10/job.conf
+
+#### Synchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworkers_per_group: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+In SINGA, [synchronous training](architecture.html) is enabled
+by launching multiple workers within one worker group. For instance, we can
+change the original *job.conf* to have two workers in one worker group as shown
+above. The workers will run synchronously
+as they are from the same worker group. This framework is the in-memory
+[sandblaster](frameworks.html).
+The model is partitioned among the two workers. Specifically, each layer is
+sliced over the two workers. The sliced layer
+is the same as the original layer except that it only has `B/g` feature
+instances, where `B` is the number of instances in a mini-batch and `g` is the
+number of workers in a group. For example, with a mini-batch of 64 instances
+and 2 workers per group, each sliced layer processes 32 instances. It is also
+possible to partition the layer (or neural net) using
+[other schemes](neural-net.html).
+All other settings are the same as running without partitioning:
+
+    ./singa -conf examples/cifar10/job.conf
+
+
+### Training in a cluster
+
+#### Starting Zookeeper
+
+SINGA uses [zookeeper](https://zookeeper.apache.org/) to coordinate the
+training, and uses ZeroMQ for transferring messages. After installing zookeeper
+and ZeroMQ, you need to configure SINGA with `--enable-dist` before compiling.
+Please make sure the zookeeper service is started before running SINGA.
+
+If you installed the zookeeper using our thirdparty script, you can
+simply start it by:
+
+    #goto top level folder
+    cd  SINGA_ROOT
+    ./bin/zk-service.sh start
+
+(`./bin/zk-service.sh stop` stops the zookeeper).
+
+Otherwise, if you launched zookeeper yourself and did not use the
+default port, please edit `conf/singa.conf`:
+
+    zookeeper_host: "localhost:YOUR_PORT"
+
+
+We can extend the above two training frameworks to a cluster by updating the
+cluster configuration with:
+
+    nworkers_per_procs: 1
+
+Every process would then create only one worker thread. Consequently, the
+workers would be created in different processes (i.e., nodes). The *hostfile*
+must be provided under *SINGA_ROOT/conf/* specifying the nodes in the cluster,
+e.g.,
+
+    192.168.0.1
+    192.168.0.2
+
+And the zookeeper location must be configured correctly, e.g.,
+
+    #conf/singa.conf
+    zookeeper_host: "logbase-a01"
+
+The running command is:
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+You can list the current running jobs by,
+
+    ./bin/singa-console.sh list
+
+    JOB ID    |NUM PROCS
+    ----------|-----------
+    24        |2
+
+Jobs can be killed by,
+
+    ./bin/singa-console.sh kill JOB_ID
+
+
+Logs and job information are available in the */tmp/singa-log* folder, which can
+be changed to another folder by setting `log-dir` in *conf/singa.conf*.
+
+### Training with GPUs
+Please refer to the [GPU page](gpu.html) for details on training using GPUs.
+
+## Where to go next
+
+The [programming guide](programming-guide.html) pages will
+describe how to submit a training job in SINGA.

