Author: wangwei
Date: Wed Apr 20 05:09:06 2016
New Revision: 1740048
URL: http://svn.apache.org/viewvc?rev=1740048&view=rev
Log:
update docs for v0.3;
Added:
incubator/singa/site/trunk/content/markdown/docs/python_interactive_training.md
incubator/singa/site/trunk/content/markdown/v0.3.0/
incubator/singa/site/trunk/content/markdown/v0.3.0/architecture.md
incubator/singa/site/trunk/content/markdown/v0.3.0/checkpoint.md
incubator/singa/site/trunk/content/markdown/v0.3.0/cnn.md
incubator/singa/site/trunk/content/markdown/v0.3.0/code-structure.md
incubator/singa/site/trunk/content/markdown/v0.3.0/communication.md
incubator/singa/site/trunk/content/markdown/v0.3.0/data.md
incubator/singa/site/trunk/content/markdown/v0.3.0/debug.md
incubator/singa/site/trunk/content/markdown/v0.3.0/distributed-training.md
incubator/singa/site/trunk/content/markdown/v0.3.0/docker.md
incubator/singa/site/trunk/content/markdown/v0.3.0/examples.md
incubator/singa/site/trunk/content/markdown/v0.3.0/frameworks.md
incubator/singa/site/trunk/content/markdown/v0.3.0/general-rnn.md
incubator/singa/site/trunk/content/markdown/v0.3.0/gpu.md
incubator/singa/site/trunk/content/markdown/v0.3.0/hdfs.md
incubator/singa/site/trunk/content/markdown/v0.3.0/hybrid.md
incubator/singa/site/trunk/content/markdown/v0.3.0/index.md
incubator/singa/site/trunk/content/markdown/v0.3.0/installation.md
incubator/singa/site/trunk/content/markdown/v0.3.0/installation_source.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/architecture.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/checkpoint.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/cnn.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/code-structure.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/communication.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/data.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/debug.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/distributed-training.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/docker.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/examples.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/frameworks.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/index.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/installation.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/installation_source.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/layer.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/mesos.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/mlp.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/model-config.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/neural-net.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/neuralnet-partition.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/overview.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/param.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/programmer-guide.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/programming-guide.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/quick-start.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/rbm.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/rnn.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/test.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/train-one-batch.md
incubator/singa/site/trunk/content/markdown/v0.3.0/jp/updater.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/architecture.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/checkpoint.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/cnn.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/code-structure.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/communication.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/data.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/debug.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/distributed-training.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/docker.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/examples.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/frameworks.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/index.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/installation.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/installation_source.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/layer.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/mesos.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/mlp.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/model-config.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/neural-net.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/neuralnet-partition.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/overview.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/param.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/programmer-guide.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/programming-guide.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/quick-start.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/rbm.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/rnn.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/test.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/train-one-batch.md
incubator/singa/site/trunk/content/markdown/v0.3.0/kr/updater.md
incubator/singa/site/trunk/content/markdown/v0.3.0/layer.md
incubator/singa/site/trunk/content/markdown/v0.3.0/mesos.md
incubator/singa/site/trunk/content/markdown/v0.3.0/mlp.md
incubator/singa/site/trunk/content/markdown/v0.3.0/model-config.md
incubator/singa/site/trunk/content/markdown/v0.3.0/neural-net.md
incubator/singa/site/trunk/content/markdown/v0.3.0/neuralnet-partition.md
incubator/singa/site/trunk/content/markdown/v0.3.0/overview.md
incubator/singa/site/trunk/content/markdown/v0.3.0/param.md
incubator/singa/site/trunk/content/markdown/v0.3.0/programming-guide.md
incubator/singa/site/trunk/content/markdown/v0.3.0/python.md
incubator/singa/site/trunk/content/markdown/v0.3.0/python_interactive_training.md
incubator/singa/site/trunk/content/markdown/v0.3.0/quick-start.md
incubator/singa/site/trunk/content/markdown/v0.3.0/rbm.md
incubator/singa/site/trunk/content/markdown/v0.3.0/rnn.md
incubator/singa/site/trunk/content/markdown/v0.3.0/test.md
incubator/singa/site/trunk/content/markdown/v0.3.0/train-one-batch.md
incubator/singa/site/trunk/content/markdown/v0.3.0/updater.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/checkpoint.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/cnn.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/data.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/distributed-training.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/index.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/installation_source.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/mlp.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/neural-net.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/overview.md   (with props)
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/programming-guide.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/rnn.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/train-one-batch.md
incubator/singa/site/trunk/content/markdown/v0.3.0/zh/updater.md
Modified:
incubator/singa/site/trunk/content/markdown/develop/schedule.md
incubator/singa/site/trunk/content/markdown/docs/gpu.md
incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md
incubator/singa/site/trunk/content/markdown/docs/python.md
incubator/singa/site/trunk/content/markdown/docs/updater.md
incubator/singa/site/trunk/content/markdown/downloads.md
incubator/singa/site/trunk/content/markdown/index.md
Modified: incubator/singa/site/trunk/content/markdown/develop/schedule.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/develop/schedule.md?rev=1740048&r1=1740047&r2=1740048&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/develop/schedule.md (original)
+++ incubator/singa/site/trunk/content/markdown/develop/schedule.md Wed Apr 20
05:09:06 2016
@@ -22,11 +22,12 @@
| | |2.6. Visualization of neural net and debug information |done|
| | Binding |2.7. Python binding for major components |done|
| | GPU |2.8. Single node with multiple GPUs |done|
-|0.3 Mar 2016 | GPU | 3.1 Multiple nodes, each with multiple GPUs||
-| | | 3.2 Heterogeneous training using both GPU and CPU [CcT](http://arxiv.org/abs/1504.04343)||
-| | Tools| 3.3 Deep learning as a service ||
-| | Binding| 3.4 Enhance Python binding for training||
-| | | 3.5 Add R binding||
-| | Applications | 3.6 Image classification, product search, etc.||
-| | Optimization | 3.7 ||
+|0.3 April 2016 | GPU | 3.1 Multiple nodes, each with multiple GPUs|done|
+| | | 3.2 Heterogeneous training using both GPU and CPU [CcT](http://arxiv.org/abs/1504.04343)|done|
+| | | 3.3 Support cuDNN v4 | done|
+| | Installation| 3.4 Remove dependency on ZeroMQ, CZMQ, Zookeeper for single node training|done|
+| | Updater| 3.5 Add new SGD updaters including Adam, AdamMax and AdaDelta|done|
+| | Binding| 3.6 Enhance Python binding for training|done|
+|0.4 July 2016 | Rafiki | 4.1 Deep learning as a service| |
+| | | 4.2 Product search using Rafiki| |
Modified: incubator/singa/site/trunk/content/markdown/docs/gpu.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/gpu.md?rev=1740048&r1=1740047&r2=1740048&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/gpu.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/gpu.md Wed Apr 20 05:09:06
2016
@@ -21,7 +21,7 @@ provided by Nvidia, you need to enable C
./configure --enable-cuda --with-cuda=<path to cuda folder> --enable-cudnn
--with-cudnn=<path to cudnn folder>
-SINGA now supports CUDNN V3.0.
+SINGA now supports CUDNN V3 and V4.
### Configuration
Modified:
incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md?rev=1740048&r1=1740047&r2=1740048&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md
(original)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md
Wed Apr 20 05:09:06 2016
@@ -23,8 +23,8 @@
./bin/singa-run.sh -conf <path to job conf> [-resume]
-`-resume` is an argument used when continuing the training again from the previous [checkpoint](checkpoint.html).
-The [MLP](mlp.html) and [CNN](cnn.html) examples use built-in components.
+`-resume` is the argument used to continue training from the previous [checkpoint](checkpoint.html).
+The [MLP](mlp.html) and [CNN](cnn.html) examples make use of built-in components.
Please read the corresponding pages for their job configuration files. The
subsequent pages will illustrate the details on each component of the
configuration.
## Advanced user guide
@@ -67,13 +67,10 @@ Driver class' `Init` method
If users define their own subclasses of Layer, Updater, Worker, or Param, they must be registered with the driver.
To start the training, the job configuration, i.e., `jobConf`, is passed to driver.Train.
-We will provide helper functions to make the configuration easier in the
-future, like [keras](https://github.com/fchollet/keras).
+<!--We will provide helper functions to make the configuration easier in the
+future, like [keras](https://github.com/fchollet/keras).-->
-Users need to compile and link their code (e.g., layer implementations and the
main
-file) with SINGA library (*.libs/libsinga.so*) to generate an
-executable file, e.g., with name *mysinga*. To launch the program, users just
pass the
-path of the *mysinga* and base job configuration to *./bin/singa-run.sh*.
+Compile the user code and link it with the SINGA library (*.libs/libsinga.so*) to generate an executable file, e.g., *mysinga*.
+The program is launched as follows.
./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other
arguments]
Modified: incubator/singa/site/trunk/content/markdown/docs/python.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/python.md?rev=1740048&r1=1740047&r2=1740048&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/python.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/python.md Wed Apr 20
05:09:06 2016
@@ -5,9 +5,9 @@
Python binding provides APIs for configuring a training job following
[keras](http://keras.io/), including the configuration of neural net, training
algorithm, etc. It replaces the configuration file (e.g., *job.conf*) in
-protobuf format, which is typically long and error-prone to prepare. In later
-version, we will add python functions to interact with the layer and neural net
-objects, which would enable users to train and debug their models
+protobuf format, which is typically long and error-prone to prepare. We will
add
+python functions to interact with the layer and neural net
+objects (see [here](python_interactive_training.html)), which would enable
users to train and debug their models
interactively.
Here is the layout of python related code,
@@ -66,11 +66,11 @@ X_train, X_test, workspace = mnist.load_
m = Sequential('mlp', sys.argv)
-m.add(Dense(2500, init='uniform', activation='tanh'))
-m.add(Dense(2000, init='uniform', activation='tanh'))
-m.add(Dense(1500, init='uniform', activation='tanh'))
-m.add(Dense(1000, init='uniform', activation='tanh'))
-m.add(Dense(500, init='uniform', activation='tanh'))
+m.add(Dense(2500, init='uniform', activation='stanh'))
+m.add(Dense(2000, init='uniform', activation='stanh'))
+m.add(Dense(1500, init='uniform', activation='stanh'))
+m.add(Dense(1000, init='uniform', activation='stanh'))
+m.add(Dense(500, init='uniform', activation='stanh'))
m.add(Dense(10, init='uniform', activation='softmax'))
sgd = SGD(lr=0.001, lr_type='step')
Added:
incubator/singa/site/trunk/content/markdown/docs/python_interactive_training.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/python_interactive_training.md?rev=1740048&view=auto
==============================================================================
---
incubator/singa/site/trunk/content/markdown/docs/python_interactive_training.md
(added)
+++
incubator/singa/site/trunk/content/markdown/docs/python_interactive_training.md
Wed Apr 20 05:09:06 2016
@@ -0,0 +1,186 @@
+# Interactive Training using Python
+
+---
+
+The `Layer` class ([layer.py](layer.py)) has the following methods for interactive training.
+For the basic usage of Python binding features, please refer to
[python.md](python.md).
+
+**ComputeFeature(self, \*srclys)**
+
+* This method creates and sets up singa::Layer and maintains its source layers, then calls singa::Layer::ComputeFeature(...) for data transformation.
+
+    * `*srclys`: (an arbitrary number of) source layers
+
+**ComputeGradient(self)**
+
+* This method calls singa::Layer::ComputeGradient(...) for gradient computation.
+
+**GetParams(self)**
+
+* This method calls singa::Layer::GetParam() to retrieve parameter values of
the layer. Currently, it returns weight and bias. Each parameter is a 2D numpy
array.
+
+**SetParams(self, \*params)**
+
+* This method sets parameter values of the layer.
+    * `*params`: (an arbitrary number of) parameters, each of which is a 2D numpy array. Typically, it sets the weight and bias.
+
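For illustration, here is a minimal sketch of how `GetParams()` and `SetParams()` might be used together. It assumes a hypothetical `Dense` layer object `fc` that has already been set up (e.g., via `ComputeFeature`); the rescaling step is purely illustrative.

```
weight, bias = fc.GetParams()          # each is a 2D numpy array
print 'weight:', weight.shape, 'bias:', bias.shape

# Modify the parameters outside SINGA (illustrative rescaling) ...
weight = weight * 0.5

# ... and write them back in the same order (weight, bias).
fc.SetParams(weight, bias)
```
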
+* * *
+
+`Dummy` class is a subclass of `Layer`, which is provided to fetch input data
and/or label information.
+Specifically, it creates singa::DummyLayer.
+
+**Feed(self, shape, data, aux_data)**
+
+* This method sets input data and/or auxiliary data such as labels.
+
+    * `shape`: the shape (width and height) of the dataset
+    * `data`: input dataset
+    * `aux_data`: auxiliary dataset (e.g., labels)
+
+In addition, `Dummy` class has two subclasses named `ImageInput` and
`LabelInput`.
+
+* `ImageInput` class will take three arguments as follows.
+
+ **\_\_init__(self, height=None, width=None, nb_channel=1)**
+
+* Both `ImageInput` and `LabelInput` classes have their own Feed method to
call Feed of Dummy class.
+
+ **Feed(self, data)**
+
+
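As a minimal sketch of these input layers (illustrative only; it assumes the SINGA Python layer classes are importable and the batch shapes follow the MNIST example later on this page), a `Dummy` subclass is constructed once and then fed numpy arrays batch by batch:

```
import numpy as np

input = ImageInput(28, 28)                    # image height and width
label = LabelInput()

xb = np.random.rand(64, 784)                  # 64 flattened 28x28 images
yb = np.random.randint(0, 10, size=(64, 1))   # 64 labels in [0, 10)

input.Feed(xb)
label.Feed(yb)
```
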
+<!--
+
+Users can save or load model parameter (e.g., weight and bias) at anytime
during training.
+The following methods are provided in `model.py`.
+
+**save_model_parameter(step, fout, neuralnet)**
+
+* This method saves model parameters into the specified checkpoint (fout).
+
+ * `step`: the step id of training
+ * `fout`: the name of checkpoint (output filename)
+ * `neuralnet`: neural network model, i.e., a list of layers
+
+**load_model_parameter(fin, neuralnet, batchsize=1, data_shape=None)**
+
+* This method loads model parameters from the specified checkpoint (fin).
+
+ * `fin`: the name of checkpoint (input filename)
+ * `neuralnet`: neural network model, i.e., a list of layers
+ * `batchsize`:
+ * `data_shape`:
+-->
+
+* * *
+
+## Example scripts for the interactive training
+
+Two example scripts are provided, [`train_mnist.py`]() and [`train_cifar10.py`](): the former trains an MLP model on the MNIST dataset, and the latter trains a CNN model on the CIFAR10 dataset.
+
+* Assume that `nn` is a neural network model, i.e., a list of layers. Currently, these examples consider sequential models. Example MLP and CNN models are shown below.
+
+* `load_dataset()` method loads input data and corresponding labels, each of
which is a 2D numpy array.
+For example, loading MNIST dataset returns x: [60000 x 784] and y: [60000 x
1]. Loading CIFAR10 dataset, x: [10000 x 3072] and y: [10000 x 1].
+
+* `sgd` is an Updater instance. Please see [`python.md`](python.md) and
[`model.py`]() for more details.
+
+#### Basic steps for the interactive training
+
+* Step 1: Prepare batch-sized data and the corresponding label information, and then input the data using the `Feed()` method.
+
+* Step 2: (a) Transform data according to the neuralnet (nn) structure using `ComputeFeature()`. Note that this example considers a sequential model, so it uses a simple loop. (b) Users need to provide `label` information for the loss layer to compute the loss function. (c) Users can print out the training performance, e.g., loss and accuracy.
+
+* Step 3: Compute gradient in a reverse order of neuralnet (nn) structure
using `ComputeGradient()`.
+
+* Step 4: Update parameters, e.g., weight and bias, of layers using `Update()`
of the updater.
+
+Here is an example script for the interactive training.
+```
+bsize = 64 # batchsize
+disp_freq = 10 # step to show the training accuracy
+
+x, y = load_dataset()
+
+for i in range(x.shape[0] / bsize):
+
+ # (Step1) Input data containing "bsize" samples
+ xb, yb = x[i*bsize:(i+1)*bsize, :], y[i*bsize:(i+1)*bsize, :]
+ nn[0].Feed(xb)
+ label.Feed(yb)
+
+ # (Step2-a) Transform data according to the neuralnet (nn) structure
+ for h in range(1, len(nn)):
+ nn[h].ComputeFeature(nn[h-1])
+
+ # (Step2-b) Provide label to compute loss function
+ loss.ComputeFeature(nn[-1], label)
+
+ # (Step2-c) Print out performance, e.g., loss and accuracy
+ if (i+1) % disp_freq == 0:
+ print ' Step {:>3}: '.format(i+1),
+ loss.display()
+
+ # (Step3) Compute gradient in a reverse order
+ loss.ComputeGradient()
+ for h in range(len(nn)-1, 0, -1):
+ nn[h].ComputeGradient()
+ # (Step 4) Update parameter
+ sgd.Update(i+1, nn[h])
+```
+
+<a id="model"></a>
+### <a href="#model">Example MLP</a>
+
+Here is an example MLP model with 5 fully-connected hidden layers.
+Please refer to [`python.md`](python.md) and [`layer.py`]() for more details
about layer definition. `SGD()` is an updater defined in [`model.py`]().
+
+```
+input = ImageInput(28, 28) # image width and height
+label = LabelInput()
+
+nn = []
+nn.append(input)
+nn.append(Dense(2500, init='uniform'))
+nn.append(Activation('stanh'))
+nn.append(Dense(2000, init='uniform'))
+nn.append(Activation('stanh'))
+nn.append(Dense(1500, init='uniform'))
+nn.append(Activation('stanh'))
+nn.append(Dense(1000, init='uniform'))
+nn.append(Activation('stanh'))
+nn.append(Dense(500, init='uniform'))
+nn.append(Activation('stanh'))
+nn.append(Dense(10, init='uniform'))
+loss = Loss('softmaxloss')
+
+sgd = SGD(lr=0.001, lr_type='step')
+
+```
+
+### <a href="#model2">Example CNN</a>
+
+Here is an example CNN model with 3 convolution and pooling layers.
+Please refer to [`python.md`](python.md) and [`layer.py`]() for more details about layer definition. `SGD()` is an updater defined in [`model.py`]().
+
+```
+input = ImageInput(32, 32, 3) # image width, height, channel
+label = LabelInput()
+
+nn = []
+nn.append(input)
+nn.append(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2))
+nn.append(MaxPooling2D(pool_size=(3,3), stride=2))
+nn.append(Activation('relu'))
+nn.append(LRN2D(3, alpha=0.00005, beta=0.75))
+nn.append(Convolution2D(32, 5, 1, 2, b_lr=2))
+nn.append(Activation('relu'))
+nn.append(AvgPooling2D(pool_size=(3,3), stride=2))
+nn.append(LRN2D(3, alpha=0.00005, beta=0.75))
+nn.append(Convolution2D(64, 5, 1, 2))
+nn.append(Activation('relu'))
+nn.append(AvgPooling2D(pool_size=(3,3), stride=2))
+nn.append(Dense(10, w_wd=250, b_lr=2, b_wd=0))
+loss = Loss('softmaxloss')
+
+sgd = SGD(decay=0.004, momentum=0.9, lr_type='manual', step=(0,60000,65000),
step_lr=(0.001,0.0001,0.00001))
+```
Modified: incubator/singa/site/trunk/content/markdown/docs/updater.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/updater.md?rev=1740048&r1=1740047&r2=1740048&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/updater.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/updater.md Wed Apr 20
05:09:06 2016
@@ -69,6 +69,48 @@ Its type is `kRMSProp`.
}
}
+#### AdaDeltaUpdater
+
+It inherits the base `Updater` to implement the
+[AdaDelta](http://arxiv.org/abs/1212.5701) updating algorithm.
+Its type is `kAdaDelta`.
+
+ updater {
+ type: kAdaDelta
+ adadelta_conf {
+ rho: float # [0,1]
+ }
+ }
+
+#### Adam
+
+It inherits the base `Updater` to implement the
+[Adam](http://arxiv.org/pdf/1412.6980.pdf) updating algorithm.
+Its type is `kAdam`.
+`beta1` and `beta2` are floats in (0, 1), generally close to 1.
+
+ updater {
+ type: kAdam
+ adam_conf {
+ beta1: float # [0,1]
+ beta2: float # [0,1]
+ }
+ }
+
+#### AdaMax
+
+It inherits the base `Updater` to implement the
+[AdaMax](http://arxiv.org/pdf/1412.6980.pdf) updating algorithm.
+Its type is `kAdamMax`.
+`beta1` and `beta2` are floats in (0, 1), generally close to 1.
+
+ updater {
+ type: kAdamMax
+ adammax_conf {
+ beta1: float # [0,1]
+ beta2: float # [0,1]
+ }
+ }
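
For reference, below is a minimal numpy sketch of the Adam update rule as described in the cited paper (shown as representative of the new updaters; it is not SINGA's internal implementation, and the `lr` and `eps` defaults are illustrative assumptions):

    import numpy as np

    def adam_update(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # t is the 1-based step counter.
        m = beta1 * m + (1 - beta1) * grad            # first moment estimate
        v = beta2 * v + (1 - beta2) * grad * grad     # second moment estimate
        m_hat = m / (1 - beta1 ** t)                  # bias correction
        v_hat = v / (1 - beta2 ** t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v

    # Example: one step on a toy parameter vector.
    p, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
    g = np.array([0.1, -0.2, 0.3])
    p, m, v = adam_update(p, g, m, v, t=1)
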
### Configuration of learning rate
Modified: incubator/singa/site/trunk/content/markdown/downloads.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/downloads.md?rev=1740048&r1=1740047&r2=1740048&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/downloads.md (original)
+++ incubator/singa/site/trunk/content/markdown/downloads.md Wed Apr 20
05:09:06 2016
@@ -3,17 +3,34 @@
* Latest code: please clone the latest code from
[Github](https://github.com/apache/incubator-singa)
+* v0.3.0 (20 April 2016):
+ * [Apache SINGA
0.3.0](http://www.apache.org/dyn/closer.cgi/incubator/singa/0.3.0/apache-singa-incubating-0.3.0.tar.gz)
+
[\[MD5\]](https://dist.apache.org/repos/dist/release/incubator/singa/0.3.0/apache-singa-incubating-0.3.0.tar.gz.md5)
+
[\[KEYS\]](https://dist.apache.org/repos/dist/release/incubator/singa/0.3.0/KEYS)
+ * [Release Notes 0.3.0](releases/RELEASE_NOTES_0.3.0.html)
+ * New features and major updates,
+ * [Training on GPU cluster](v0.3.0/gpu.html) enables training of deep
learning models over a GPU cluster.
+ * [Python wrapper improvement](v0.3.0/python.html) makes it easy to
configure the job, including neural net and SGD algorithm.
+ * [New SGD updaters](v0.3.0/updater.html) are added, including Adam,
AdaDelta and AdaMax.
+ * [Installation](v0.3.0/installation.html) has fewer dependent
libraries for single node training.
+ * Heterogeneous training with CPU and GPU.
+ * Support cuDNN V4.
+ * Data prefetching.
+ * Fix some bugs.
+
+
+
* v0.2.0 (14 January 2016):
* [Apache SINGA
0.2.0](http://www.apache.org/dyn/closer.cgi/incubator/singa/0.2.0/apache-singa-incubating-0.2.0.tar.gz)
-
[\[MD5\]](https://dist.apache.org/repos/dist/release/incubator/singa/0.2.0/apache-singa-incubating-0.2.0.tar.gz.md5)
-
[\[KEYS\]](https://dist.apache.org/repos/dist/release/incubator/singa/0.2.0/KEYS)
+
[\[MD5\]](https://archive.apache.org/dist/incubator/singa/0.2.0/apache-singa-incubating-0.2.0.tar.gz.md5)
+ [\[KEYS\]](https://archive.apache.org/dist/incubator/singa/0.2.0/KEYS)
* [Release Notes 0.2.0](releases/RELEASE_NOTES_0.2.0.html)
* New features and major updates,
- * [Training on GPU](docs/gpu.html) enables training of complex models
on a single node with multiple GPU cards.
- * [Hybrid neural net partitioning](docs/hybrid.html) supports data and
model parallelism at the same time.
- * [Python wrapper](docs/python.html) makes it easy to configure the
job, including neural net and SGD algorithm.
- * [RNN model and BPTT algorithm](docs/general-rnn.html) are
implemented to support applications based on RNN models, e.g., GRU.
- * [Cloud software integration](docs/distributed-training.html)
includes Mesos, Docker and HDFS.
+ * [Training on GPU](v0.2.0/gpu.html) enables training of complex
models on a single node with multiple GPU cards.
+ * [Hybrid neural net partitioning](v0.2.0/hybrid.html) supports data
and model parallelism at the same time.
+ * [Python wrapper](v0.2.0/python.html) makes it easy to configure the
job, including neural net and SGD algorithm.
+ * [RNN model and BPTT algorithm](v0.2.0/general-rnn.html) are
implemented to support applications based on RNN models, e.g., GRU.
+ * [Cloud software integration](v0.2.0/distributed-training.html)
includes Mesos, Docker and HDFS.
* Visualization of neural net structure and layer information, which
is helpful for debugging.
* Linear algebra functions and random functions against Blobs and raw
data pointers.
* New layers, including SoftmaxLayer, ArgSortLayer, DummyLayer, RNN
layers and cuDNN layers.
@@ -24,9 +41,8 @@
* v0.1.0 (8 October 2015):
* [Apache SINGA
0.1.0](http://www.apache.org/dyn/closer.cgi/incubator/singa/apache-singa-incubating-0.1.0.tar.gz)
- *
[\[PGP\]](https://dist.apache.org/repos/dist/release/incubator/singa/apache-singa-incubating-0.1.0.tar.gz.asc)
-
[\[MD5\]](https://dist.apache.org/repos/dist/release/incubator/singa/apache-singa-incubating-0.1.0.tar.gz.md5)
-
[\[KEYS\]](https://dist.apache.org/repos/dist/release/incubator/singa/KEYS)
+
[\[MD5\]](https://archive.apache.org/dist/incubator/singa/apache-singa-incubating-0.1.0.tar.gz.md5)
+ [\[KEYS\]](https://archive.apache.org/dist/incubator/singa/KEYS)
* [Amazon EC2
image](https://console.aws.amazon.com/ec2/v2/home?region=ap-southeast-1#LaunchInstanceWizard:ami=ami-b41001e6)
* [Release Notes 0.1.0](releases/RELEASE_NOTES_0.1.0.html)
* Major features include,
Modified: incubator/singa/site/trunk/content/markdown/index.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/index.md?rev=1740048&r1=1740047&r2=1740048&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/index.md (original)
+++ incubator/singa/site/trunk/content/markdown/index.md Wed Apr 20 05:09:06
2016
@@ -2,6 +2,8 @@
<title>A Distributed Deep Learning Platform</title>
</head>
### Recent News
+* The **third release** is now available, 20 April, 2016. [Download SINGA
v0.3.0](downloads.html).
+
* The **second release** is now available, 14 Jan, 2016. [Download SINGA
v0.2.0](downloads.html).
* SINGA will be presented at
[Strata+Hadoop](http://strataconf.com/big-data-conference-sg-2015/public/schedule/detail/45123)
on 2 Dec, 2015
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/architecture.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/architecture.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/architecture.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/architecture.md Wed Apr
20 05:09:06 2016
@@ -0,0 +1,54 @@
+# SINGA Architecture
+
+---
+
+## Logical Architecture
+
+<img src="../images/logical.png" style="width: 550px"/>
+<p><strong> Fig.1 - Logical system architecture</strong></p>
+
+SINGA has a flexible architecture to support different distributed
+[training frameworks](frameworks.html) (both synchronous and asynchronous).
+The logical system architecture is shown in Fig.1.
+The architecture consists of multiple server groups and worker groups:
+
+* **Server group**
+ A server group maintains a complete replica of the model parameters,
+ and is responsible for handling get/update requests from worker groups.
+ Neighboring server groups synchronize their parameters periodically.
+ Typically, a server group contains a number of servers,
+ and each server manages a partition of model parameters.
+* **Worker group**
+ Each worker group communicates with only one server group.
+ A worker group trains a complete model replica
+ against a partition of the training dataset,
+ and is responsible for computing parameter gradients.
+ All worker groups run and communicate with the corresponding
+ server groups asynchronously.
+ However, inside each worker group,
+ the workers synchronously compute parameter updates for the model replica.
+
+There are different strategies to distribute the training workload among workers
+within a group (a minimal sketch contrasting the first two strategies follows this list):
+
+ * **Model parallelism**. Each worker computes a subset of parameters
+ against all data partitioned to the group.
+ * **Data parallelism**. Each worker computes all parameters
+ against a subset of data.
+ * [**Hybrid parallelism**](hybrid.html). SINGA also supports hybrid
parallelism.
+
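As a minimal illustration of the first two strategies (plain numpy, not SINGA code) for a single fully-connected layer: data parallelism splits the mini-batch rows across workers, while model parallelism splits the parameter columns, and both recover the same result.

    import numpy as np

    np.random.seed(0)
    x = np.random.rand(8, 4)   # a mini-batch of 8 samples with 4 features
    w = np.random.rand(4, 6)   # weight matrix of one fully-connected layer

    # Data parallelism: each of 2 workers holds the full w,
    # but computes on half of the mini-batch.
    y_data = np.vstack([xi.dot(w) for xi in np.split(x, 2, axis=0)])

    # Model parallelism: each worker holds half of the parameter columns,
    # but computes on the full mini-batch.
    y_model = np.hstack([x.dot(wi) for wi in np.split(w, 2, axis=1)])

    assert np.allclose(y_data, x.dot(w))
    assert np.allclose(y_model, x.dot(w))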
+
+## Implementation
+In SINGA, servers and workers are execution units running in separate threads.
+They communicate through [messages](communication.html).
+Every process runs the main thread as a stub that aggregates local messages
+and forwards them to corresponding (remote) receivers.
+
+Each server group and worker group has a *ParamShard*
+object representing a complete model replica. If workers and servers
+reside in the same process, their *ParamShard* (partitions) can
+be configured to share the same memory space. In this case, the
+messages transferred between different execution units just contain
+pointers to the data, which reduces the communication cost.
+In inter-process cases, by contrast,
+the messages have to include the parameter values.
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/checkpoint.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/checkpoint.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/checkpoint.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/checkpoint.md Wed Apr 20
05:09:06 2016
@@ -0,0 +1,70 @@
+# CheckPoint
+
+---
+
+SINGA checkpoints model parameters onto disk periodically according to a
+user-configured frequency. By checkpointing model parameters, we can
+
+ 1. resume the training from the last checkpoint. For example, if
+ the program crashes before finishing all training steps, we can continue
+ the training using checkpoint files.
+
+ 2. use them to initialize a similar model. For example, the
+ parameters from training an RBM model can be used to initialize
+ a [deep auto-encoder](rbm.html) model.
+
+## Configuration
+
+Checkpointing is controlled by two configuration fields:
+
+* `checkpoint_after`: start checkpointing after this number of training steps,
+* `checkpoint_freq`: the frequency (in steps) of checkpointing.
+
+For example,
+
+ # job.conf
+ checkpoint_after: 100
+ checkpoint_frequency: 300
+ ...
+
+Checkpointing files are located at
*WORKSPACE/checkpoint/stepSTEP-workerWORKERID*.
+*WORKSPACE* is configured in
+
+ cluster {
+ workspace:
+ }
+
+For the above configuration, after training for 700 steps, there would be
+two checkpointing files,
+
+ step400-worker0
+ step700-worker0
+
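As a quick sanity check (plain Python; it assumes, as the example above suggests, that checkpoints are written at steps `checkpoint_after + k * checkpoint_freq` for k >= 1), the checkpoint steps for this configuration can be enumerated as:

    checkpoint_after, checkpoint_freq, train_steps = 100, 300, 700

    steps = range(checkpoint_after + checkpoint_freq, train_steps + 1, checkpoint_freq)
    print steps   # [400, 700] -> step400-worker0 and step700-worker0
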
+## Application - resuming training
+
+We can resume the training from the last checkpoint (i.e., step 700) by,
+
+ ./bin/singa-run.sh -conf JOB_CONF -resume
+
+There is no change to the job configuration.
+
+## Application - model initialization
+
+We can also use the checkpointing file from step 400 to initialize
+a new model by configuring the new job as,
+
+ # job.conf
+ checkpoint : "WORKSPACE/checkpoint/step400-worker0"
+ ...
+
+If there are multiple checkpointing files for the same snapshot due to model
+partitioning, all the checkpointing files should be added,
+
+ # job.conf
+ checkpoint : "WORKSPACE/checkpoint/step400-worker0"
+ checkpoint : "WORKSPACE/checkpoint/step400-worker1"
+ ...
+
+The training command is the same as starting a new job,
+
+ ./bin/singa-run.sh -conf JOB_CONF
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/cnn.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/cnn.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/cnn.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/cnn.md Wed Apr 20
05:09:06 2016
@@ -0,0 +1,239 @@
+# CNN Example
+
+---
+
+Convolutional neural network (CNN) is a type of feed-forward artificial neural
+network widely used for image and video classification. In this example, we
will
+use a deep CNN model to do image classification for the
+[CIFAR10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html).
+
+
+## Running instructions
+
+Please refer to the [installation](installation.html) page for
+instructions on building SINGA, and the [quick start](quick-start.html)
+for instructions on starting zookeeper.
+
+We have provided scripts for preparing the training and test dataset in
*examples/cifar10/*.
+
+ # in examples/cifar10
+ $ cp Makefile.example Makefile
+ $ make download
+ $ make create
+
+
+### Training on CPU
+
+We can start the training by
+
+ ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+You should see output like
+
+ Record job information to /tmp/singa-log/job-info/job-2-20150817-055601
+ Executing : ./singa -conf /xxx/incubator-singa/examples/cifar10/job.conf
-singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 2
+ E0817 06:56:18.868259 33849 cluster.cc:51] proc #0 -> 192.168.5.128:49152
(pid = 33849)
+ E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start
+ E0817 06:56:18.928469 33872 worker.cc:134] Worker (group = 0, id = 0) start
+ E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588,
accuracy : 0.077900
+ E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578,
accuracy : 0.062500
+ E0817 06:57:24.142645 33849 trainer.cc:373] Train step-30, loss :
2.302404, accuracy : 0.131250
+ E0817 06:57:30.813354 33849 trainer.cc:373] Train step-60, loss :
2.302248, accuracy : 0.156250
+ E0817 06:57:37.556655 33849 trainer.cc:373] Train step-90, loss :
2.301849, accuracy : 0.175000
+ E0817 06:57:44.971276 33849 trainer.cc:373] Train step-120, loss :
2.301077, accuracy : 0.137500
+ E0817 06:57:51.801949 33849 trainer.cc:373] Train step-150, loss :
2.300410, accuracy : 0.135417
+ E0817 06:57:58.682281 33849 trainer.cc:373] Train step-180, loss :
2.300067, accuracy : 0.127083
+ E0817 06:58:05.578366 33849 trainer.cc:373] Train step-210, loss :
2.300143, accuracy : 0.154167
+ E0817 06:58:12.518497 33849 trainer.cc:373] Train step-240, loss :
2.295912, accuracy : 0.185417
+
+After training for some steps (depending on the setting) or when the job is
+finished, SINGA will [checkpoint](checkpoint.html) the model parameters.
+
+### Training on GPU
+
+Since version 0.2, we can train CNN models on GPU using cuDNN. Please refer to
+the [GPU page](gpu.html) for details on compiling SINGA with GPU and cuDNN.
+The configuration file is similar to that for CPU training, except that the
+cuDNN layers are used and the GPU device is configured.
+
+ ./bin/singa-run.sh -conf examples/cifar10/cudnn.conf
+
+### Training using Python script
+
+The python helpers coming with SINGA 0.2 make it easy to configure a training
+job. For example, the *job.conf* is replaced with a simple python script,
+e.g., *cifar10_cnn.py*, with a few tens of lines of code following the
+[Keras API](http://keras.io/).
+
+ # on CPU
+ ./bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py
+ # on GPU
+ ./bin/singa-run.sh -exec tool/python/examples/cifar10_cnn_cudnn.py
+
+## Details
+
+To train a model in SINGA, you need to prepare the datasets,
+and a job configuration which specifies the neural net structure, training
+algorithm (BP or CD), SGD update algorithm (e.g. Adagrad),
+number of training/test steps, etc.
+
+### Data preparation
+
+Before using SINGA, you need to write a program to convert the dataset
+into a format that SINGA can read. Please refer to the
+[Data Preparation](data.html#example---cifar-dataset) page for details about
+preparing the CIFAR10 dataset.
+
+### Neural net
+
+Figure 1 shows the net structure of the CNN model used in this example, which is
+set following [Alex](https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg).
+The dashed circle represents one feature transformation stage, which generally
+has four layers as shown in the figure. Sometimes the rectifier layer and
normalization layer
+are omitted or swapped in one stage. For this example, there are 3 such stages.
+
+Next we follow the guide in [neural net page](neural-net.html)
+and [layer page](layer.html) to write the neural net configuration.
+
+<div style = "text-align: center">
+<img src = "../images/example-cnn.png" style = "width: 200px"> <br/>
+<strong>Figure 1 - Net structure of the CNN example.</strong></img>
+</div>
+
+* We configure an input layer to read the training/testing records from a disk
file.
+
+ layer{
+ name: "data"
+ type: kRecordInput
+ store_conf {
+ backend: "kvfile"
+ path: "examples/cifar10/train_data.bin"
+ mean_file: "examples/cifar10/image_mean.bin"
+ batchsize: 64
+ random_skip: 5000
+ shape: 3
+ shape: 32
+ shape: 32
+ }
+ exclude: kTest # exclude this layer for the testing net
+ }
+ layer{
+ name: "data"
+ type: kRecordInput
+ store_conf {
+ backend: "kvfile"
+ path: "examples/cifar10/test_data.bin"
+ mean_file: "examples/cifar10/image_mean.bin"
+ batchsize: 100
+ shape: 3
+ shape: 32
+ shape: 32
+ }
+ exclude: kTrain # exclude this layer for the training net
+ }
+
+
+* We configure layers for the feature transformation as follows
+(all layers are built-in layers in SINGA; hyper-parameters of these layers are
set according to
+[Alex's
setting](https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg)).
+
+ layer {
+ name: "conv1"
+ type: kConvolution
+ srclayers: "data"
+ convolution_conf {... }
+ ...
+ }
+ layer {
+ name: "pool1"
+ type: kPooling
+ srclayers: "conv1"
+ pooling_conf {... }
+ }
+ layer {
+ name: "relu1"
+ type: kReLU
+ srclayers:"pool1"
+ }
+ layer {
+ name: "norm1"
+ type: kLRN
+ lrn_conf {... }
+ srclayers:"relu1"
+ }
+
+ The configurations for another 2 stages are omitted here.
+
+* There is an [inner product layer](layer.html#innerproductlayer)
+after the 3 transformation stages, which is
+configured with 10 output units, i.e., the total number of labels. The weight
+matrix Param is configured with a large weight decay scale to reduce over-fitting.
+
+ layer {
+ name: "ip1"
+ type: kInnerProduct
+ srclayers:"pool3"
+ innerproduct_conf {
+ num_output: 10
+ }
+ param {
+ name: "w4"
+ wd_scale:250
+ ...
+ }
+ param {
+ name: "b4"
+ ...
+ }
+ }
+
+* The last layer is a [Softmax loss layer](layer.html#softmaxloss)
+
+ layer{
+ name: "loss"
+ type: kSoftmaxLoss
+ softmaxloss_conf{ topk:1 }
+ srclayers:"ip1"
+ srclayers: "data"
+ }
+
+### Updater
+
+The [normal SGD updater](updater.html#updater) is selected.
+The learning rate decreases in a stair-step fashion, and is configured using the
+[kFixedStep](updater.html#kfixedstep) type.
+
+ updater{
+ type: kSGD
+ weight_decay:0.004
+ learning_rate {
+ type: kFixedStep
+ fixedstep_conf:{
+ step:0 # lr for step 0-60000 is 0.001
+ step:60000 # lr for step 60000-65000 is 0.0001
+      step:65000 # lr for step 65000- is 0.00001
+ step_lr:0.001
+ step_lr:0.0001
+ step_lr:0.00001
+ }
+ }
+ }
+
+### TrainOneBatch algorithm
+
+The CNN model is a feed-forward model, and thus should be configured to use the
+[Back-propagation algorithm](train-one-batch.html#back-propagation).
+
+ train_one_batch {
+ alg: kBP
+ }
+
+### Cluster setting
+
+The following configuration sets up a single worker and server for training.
+The [Training frameworks](frameworks.html) page introduces configurations for several distributed
+training frameworks.
+
+ cluster {
+ nworker_groups: 1
+ nserver_groups: 1
+ }
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/code-structure.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/code-structure.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/code-structure.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/code-structure.md Wed
Apr 20 05:09:06 2016
@@ -0,0 +1,76 @@
+# Code Structure
+
+---
+
+<!--
+
+### Worker Side
+
+#### Main Classes
+
+<img src="../images/code-structure/main.jpg" style="width: 550px"/>
+
+* **Worker**: start the solver to conduct training or resume from previous
training snapshots.
+* **Solver**: construct the neural network and run training algorithms over
it. Validation and testing is also done by the solver along the training.
+* **TableDelegate**: delegate for the parameter table physically stored in
parameter servers.
+ it runs a thread to communicate with table servers for parameter
transferring.
+* **Net**: the neural network consists of multiple layers constructed from
input configuration file.
+* **Layer**: the core abstraction, read data (neurons) from connecting layers,
and compute the data
+ of itself according to layer specific ComputeFeature functions. Data from
the bottom layer is forwarded
+ layer by layer to the top.
+
+#### Data types
+
+<img src="../images/code-structure/layer.jpg" style="width: 700px"/>
+
+* **ComputeFeature**: read data (neurons) from in-coming layers, and compute the data
+  of itself according to layer type. This function can be overridden to implement different
+  types of layers.
+* **ComputeGradient**: read gradients (and data) from in-coming layers and
compute
+ gradients of parameters and data w.r.t the learning objective (loss).
+
+We adapt the implementations of **PoolingLayer**, **Im2colLayer** and **LRNLayer** from [Caffe](http://caffe.berkeleyvision.org/).
+
+
+<img src="../images/code-structure/darray.jpg" style="width: 400px"/>
+
+* **DArray**: provide the abstraction of distributed array on multiple nodes,
+ supporting array/matrix operations and element-wise operations. Users can
use it as a local structure.
+* **LArray**: the local part for the DArray. Each LArray is treated as an
+ independent array, and support all array-related operations.
+* **MemSpace**: manage the memory used by DArray. Distributed memory is allocated
+  and managed by armci. Multiple DArrays can share the same MemSpace; the memory
+ will be released when no DArray uses it anymore.
+* **Partition**: maintain both global shape and local partition information.
+ used when two DArray are going to interact.
+* **Shape**: basic class for representing the scope of a DArray/LArray
+* **Range**: basic class for representing the scope of a Partition
+
+### Parameter Server
+
+#### Main classes
+
+<img src="../images/code-structure/uml.jpg" style="width: 750px"/>
+
+* **NetworkService**: provide access to the network (sending and receiving
messages). It maintains a queue for received messages, implemented by
NetworkQueue.
+* **RequestDispatcher**: pick up the next message (request) from the queue, and invoke a method (callback) to process it.
+* **TableServer**: provide access to the data table (parameters). Register
callbacks for different types of requests to RequestDispatcher.
+* **GlobalTable**: implement the table. Data is partitioned into multiple
Shard objects per table. User-defined consistency model supported by extending
TableServerHandler for each table.
+
+#### Data types
+
+<img src="../images/code-structure/type.jpg" style="width: 400px"/>
+
+Table related messages are either of type **RequestBase** which contains
different types of request, or of type **TableData** containing a key-value
tuple.
+
+#### Control flow and thread model
+
+<img src="../images/code-structure/threads.jpg" alt="uml" style="width:
1000px"/>
+
+The figure above shows how a GET request sent from a worker is processed by the
+table server. The control flow for other types of requests is similar. At
+the server side, there are at least 3 threads running at any time: two by
+NetworkService for sending and receiving message, and at least one by the
+RequestDispatcher for dispatching requests.
+
+-->
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/communication.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/communication.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/communication.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/communication.md Wed Apr
20 05:09:06 2016
@@ -0,0 +1,453 @@
+# Communication
+
+---
+
+Different messaging libraries have different benefits and drawbacks. For instance,
+MPI provides fast message passing between GPUs (using GPUDirect), but does not
+support fault-tolerance well. In contrast, systems using ZeroMQ can be
+fault-tolerant, but do not support GPUDirect. ZeroMQ also lacks MPI's AllReduce
+function, which is efficient for data aggregation in
+distributed training. In Singa, we provide general messaging APIs for
+communication between threads within a process and across processes, and let
+users choose the underlying implementation (MPI or ZeroMQ) that meets their requirements.
+
+Singa's messaging library consists of two components, namely the message, and
+the socket to send and receive messages. **Socket** refers to a
+Singa-defined data structure instead of the Linux socket.
+We will introduce the two components in detail with the following figure as an
+example architecture.
+
+<img src="../images/arch/arch2.png" style="width: 550px"/>
+<img src="../images/arch/comm.png" style="width: 550px"/>
+<p><strong> Fig.1 - Example physical architecture and network
connection</strong></p>
+
+Fig.1 shows an example physical architecture and its network connection.
+[Section-partition server side ParamShard](architecture.html) has a detailed description of the
+architecture. Each process consists of one main thread running the stub and multiple
+background threads running the worker and server tasks. The stub of the main
+thread forwards messages among threads. The worker and
+server tasks are performed by the background threads.
+
+## Message
+
+<object type="image/svg+xml" style="width: 100px" data="../images/msg.svg" >
Not
+supported </object>
+<p><strong> Fig.2 - Logical message format</strong></p>
+
+Fig.2 shows the logical message format which has two parts, the header and the
+content. The message header includes the sender's and receiver's IDs, each
consisting of
+the group ID and the worker/server ID within the group. The stub forwards
+messages by looking up an address table based on the receiver's ID.
+There are two sets of messages according to the message type defined below.
+
+ * kGet/kPut/kRequest/kSync for messages about parameters
+
+ * kFeaBlob/kGradBlob for messages about transferring feature and gradient
+ blobs of one layer to its neighboring layer
+
+There is a target ID in the header. If the message body is parameters,
+the target ID is then the parameter ID. Otherwise the message is related to
+layer feature or gradient, and the target ID consists of the layer ID and the
+blob ID of that layer. The message content has multiple frames to store the
+parameter or feature data.
+
+The API for the base Msg is:
+
+ /**
+ * Msg used to transfer Param info (gradient or value), feature blob, etc
+ * between workers, stubs and servers.
+ *
+ * Each msg has a source addr and dest addr identified by a unique integer.
+ * It is also associated with a target field (value and version) for ease
of
+ * getting some meta info (e.g., parameter id) from the msg.
+ *
+ * Other data is added into the message as frames.
+ */
+ class Msg {
+ public:
+ ~Msg();
+ Msg();
+ /**
+ * Construct the msg providing source and destination addr.
+ */
+ Msg(int src, int dst);
+ /**
+ * Copy constructor.
+ */
+ Msg(const Msg& msg);
+ /**
+ * Swap the src/dst addr
+ */
+ void SwapAddr();
+ /**
+ * Add a frame (a chunk of bytes) into the message
+ */
+ void AddFrame(const void* addr, int nBytes);
+ /**
+ * @return num of bytes of the current frame.
+ */
+ int FrameSize();
+ /**
+ * @return the pointer to the current frame data.
+ */
+ void* FrameData();
+ /**
+ * @return the data of the current frame as c string
+ */
+ char* FrameStr();
+ /**
+ * Move the cursor to the first frame.
+ */
+ void FirstFrame();
+ /**
+ * Move the cursor to the last frame.
+ */
+ void LastFrame();
+ /**
+ * Move the cursor to the next frame
+ * @return true if the next frame is not NULL; otherwise false
+ */
+ bool NextFrame();
+ /**
+ * Add a 'format' frame to the msg (like CZMQ's zsock_send).
+ *
+ * The format is a string that defines the type of each field.
+ * The format can contain any of these characters, each corresponding to
+ * one or two arguments:
+ * i = int (signed)
+ * 1 = uint8_t
+ * 2 = uint16_t
+ * 4 = uint32_t
+ * 8 = uint64_t
+ * p = void * (sends the pointer value, only meaningful over inproc)
+ * s = char**
+ *
+ * Returns size of the added content.
+ */
+ int AddFormatFrame(const char *format, ...);
+ /**
+ * Parse the current frame added using AddFormatFrame(const char*, ...).
+ *
+ * The format is a string that defines the type of each field.
+ * The format can contain any of these characters, each corresponding to
+ * one or two arguments:
+ * i = int (signed)
+ * 1 = uint8_t
+ * 2 = uint16_t
+ * 4 = uint32_t
+ * 8 = uint64_t
+ * p = void * (sends the pointer value, only meaningful over inproc)
+ * s = char**
+ *
+ * Returns size of the parsed content.
+ */
+ int ParseFormatFrame(const char* format, ...);
+
+ #ifdef USE_ZMQ
+ void ParseFromZmsg(zmsg_t* msg);
+ zmsg_t* DumpToZmsg();
+ #endif
+
+ /**
+ * @return msg size in terms of bytes, ignore meta info.
+ */
+ int size() const;
+ /**
+ * Set source addr.
+ * @param addr unique identify one worker/server/stub in the current job
+ */
+ void set_src(int addr) { src_ = addr; }
+ /**
+ * @return source addr.
+ */
+ int src() const { return src_; }
+ /**
+ * Set destination addr.
+ * @param addr unique identify one worker/server/stub in the current job
+ */
+ void set_dst(int addr) { dst_ = addr; }
+ /**
+ * @return dst addr.
+ */
+ int dst() const { return dst_; }
+ /**
+ * Set msg type, e.g., kPut, kGet, kUpdate, kRequest
+ */
+ void set_type(int type) { type_ = type; }
+ /**
+ * @return msg type.
+ */
+ int type() const { return type_; }
+ /**
+ * Set msg target.
+ *
+ * One msg has a target to identify some entity in worker/server/stub.
+ * The target is associated with a version, e.g., Param version.
+ */
+ void set_trgt(int val, int version) {
+ trgt_val_ = val;
+ trgt_version_ = version;
+ }
+ int trgt_val() const {
+ return trgt_val_;
+ }
+ int trgt_version() const {
+ return trgt_version_;
+ }
+
+ };
+
+In order for a Msg object to be routed, the source and destination addresses should be attached.
+This is achieved by calling the set_src and set_dst methods of the Msg object.
+The address parameter passed to these two methods can be manipulated via a set of
+helper functions, shown below.
+
+ /**
+ * Wrapper to generate message address
+ * @param grp worker/server group id
+ * @param id_or_proc worker/server id or procs id
+ * @param type msg type
+ */
+ inline int Addr(int grp, int id_or_proc, int type) {
+ return (grp << 16) | (id_or_proc << 8) | type;
+ }
+
+ /**
+ * Parse group id from addr.
+ *
+ * @return group id
+ */
+ inline int AddrGrp(int addr) {
+ return addr >> 16;
+ }
+ /**
+ * Parse worker/server id from addr.
+ *
+ * @return id
+ */
+ inline int AddrID(int addr) {
+ static const int mask = (1 << 8) - 1;
+ return (addr >> 8) & mask;
+ }
+
+ /**
+ * Parse worker/server procs from addr.
+ *
+ * @return procs id
+ */
+ inline int AddrProc(int addr) {
+ return AddrID(addr);
+ }
+ /**
+ * Parse msg type from addr
+ * @return msg type
+ */
+ inline int AddrType(int addr) {
+ static const int mask = (1 << 8) -1;
+ return addr & mask;
+ }
+
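The following small Python sketch (illustrative only, mirroring the bit layout used by the C++ helpers above) shows how a group id, a worker/server id and a message type are packed into a single integer address and parsed back:

    def addr(grp, id_or_proc, msg_type):
        # group id in bits 16 and above, worker/server id in bits 8-15, type in bits 0-7
        return (grp << 16) | (id_or_proc << 8) | msg_type

    def addr_grp(a):
        return a >> 16

    def addr_id(a):
        return (a >> 8) & ((1 << 8) - 1)

    def addr_type(a):
        return a & ((1 << 8) - 1)

    a = addr(2, 5, 3)
    assert (addr_grp(a), addr_id(a), addr_type(a)) == (2, 5, 3)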
+
+## Socket
+
+In SINGA, there are two types of sockets, the Dealer Socket and the Router
+Socket, whose names are adapted from ZeroMQ. All connections are of the same
type, i.e.,
+Dealer<-->Router. The communication between dealers and routers is
+asynchronous. In other words, one Dealer
+socket can talk with multiple Router sockets, and one Router socket can talk
+with multiple Dealer sockets.
+
+### Base Socket
+
+The basic functions of a Singa Socket are to send and receive messages. The APIs
+are:
+
+ class SocketInterface {
+ public:
+ virtual ~SocketInterface() {}
+ /**
+ * Send a message to connected socket(s), non-blocking. The message
+ * will be deallocated after sending, thus should not be used after
+ * calling Send();
+ *
+ * @param msg The message to be sent
+ * @return 1 for success queuing the message for sending, 0 for failure
+ */
+ virtual int Send(Msg** msg) = 0;
+ /**
+ * Receive a message from any connected socket.
+ *
+ * @return a message pointer if success; nullptr if failure
+ */
+ virtual Msg* Receive() = 0;
+ /**
+ * @return Identifier of the implementation dependent socket. E.g.,
zsock_t*
+ * for ZeroMQ implementation and rank for MPI implementation.
+ */
+ virtual void* InternalID() const = 0;
+ };
+
+A poller class is provided to enable asynchronous communication between routers and dealers.
+One can register a set of SocketInterface objects with a poller instance by calling its Add method, and
+then call the Wait method of this poller object to wait for the registered SocketInterface objects to be ready
+for sending and receiving messages. The APIs of the poller class are shown below.
+
+ class Poller {
+ public:
+ Poller();
+ Poller(SocketInterface* socket);
+ /**
+ * Add a socket for polling; Multiple sockets can be polled together by
+ * adding them into the same poller.
+ */
+ void Add(SocketInterface* socket);
+ /**
+ * Poll for all sockets added into this poller.
+ * @param timeout Stop after this number of mseconds
+ * @return pointer To the socket if it has one message in the receiving
+ * queue; nullptr if no message in any sockets,
+ */
+ SocketInterface* Wait(int duration);
+
+ /**
+ * @return true if the poller is terminated due to process interupt
+ */
+ virtual bool Terminated();
+ };
+
+
+### Dealer Socket
+
+The Dealer socket inherits from the base Socket. In Singa, every Dealer socket
+only connects to one Router socket as shown in Fig.1. The connection is set up
+by connecting the Dealer socket to the endpoint of a Router socket.
+
+ class Dealer : public SocketInterface {
+ public:
+ /*
+ * @param id Local dealer ID within a procs if the dealer is from worker
or
+ * server thread, starts from 1 (0 is used by the router); or the
connected
+ * remote procs ID for inter-process dealers from the stub thread.
+ */
+ Dealer();
+ explicit Dealer(int id);
+ ~Dealer() override;
+ /**
+ * Setup the connection with the router.
+ *
+ * @param endpoint Identifier of the router. For intra-process
+ * connection, the endpoint follows the format of ZeroMQ, i.e.,
+ * starting with "inproc://"; in Singa, since each process has one
+ * router, hence we can fix the endpoint to be "inproc://router" for
+ * intra-process. For inter-process, the endpoint follows ZeroMQ's
+ * format, i.e., IP:port, where IP is the connected process.
+ * @return 1 connection sets up successfully; 0 otherwise
+ */
+ int Connect(const std::string& endpoint);
+ int Send(Msg** msg) override;
+ Msg* Receive() override;
+ void* InternalID() const override;
+ };
+
+### Router Socket
+
+The Router socket inherits from the base Socket. One Router socket connects to
+at least one Dealer socket. Upon receiving a message, the router forwards it to
+the appropriate dealer according to the receiver's ID of this message.
+
+ class Router : public SocketInterface {
+ public:
+ Router();
+ /**
+ * There is only one router per procs, hence its local id is 0 and is not set
+ * explicitly.
+ *
+ * @param bufsize Buffer at most this number of messages
+ */
+ explicit Router(int bufsize);
+ ~Router() override;
+ /**
+ * Setup the connection with dealers.
+ *
+ * It automatically binds to the endpoint for intra-process communication,
+ * i.e., "inproc://router".
+ *
+ * @param endpoint The identifier of the Dealer socket in another process
+ * to connect to. It has the format IP:Port, where IP is that of the host machine.
+ * If endpoint is empty, it means that all connections are
+ * intra-process connection.
+ * @return number of connected dealers.
+ */
+ int Bind(const std::string& endpoint);
+ /**
+ * If the destination socket has not connected yet, buffer the message.
+ */
+ int Send(Msg** msg) override;
+ Msg* Receive() override;
+ void* InternalID() const override;
+
+ };
+
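+Putting the two socket types together, a minimal intra-process setup could look like
+the sketch below. It assumes, for brevity, a default-constructible Msg whose source and
+destination addresses and payload are set elsewhere; error handling is omitted.
+
+    Router router(100);                // buffer at most 100 messages
+    router.Bind("");                   // empty endpoint: intra-process connections only
+
+    Dealer dealer(1);                  // local dealer with id 1
+    dealer.Connect("inproc://router");
+
+    Msg* msg = new Msg();              // assumed constructor; set src/dst addresses
+                                       // and payload before sending
+    dealer.Send(&msg);                 // msg is owned and freed by Send()
+
+    Msg* received = router.Receive();  // the stub thread would forward this message
+                                       // to the destination indicated by its address
+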
+## Implementation
+
+### ZeroMQ
+
+**Why [ZeroMQ](http://zeromq.org/)?** Our previous design used MPI for
+communication between Singa processes. But MPI is a poor choice when it comes
+to fault-tolerance, because failure at one node brings down the entire MPI
+cluster. ZeroMQ, on the other hand, is fault tolerant in the sense that one
+node failure does not affect the other nodes. ZeroMQ consists of several basic
+communication patterns that can be easily combined to create more complex
+network topologies.
+
+<img src="../images/msg-flow.png" style="width: 550px"/>
+<p><strong> Fig.3 - Messages flow for ZeroMQ</strong></p>
+
+The communication APIs of Singa are similar to the DEALER-ROUTER pattern of
+ZeroMQ. Hence we can easily implement the Dealer socket using ZeroMQ's DEALER
+socket, and Router socket using ZeroMQ's ROUTER socket.
+The intra-process communication can be implemented using ZeroMQ's inproc
+transport, and the inter-process communication using the tcp transport (to
+exploit Infiniband, we can use the sdp transport). Fig.3 shows the message flow
+when ZeroMQ is the underlying implementation. The messages sent from dealers
+have two frames for the message header, and one or more frames for the message
+content. The messages sent from routers have an additional frame for the
+identifier of the destination dealer.
+
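+As an illustration of this mapping (a stand-alone ZeroMQ sketch, independent of
+SINGA's wrapper classes), a DEALER-ROUTER pair over the inproc transport looks
+roughly as follows; note the extra identity frame that the ROUTER side receives
+in front of the frames sent by the DEALER:
+
+    #include <zmq.h>
+    #include <cstdio>
+
+    int main() {
+      void* ctx = zmq_ctx_new();
+      void* router = zmq_socket(ctx, ZMQ_ROUTER);
+      zmq_bind(router, "inproc://router");           // bind before connect for inproc
+
+      void* dealer = zmq_socket(ctx, ZMQ_DEALER);
+      zmq_setsockopt(dealer, ZMQ_IDENTITY, "1", 1);  // identity used by the router
+      zmq_connect(dealer, "inproc://router");
+
+      // two header frames followed by one content frame, as described above
+      zmq_send(dealer, "hdr0", 4, ZMQ_SNDMORE);
+      zmq_send(dealer, "hdr1", 4, ZMQ_SNDMORE);
+      zmq_send(dealer, "content", 7, 0);
+
+      char buf[64];
+      // the first frame received by the ROUTER is the dealer's identity
+      int n = zmq_recv(router, buf, sizeof(buf), 0);
+      printf("identity frame of %d byte(s)\n", n);
+
+      zmq_close(dealer);
+      zmq_close(router);
+      zmq_ctx_term(ctx);
+      return 0;
+    }
+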
+Besides the DEALER-ROUTER pattern, we may also implement the Dealer socket and
+Router socket using other ZeroMQ patterns. To be continued.
+
+### MPI
+
+Since MPI does not provide intra-process communication, we have to implement
+it inside the Router and Dealer sockets. A simple solution is to allocate one
+message queue for each socket. Messages sent to a socket are inserted into the
+queue of that socket. We create a SafeQueue class to ensure the consistency of
+the queue. All queues are created by the main thread and
+passed to the sockets' constructors via *args*.
+
+ /**
+ * A thread safe queue class.
+ * There would be multiple threads pushing messages into
+ * the queue and only one thread reading and popping the queue.
+ */
+ class SafeQueue {
+ public:
+ void Push(Msg* msg);
+ Msg* Front();
+ void Pop();
+ bool empty();
+ };
+
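+A possible implementation sketch of such a queue (the class in the code base may
+differ) simply guards a standard queue with a mutex; this is enough because there
+are multiple producers but only a single consumer per queue:
+
+    #include <mutex>
+    #include <queue>
+
+    class SafeQueue {
+     public:
+      void Push(Msg* msg) {
+        std::lock_guard<std::mutex> lock(mu_);
+        queue_.push(msg);
+      }
+      Msg* Front() {
+        std::lock_guard<std::mutex> lock(mu_);
+        return queue_.empty() ? nullptr : queue_.front();
+      }
+      void Pop() {
+        std::lock_guard<std::mutex> lock(mu_);
+        if (!queue_.empty()) queue_.pop();
+      }
+      bool empty() {
+        std::lock_guard<std::mutex> lock(mu_);
+        return queue_.empty();
+      }
+
+     private:
+      std::mutex mu_;            // guards queue_ against concurrent producers
+      std::queue<Msg*> queue_;   // messages pending for the owning socket
+    };
+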
+For inter-process communication, we serialize the messages and call MPI's
+send/receive functions to transfer them. All inter-process connections are
+set up by MPI at the beginning. Consequently, the Connect and Bind functions do
+nothing for both inter-process and intra-process communication.
+
+MPI's AllReduce function is efficient for data aggregation in distributed
+training. For example, [DeepImage of Baidu](http://arxiv.org/abs/1501.02876)
+uses AllReduce to aggregate the parameter updates from all workers. It has
+a similar architecture to [Fig.2](architecture.html),
+where every process has a server group and is connected with all other processes.
+Hence, we can implement DeepImage in SINGA by simply using MPI's AllReduce
+function for inter-process communication.
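+
+For concreteness, the following generic MPI sketch (not SINGA code) shows how each
+process could contribute its local gradients and obtain their element-wise sum over
+all processes:
+
+    #include <mpi.h>
+    #include <vector>
+
+    int main(int argc, char** argv) {
+      MPI_Init(&argc, &argv);
+      std::vector<float> local_grad(1024, 0.1f);   // gradients computed locally
+      std::vector<float> global_grad(1024);
+      // every process receives the sum of local_grad over all processes
+      MPI_Allreduce(local_grad.data(), global_grad.data(),
+                    static_cast<int>(local_grad.size()),
+                    MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
+      // global_grad can now be used to update the local copy of the parameters
+      MPI_Finalize();
+      return 0;
+    }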
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/data.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/data.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/data.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/data.md Wed Apr 20
05:09:06 2016
@@ -0,0 +1,98 @@
+# Data Preparation
+
+---
+
+SINGA uses input layers to load data.
+Users can store their data in any format (e.g., CSV or binary) and in any
+location (e.g., disk file or HDFS), as long as there are corresponding input
+layers that can read the data records and parse them.
+
+To make it easy for users, SINGA provides a [StoreInputLayer] to read data
+in the format of (string:key, string:value) tuples from a couple of sources.
+These sources are abstracted using a [Store]() class, which is a simple version of
+the DB abstraction in Caffe. The base Store class provides the following operations
+for reading and writing tuples,
+
+ Open(string path, Mode mode); // open the store for kRead or kCreate or kAppend
+ Close();
+
+ Read(string* key, string* val); // read a tuple; return false if fail
+ Write(string key, string val); // write a tuple
+ Flush();
+
+Currently, two implementations are provided, namely
+
+1. [KVFileStore] for storing tuples in [KVFile]() (a binary file).
+The *create_data.cc* files in *examples/cifar10* and *examples/mnist* provide
+examples of storing records using KVFileStore.
+
+2. [TextFileStore] for storing tuples in a plain text file (one line per tuple).
+
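+For example, tuples written by *create_data.cc* (see below) can be read back with a
+loop like the following sketch; the class and namespace names follow the snippets on
+this page and may differ slightly in the code base, and the file path is hypothetical.
+
+    KVFileStore store;
+    store.Open("train_data.bin", singa::io::kRead);  // hypothetical output path
+
+    std::string key, val;
+    while (store.Read(&key, &val)) {
+      // key is the id passed to Write(); val holds the serialized record,
+      // e.g., a SingleLabelImageRecord parsed via ParseFromString(val)
+    }
+    store.Close();
+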
+The (key, value) tuples are parsed by subclasses of StoreInputLayer depending on the
+format of the tuple,
+
+* [ProtoRecordInputLayer] parses the value field of one
+tuple into a [SingleLabelImageRecord], which is generated by Google Protobuf according
+to [common.proto]. It can be used to store features of images (e.g., using the pixel field)
+or of other objects (using the data field). The key field is not used.
+
+* [CSVRecordInputLayer] parses one tuple as a comma-separated CSV line.
+
+
+## Using built-in record format
+
+SingleLabelImageRecord is a built-in record in SINGA for storing image features.
+It is used in the cifar10 and mnist examples.
+
+ message SingleLabelImageRecord {
+ repeated int32 shape = 1; // e.g., 3 (RGB channels), 32 (rows), 32 (cols)
+ optional int32 label = 2; // label
+ optional bytes pixel = 3; // pixels
+ repeated float data = 4 [packed = true]; // it is used for normalization
+ }
+
+The data preparation instructions for the [CIFAR-10 image dataset](http://www.cs.toronto.edu/~kriz/cifar.html)
+are elaborated here. This dataset consists of 60,000 32x32 color images in 10
+classes, with 6,000 images per class.
+There are 50,000 training images and 10,000 test images.
+Each image has a single label. The dataset is stored in binary files with a specific format.
+SINGA comes with
+[create_data.cc](https://github.com/apache/incubator-singa/blob/master/examples/cifar10/create_data.cc)
+to convert the images in the binary files into `SingleLabelImageRecord`s and
+insert them into the training and test stores.
+
+1. Download the raw data. The following commands will download the dataset into the *cifar-10-batches-bin* folder.
+
+ # in SINGA_ROOT/examples/cifar10
+ $ cp Makefile.example Makefile // an example makefile is provided
+ $ make download
+
+2. Fill one record for each image, and insert it into the store.
+
+ KVFileStore store;
+ store.Open(output_file_path, singa::io::kCreate);
+
+ singa::SingleLabelImageRecord image;
+ for (int image_id = 0; image_id < 50000; image_id ++) {
+ // fill the record with the image feature and label from the downloaded binary files
+ string str;
+ image.SerializeToString(&str);
+ store.Write(to_string(image_id), str);
+ }
+ store.Flush();
+ store.Close();
+
+ The data store for the test data is created similarly.
+ In addition, the program computes the average values (not shown here) of the image pixels and
+ inserts the mean values into a SingleLabelImageRecord, which is then written
+ into another store.
+
+3. Compile and run the program. SINGA provides an example Makefile that contains instructions
+ for compiling the source code and linking it with *libsinga.so*. Users just need to execute the following command.
+
+ $ make create
+
+## Using user-defined record format
+
+If users cannot use the SingleLabelImageRecord or CSV record for their data,
+they can define their own record format, e.g., using Google Protobuf.
+A record can be written into a data store as long as it can be converted
+into a byte string. Correspondingly, subclasses of StoreInputLayer are required to
+parse the user-defined records.
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/debug.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/debug.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/debug.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/debug.md Wed Apr 20
05:09:06 2016
@@ -0,0 +1,29 @@
+# How to Debug
+
+---
+
+Since SINGA is developed on Linux using C++, GDB is the preferred debugging
+tool. To use GDB, the code must be compiled with the `-g` flag. This is enabled by
+
+ ./configure --enable-debug
+ make
+
+## Debugging for single process job
+
+If your job launches only one process, then use the default *conf/singa.conf*
+for debugging. The process will be launched locally.
+
+To debug, first start zookeeper if it is not started yet, and launch GDB
+
+ # do this only once
+ ./bin/zk-service.sh start
+ # do this every time
+ gdb .libs/singa
+
+Then set the command line arguments
+
+ set args -conf JOBCONF
+
+Now you can set your breakpoints and start running.
+
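+For example, a minimal session sets a breakpoint and starts the job (the breakpoint
+location below is only an illustration; pick whichever function you want to inspect):
+
+    break main
+    run
+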
+## Debugging for jobs with multiple processes
Added:
incubator/singa/site/trunk/content/markdown/v0.3.0/distributed-training.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/distributed-training.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/distributed-training.md
(added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/distributed-training.md
Wed Apr 20 05:09:06 2016
@@ -0,0 +1,25 @@
+# Distributed Training
+
+---
+
+SINGA is designed for distributed training of large deep learning models with
+huge amounts of training data.
+We also provide high-level descriptions of the design behind SINGA's distributed
+architecture.
+
+* [System Architecture](architecture.html)
+
+* [Training Frameworks](frameworks.html)
+
+* [System Communication](communication.html)
+
+SINGA supports different options for training a model in parallel, including
+data parallelism, model parallelism and hybrid parallelism.
+
+* [Hybrid Parallelism](hybrid.html)
+
+SINGA is integrated with Mesos, so that distributed training can be started
+as a Mesos framework. Currently, the Mesos cluster can be set up from SINGA
+containers, i.e. we provide Docker images that bundle Mesos and SINGA
+together. Refer to the guide below for instructions on how to start and use the
+cluster.
+
+* [Distributed training on Mesos](mesos.html)
+
+SINGA can run on top of a distributed storage system to achieve scalability. The
+current version of SINGA supports HDFS.
+
+* [Running SINGA on HDFS](hdfs.html)
+
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/docker.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/docker.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/docker.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/docker.md Wed Apr 20
05:09:06 2016
@@ -0,0 +1,232 @@
+# Building SINGA Docker container
+
+This guide explains how to set up a development environment for SINGA using Docker.
+It requires only Docker to be installed. The resulting image contains the complete
+working environment for SINGA. The image can then be used to set up a cluster
+environment over one or multiple physical nodes.
+
+1. [Build SINGA base](#build_base)
+2. [Build GPU-enabled SINGA](#build_gpu)
+3. [Build SINGA with Mesos and Hadoop](#build_mesos)
+4. [Pre-built images](#pre_built)
+5. [Launch and stop SINGA (stand alone mode)](#launch_stand_alone)
+6. [Launch pseudo-distributed SINGA on one node](#launch_pseudo)
+7. [Launch fully distributed SINGA on multiple nodes](#launch_distributed)
+
+---
+
+<a name="build_base"></a>
+#### Build SINGA base image
+
+````
+$ cd $SINGA_HOME/..
+$ sudo docker build -t singa/base -f incubator-singa/tool/docker/singa/Dockerfile .
+$ sudo docker images
+REPOSITORY          TAG       IMAGE ID    CREATED     VIRTUAL SIZE
+singa/base          latest    XXXX        XXX         XXX GB
+````
+
+The result is the image containing a built version of SINGA.
+
+ 
+
+ *Figure 1. singa/base Docker image, containing library dependencies and SINGA built from source.*
+
+---
+
+<a name="build_gpu"></a>
+#### Build SINGA with GPU support
+
+````
+$ cd $SINGA_HOME/..
+$ sudo docker build -t singa/gpu -f incubator-singa/tool/docker/singa/Dockerfile_gpu .
+$ sudo docker images
+REPOSITORY          TAG       IMAGE ID    CREATED     VIRTUAL SIZE
+singa/gpu           latest    XXXX        XXX         XXX GB
+````
+
+---
+
+<a name="build_mesos"></a>
+#### Build SINGA with Mesos and Hadoop
+````
+$ cd $SINGA_HOME/..
+$ sudo docker build -t singa/mesos -f incubator-singa/tool/docker/mesos/Dockerfile .
+$ sudo docker images
+REPOSITORY          TAG       IMAGE ID    CREATED     VIRTUAL SIZE
+singa/mesos         latest    XXXX        XXX         XXX GB
+````
+ 
+
+ *Figure 2. singa/mesos Docker image, containing Hadoop and Mesos built on
+top of SINGA. The default namenode address for Hadoop is `node0:9000`*
+
+**Notes** A common failure observed during the build process is caused by network
+failures occurring when downloading dependencies. Simply re-run the build command.
+
+---
+
+<a name="pre_built"></a>
+#### Pre-built images on epiC cluster
+For users with access to the `epiC` cluster, there are pre-built and loaded Docker
+images at the following nodes:
+
+ ciidaa-c18
+ ciidaa-c19
+
+The available images at those nodes are:
+
+````
+REPOSITORY             TAG       IMAGE ID    CREATED        VIRTUAL SIZE
+singa/base             latest    XXXX        XXX            2.01 GB
+singa/mesos            latest    XXXX        XXX            4.935 GB
+weaveworks/weaveexec   1.1.1     XXXX        11 days ago    57.8 MB
+weaveworks/weave       1.1.1     XXXX        11 days ago    17.56 MB
+````
+
+---
+
+<a name="launch_stand_alone"></a>
+#### Launch and stop SINGA in stand-alone mode
+To launch a test environment for single-node SINGA training, simply start a
+container from the `singa/base` image. The following starts a container called
+`XYZ`, then launches a shell in the container:
+
+````
+$ sudo docker run -dt --name XYZ singa/base /usr/bin/supervisord
+$ sudo docker exec -it XYZ /bin/bash
+````
+
+
+
+ *Figure 3. Launch SINGA in stand-alone mode: single node training*
+
+Inside the launched container, the SINGA source directory can be found at `/root/incubator-singa`.
+
+**Launching GPU-enabled container**
+First, make sure that the host GPUs are up and running. The `NVIDIA` devices
+should be listed as `/dev/nvidiaYYY`.
+
+Next, start a new container, passing it all the devices
+
+````
+$ sudo docker run -dt --device /dev/nvidiaYYY --device /dev/nvidiaYYY ... --name XYZ singa/gpu /usr/bin/supervisord
+$ sudo docker exec -it XYZ /bin/bash
+````
+
+**Stopping the container**
+
+````
+$ sudo docker stop XYZ
+$ sudo docker rm XYZ
+````
+
+---
+
+<a name="launch_pseudo"></a>
+#### Launch SINGA on pseudo-distributed mode (single node)
+To simulate a distributed environment on a single node, one can repeat the
+previous step multiple times, each time giving a different name to the
+container. Network connections between these containers are already supported,
+thus SINGA instances/nodes in these containers can readily communicate with each
+other.
+
+The previous approach requires the user to start SINGA instances individually
+in each container. Although there's a bash script for that, we provide a better
+way. In particular, multiple containers can be started from the `singa/mesos` image,
+which already bundles Mesos and Hadoop with SINGA. Using Mesos makes it easy to
+launch, stop and monitor the distributed execution from a single container.
+Figure 4 shows `N+1` containers running concurrently on the local host.
+
+````
+$ sudo docker run -dt --name node0 singa/mesos /usr/bin/supervisord
+$ sudo docker run -dt --name node1 singa/mesos /usr/bin/supervisord
+...
+````
+
+
+
+*Figure 4. Launch SINGA in pseudo-distributed mode: multiple SINGA nodes over a single machine*
+
+**Starting SINGA distributed training**
+
+Refer to the [Mesos
+guide](mesos.html)
+for details of how to start training with multiple SINGA instances.
+
+**Important:** the container that assumes the role of Hadoop's namenode (and
+often Mesos's and Zookeeper's master node as well) **must** be named `node0`.
+Otherwise, the user must log in to individual containers and change the Hadoop
+configuration separately.
+
+**Notes on Docker version >=1.9** Newer versions of Docker adopt a built-in DNS
+server in the daemon. As a consequence,
+name resolution inside containers now **cannot** depend on the automatically
+updated `/etc/hosts` files as in version
+1.8 and earlier. Here we recommend two ways to make pseudo-distributed and
+distributed SINGA containers work as before:
+
+1. Downgrade to Docker version 1.8 or earlier
+
+ $ sudo apt-get install docker-engine=1.8.3-0~trusty
+
+2. Manually log in to each running container, via `sudo docker exec -it <name> /bin/bash`,
+and edit the `/etc/hosts` file with the
+assigned IP addresses of all other running containers.
+
+ node0 <ip0>
+ node1 <ip1>
+ ...
+
+---
+
+<a name="launch_distributed"></a>
+#### Launch SINGA on fully distributed mode (multiple nodes)
+The previous section has explained how to start a distributed environment on a
+single node. But running many containers on one node does not scale. When there
+are multiple physical hosts available, it is better to distribute the
+containers over them.
+
+The only extra requirement for the fully distributed mode, as compared with the
+pseudo distributed mode, is that the containers from different hosts are able
+to transparently communicate with each other. In the pseudo distributed mode,
+the local docker engine takes care of such communication. Here, we rely on
+[Weave](http://weave.works/guides/weave-docker-ubuntu-simple.html) to make the
+communication transparent. The resulting architecture is shown below.
+
+
+
+*Figure 5. Launch SINGA in fully distributed mode: multiple SINGA nodes over multiple machines*
+
+**Install Weave at all hosts**
+
+```
+$ curl -L git.io/weave -o /usr/local/bin/weave
+$ chmod a+x /usr/local/bin/weave
+```
+
+**Starting Weave**
+
+Suppose `node0` will be launched at the host with IP `111.222.111.222`.
+
++ At host `111.222.111.222`:
+
+ $ weave launch
+ $ eval "$(weave env)" //if there's error, do `sudo -s` and try again
+
++ At other hosts:
+
+ $ weave launch 111.222.111.222
+ $ eval "$(weave env)" //if there's error, do `sudo -s` and try again
+
+**Starting containers**
+
+The user logs in to each host and starts the container (same as in
+[pseudo-distributed](#launch_pseudo) mode). Note that the container acting as the
+head node of the cluster must be named `node0` (and be running at the host with
+IP `111.222.111.222`, for example).
+
+**_Important_:** when there are other containers sharing the same host as
+`node0`, say `node1` and `node2` for example,
+there are additional changes to be made to `node1` and `node2`. In particular,
+log in to each container and edit
+the `/etc/hosts` file:
+
+````
+# modified by weave
+...
+X.Y.Z node0 node0.bridge //<- REMOVE this line
+..
+````
+This is to ensure that name resolution (of `node0`'s address) from `node1`
+and `node2` is correct. By default,
+containers on the same host resolve each other's addresses via the Docker
+bridge. Instead, we want them to use the
+addresses given by Weave.
+
+
+**Starting SINGA distributed training**
+
+Refer to the [Mesos guide](mesos.html)
+for details of how to start training with multiple SINGA instances.
+
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/examples.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/examples.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/examples.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/examples.md Wed Apr 20
05:09:06 2016
@@ -0,0 +1,29 @@
+# Example Models
+
+---
+
+Different models are provided as examples to help users get familiar with SINGA.
+[Neural Network](neural-net.html) gives details on the models that are
+supported by SINGA.
+
+
+### Feed-forward neural networks
+
+ * [MultiLayer Perceptron](mlp.html) trained on the MNIST dataset for handwritten
+ digit recognition.
+
+ * [Convolutional Neural Network](cnn.html) trained on MNIST and CIFAR10 for
+ image classification.
+
+ * [Deep Auto-Encoders](rbm.html) trained on MNIST for dimensionality
+ reduction.
+
+
+### Recurrent neural networks (RNN)
+
+ * [RNN language model](rnn.html) trained on plain text for language modelling.
+
+### Energy models
+
+ * [RBM](rbm.html) used to pre-train deep auto-encoders for dimensionality
+ reduction.
+
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/frameworks.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/frameworks.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/frameworks.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/frameworks.md Wed Apr 20
05:09:06 2016
@@ -0,0 +1,122 @@
+# Distributed Training Framework
+
+---
+
+## Cluster Topology Configuration
+
+Here we describe how to configure SINGA's cluster topology to support
+different distributed training frameworks.
+The cluster topology is configured in the `cluster` field in `JobProto`.
+The `cluster` is of type `ClusterProto`:
+
+ message ClusterProto {
+ optional int32 nworker_groups = 1;
+ optional int32 nserver_groups = 2;
+ optional int32 nworkers_per_group = 3 [default = 1];
+ optional int32 nservers_per_group = 4 [default = 1];
+ optional int32 nworkers_per_procs = 5 [default = 1];
+ optional int32 nservers_per_procs = 6 [default = 1];
+
+ // servers and workers in different processes?
+ optional bool server_worker_separate = 20 [default = false];
+
+ ......
+ }
+
+
+The most commonly used fields are as follows:
+
+ * `nworkers_per_group` and `nworkers_per_procs`:
+ decide the partitioning of worker side ParamShard.
+ * `nservers_per_group` and `nservers_per_procs`:
+ decide the partitioning of server side ParamShard.
+ * `server_worker_separate`:
+ separate servers and workers in different processes.
+
+## Different Training Frameworks
+
+In SINGA, worker groups run asynchronously and
+workers within one group run synchronously.
+Users can leverage this general design to run
+both **synchronous** and **asynchronous** training frameworks.
+Here we illustrate how to configure
+popular distributed training frameworks in SINGA.
+
+<img src="../images/frameworks.png" style="width: 800px"/>
+<p><strong> Fig.1 - Training frameworks in SINGA</strong></p>
+
+### Sandblaster
+
+This is a **synchronous** framework used by Google Brain.
+Fig.1(a) shows the Sandblaster framework implemented in SINGA.
+Its configuration is as follows:
+
+ cluster {
+ nworker_groups: 1
+ nserver_groups: 1
+ nworkers_per_group: 3
+ nservers_per_group: 2
+ server_worker_separate: true
+ }
+
+A single server group is launched to handle all requests from workers.
+A worker computes on its partition of the model,
+and only communicates with servers handling related parameters.
+
+
+### AllReduce
+
+This is a **synchronous** framework used by Baidu's DeepImage.
+Fig.1(b) shows the AllReduce framework implemented in SINGA.
+Its configuration is as follows:
+
+ cluster {
+ nworker_groups: 1
+ nserver_groups: 1
+ nworkers_per_group: 3
+ nservers_per_group: 3
+ server_worker_separate: false
+ }
+
+We bind each worker with a server on the same node, so that each
+node is responsible for maintaining a partition of parameters and
+collecting updates from all other nodes.
+
+### Downpour
+
+This is an **asynchronous** framework used by Google Brain.
+Fig.1(c) shows the Downpour framework implemented in SINGA.
+Its configuration is as follows:
+
+ cluster {
+ nworker_groups: 2
+ nserver_groups: 1
+ nworkers_per_group: 2
+ nservers_per_group: 2
+ server_worker_separate: true
+ }
+
+Similar to the synchronous Sandblaster, all workers send
+requests to a global server group. We divide workers into several
+worker groups, each running independently and working on parameters
+from the last *update* response.
+
+### Distributed Hogwild
+
+This is an **asynchronous** framework used by Caffe.
+Fig.1(d) shows the Distributed Hogwild framework implemented in SINGA.
+Its configuration is as follows:
+
+ cluster {
+ nworker_groups: 3
+ nserver_groups: 3
+ nworkers_per_group: 1
+ nservers_per_group: 1
+ server_worker_separate: false
+ }
+
+Each node contains a complete server group and a complete worker group.
+Parameter updates are done locally, so that communication cost
+during each training step is minimized.
+However, the server group must periodically synchronize with
+neighboring groups to improve the training convergence.