Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/neural-net.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/neural-net.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/zh/neural-net.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/zh/neural-net.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,327 @@
# Neural Net

---

`NeuralNet` in SINGA represents an instance of a user's neural net model. As a
neural net typically consists of a set of layers, `NeuralNet` comprises a set
of unidirectionally connected [Layer](layer.html)s. This page describes how to
convert a user's neural net into the configuration of `NeuralNet`.

<img src="../images/model-category.png" align="center" width="200px"/>
<span><strong>Figure 1 - Categorization of popular deep learning models.</strong></span>

## Net structure configuration

Users configure the `NeuralNet` by listing all layers of the neural net and
specifying each layer's source layer names. Popular deep learning models can be
categorized as shown in Figure 1. The subsequent sections give details for each
category.

### Feed-forward models

<div align = "left">
<img src="../images/mlp-net.png" align="center" width="200px"/>
<span><strong>Figure 2 - Net structure of a MLP model.</strong></span>
</div>

Feed-forward models, e.g., CNN and MLP, are easy to configure, as their layer
connections are directed and contain no cycles. The configuration for the MLP
model shown in Figure 2 is as follows,

    net {
      layer {
        name : "data"
        type : kData
      }
      layer {
        name : "image"
        type : kImage
        srclayer: "data"
      }
      layer {
        name : "label"
        type : kLabel
        srclayer: "data"
      }
      layer {
        name : "hidden"
        type : kHidden
        srclayer: "image"
      }
      layer {
        name : "softmax"
        type : kSoftmaxLoss
        srclayer: "hidden"
        srclayer: "label"
      }
    }

### Energy models

<img src="../images/rbm-rnn.png" align="center" width="500px"/>
<span><strong>Figure 3 - Convert connections in RBM and RNN.</strong></span>

For energy models, including RBM, DBM, etc., the connections are undirected
(i.e., Category B). To represent these models using `NeuralNet`, users can
simply replace each connection with two directed connections, as shown in
Figure 3a. In other words, for each pair of connected layers, their source
layer fields should include each other's names. The full
[RBM example](rbm.html) has the detailed neural net configuration for an RBM
model, which looks like

    net {
      layer {
        name : "vis"
        type : kVisLayer
        param {
          name : "w1"
        }
        srclayer: "hid"
      }
      layer {
        name : "hid"
        type : kHidLayer
        param {
          name : "w2"
          share_from: "w1"
        }
        srclayer: "vis"
      }
    }

### RNN models

For recurrent neural networks (RNN), users can remove the recurrent connections
by unrolling the recurrent layer. For example, in Figure 3b, the original layer
is unrolled into a new layer with 4 internal layers. In this way, the model
becomes a normal feed-forward model and thus can be configured similarly. The
[RNN example](rnn.html) has a full neural net configuration for an RNN model.
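To make the unrolling in Figure 3b concrete, the computation of a recurrent
layer unrolled over a window of 4 positions can be written as below. The
symbols are chosen for illustration only: `$f[k]$` is the feature of the k-th
internal layer, `$src[k]$` the input feature at position k, and `$W$` the
recurrent weight matrix shared by all internal layers (this mirrors the
`HiddenLayer` formula used in the [RNN example](rnn.html)).

`$$f[0]=src[0], \qquad f[k]=\sigma(f[k-1] W + src[k]), \quad k=1,2,3$$`

Each unrolled position becomes one internal layer whose source layers are the
corresponding input layer and the internal layer at the previous position.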
## Configuration for multiple nets

Typically, a training job includes three neural nets for the training,
validation and test phase respectively. The three neural nets share most layers
except the data layer, loss layer or output layer, etc. To avoid redundant
configurations for the shared layers, users can use the `exclude` field to
filter a layer out of a neural net; e.g., the following layer will be filtered
out when creating the test `NeuralNet`.

    layer {
      ...
      exclude : kTest # filter this layer for creating test net
    }

## Neural net partitioning

A neural net can be partitioned in different ways to distribute the training
over multiple workers.

### Batch and feature dimension

<img src="../images/partition_fc.png" align="center" width="400px"/>
<span><strong>Figure 4 - Partitioning of a fully connected layer.</strong></span>

Every layer's feature blob is considered a matrix whose rows are feature
vectors. Thus, one layer can be split along two dimensions. Partitioning on
dimension 0 (also called the batch dimension) slices the feature matrix by
rows. For instance, if the mini-batch size is 256 and the layer is partitioned
into 2 sub-layers, each sub-layer would have 128 feature vectors in its feature
blob. Partitioning on this dimension has no effect on the parameters, as every
[Param](param.html) object is replicated in the sub-layers. Partitioning on
dimension 1 (also called the feature dimension) slices the feature matrix by
columns. For example, suppose the original feature vector has 50 units; after
partitioning into 2 sub-layers, each sub-layer would have 25 units. This
partitioning may result in [Param](param.html) objects being split, as shown in
Figure 4: both the bias vector and the weight matrix are split across the two
sub-layers. The sketch after the configuration list below illustrates the
resulting sub-layer shapes.

### Partitioning configuration

There are 4 partitioning schemes, whose configurations are given below,

 1. Partitioning each single layer into sub-layers on the batch dimension (see
 below). It is enabled by configuring the partition dimension of the layer to
 0, e.g.,

        # with other fields omitted
        layer {
          partition_dim: 0
        }

 2. Partitioning each single layer into sub-layers on the feature dimension
 (see below). It is enabled by configuring the partition dimension of the layer
 to 1, e.g.,

        # with other fields omitted
        layer {
          partition_dim: 1
        }

 3. Partitioning all layers into different subsets. It is enabled by
 configuring the location ID of a layer, e.g.,

        # with other fields omitted
        layer {
          location: 1
        }
        layer {
          location: 0
        }

 4. Hybrid partitioning of strategies 1, 2 and 3. Hybrid partitioning is useful
 for large models. An example application is to implement the
 [idea proposed by Alex](http://arxiv.org/abs/1404.5997).
 Hybrid partitioning is configured like,

        # with other fields omitted
        layer {
          location: 1
        }
        layer {
          location: 0
        }
        layer {
          partition_dim: 0
          location: 0
        }
        layer {
          partition_dim: 1
          location: 0
        }

Currently SINGA supports strategy-2 well. Other partitioning strategies are
under test and will be released in a later version.
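Referring back to Figure 4, the sketch below computes the per-sub-layer feature
shape for the two single-layer schemes. `SubShape` and its arguments are made
up for this illustration and are not part of the SINGA API.

    #include <cassert>
    #include <utility>

    // Hypothetical helper: shape (rows, cols) of one sub-layer after splitting a
    // feature blob with `batch` rows and `dim` columns into `npartitions` pieces.
    std::pair<int, int> SubShape(int batch, int dim, int npartitions, int partition_dim) {
      assert(partition_dim == 0 || partition_dim == 1);
      if (partition_dim == 0)               // batch dimension: slice rows
        return {batch / npartitions, dim};  // e.g., 256 x 50 -> 128 x 50
      return {batch, dim / npartitions};    // feature dimension: slice columns, e.g., 256 x 50 -> 256 x 25
    }

Only scheme 2 splits the weight matrix and bias of a layer; scheme 1 leaves
every Param replicated, as stated above.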
## Parameter sharing

Parameters can be shared in the following cases,

  * sharing parameters among layers via user configuration. For example, the
  visible layer and hidden layer of an RBM share the weight matrix, which is
  configured through the `share_from` field as shown in the above RBM
  configuration. The configurations must be the same (except the name) for
  shared parameters.

  * due to neural net partitioning, some `Param` objects are replicated into
  different workers, e.g., when partitioning one layer on the batch dimension.
  These workers share the parameter values. SINGA controls this kind of
  parameter sharing automatically; users do not need to do any configuration.

  * the `NeuralNet` for training and testing (and validation) share most
  layers, and thus share `Param` values.

If the shared `Param` instances reside in the same process (possibly in
different threads), they use the same chunk of memory for their values, but
they have separate memory for their gradients. In fact, their gradients will be
averaged by the stub or server.

## Advanced user guide

### Creation

    static NeuralNet* NeuralNet::Create(const NetProto& np, Phase phase, int num);

The above function creates a `NeuralNet` for a given phase and returns a
pointer to the `NeuralNet` instance. The phase is in {kTrain, kValidation,
kTest}. `num` is used for net partitioning; it indicates the number of
partitions. Typically, a training job includes three neural nets for the
training, validation and test phase respectively. The three neural nets share
most layers except the data layer, loss layer or output layer, etc. The
`Create` function takes in the full net configuration, including layers for
training, validation and test. It removes layers for phases other than the
specified phase based on the `exclude` field in
[layer configuration](layer.html):

    layer {
      ...
      exclude : kTest # filter this layer for creating test net
    }

The filtered net configuration is passed to the constructor of `NeuralNet`:

    NeuralNet::NeuralNet(NetProto netproto, int npartitions);

The constructor first creates a graph representing the net structure in

    Graph* NeuralNet::CreateGraph(const NetProto& netproto, int npartitions);

Next, it creates a layer for each node and connects layers if their nodes are
connected.

    void NeuralNet::CreateNetFromGraph(Graph* graph, int npartitions);

Since the `NeuralNet` instance may be shared among multiple workers, the
`Create` function returns a pointer to the `NeuralNet` instance.

### Parameter sharing

`Param` sharing is enabled by first sharing the Param configuration (in
`NeuralNet::Create`) to create two similar (e.g., the same shape) Param
objects, and then calling (in `NeuralNet::CreateNetFromGraph`),

    void Param::ShareFrom(const Param& from);

It is also possible to share `Param`s of two nets, e.g., sharing parameters of
the training net and the test net,

    void NeuralNet::ShareParamsFrom(NeuralNet* other);

It will call `Param::ShareFrom` for each Param object.

### Access functions

`NeuralNet` provides a couple of access functions to get the layers and params
of the net:

    const std::vector<Layer*>& layers() const;
    const std::vector<Param*>& params() const;
    Layer* name2layer(string name) const;
    Param* paramid2param(int id) const;
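A minimal usage sketch of the functions above, creating the training and test
nets and sharing parameters between them. The surrounding context (the parsed
`net_conf` variable, error handling, and the worker setup that normally drives
these calls) is simplified for illustration; the layer name and param id are
taken from the MLP example only as placeholders.

    // Sketch only: net_conf is a NetProto parsed from the job configuration.
    NeuralNet* train_net = NeuralNet::Create(net_conf, kTrain, 1);
    NeuralNet* test_net  = NeuralNet::Create(net_conf, kTest, 1);

    // Let the test net reuse the Param values of the training net;
    // Param::ShareFrom is called internally for each Param object.
    test_net->ShareParamsFrom(train_net);

    // Look up a layer or a parameter when needed.
    Layer* hidden = train_net->name2layer("hidden");
    Param* param0 = train_net->paramid2param(0);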
### Partitioning

#### Implementation

SINGA partitions the neural net in the `CreateGraph` function, which creates
one node for each (partitioned) layer. For example, if one layer's partition
dimension is 0 or 1, then `npartitions` nodes are created for it; if the
partition dimension is -1, a single node is created, i.e., no partitioning.
Each node is assigned a partition (or location) ID. If the original layer is
configured with a location ID, then that ID is assigned to each newly created
node. These nodes are connected according to the connections of the original
layers. Some connection layers will be added automatically. For instance, if
two connected sub-layers are located at two different workers, then a pair of
bridge layers is inserted to transfer the feature (and gradient) blob between
them. When two layers are partitioned on different dimensions, a concatenation
layer which concatenates feature rows (or columns) and a slice layer which
slices feature rows (or columns) will be inserted. These connection layers help
make the network communication and synchronization transparent to the users.

#### Dispatching partitions to workers

Each (partitioned) layer is assigned a location ID, based on which it is
dispatched to one worker. In particular, the pointer to the `NeuralNet`
instance is passed to every worker within the same group, but each worker only
computes over the layers that have the same partition (or location) ID as the
worker's ID. When every worker computes the gradients of the entire set of
model parameters (strategy-1), we refer to this process as data parallelism.
When different workers compute the gradients of different parameters
(strategy-2 or strategy-3), we call this process model parallelism. Hybrid
partitioning leads to hybrid parallelism, where some workers compute the
gradients of the same subset of model parameters while other workers compute on
different model parameters. For example, to implement hybrid parallelism for
the [DCNN model](http://arxiv.org/abs/1404.5997), we set `partition_dim = 0`
for lower layers and `partition_dim = 1` for higher layers.
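The dispatching rule above amounts to a simple filter inside each worker's loop
over the shared net. The sketch below is illustrative pseudo-C++ rather than
SINGA's actual worker code; the `worker_id` argument and the `partition_id()`
accessor are assumed names for this example.

    // Illustrative only: each worker walks the shared NeuralNet but touches
    // only the layers whose partition/location ID matches its own ID.
    void ForwardLocalLayers(NeuralNet* net, int worker_id) {
      for (Layer* layer : net->layers()) {
        if (layer->partition_id() != worker_id)
          continue;  // this partition is dispatched to another worker
        // ... ComputeFeature (and later ComputeGradient) calls go here ...
      }
    }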
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/overview.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/overview.md?rev=1740048&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/overview.md
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/programming-guide.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/programming-guide.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/zh/programming-guide.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/zh/programming-guide.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,67 @@
# Programming Guide

---

To submit a training job, users need to provide the configuration of the four
components shown in Figure 1:

  * a [NeuralNet](neural-net.html) describing the neural net structure,
  including the detailed settings of every layer and the connections between
  layers;
  * a [TrainOneBatch](train-one-batch.html) algorithm, which is customized for
  different model categories;
  * an [Updater](updater.html) defining the protocol for updating parameters on
  the server side;
  * a [Cluster Topology](distributed-training.html) specifying the distributed
  architecture of workers and servers.

The *Basic user guide* introduces how to submit a training job using built-in
components, while the *Advanced user guide* describes in detail how to write a
user's own main function and register user-implemented components. In addition,
advanced users and basic users [process](data.html) the training dataset in the
same way.

<img src="../../images/overview.png" align="center" width="400px"/>
<span><strong>Figure 1 - SINGA overview.</strong></span>

## Basic user guide

Users can submit a training job using the main function provided by SINGA. In
this case, users must provide, on the command line, a job configuration file
written according to [JobProto](../api/classsinga_1_1JobProto.html):

    ./bin/singa-run.sh -conf <path to job conf> [-resume]

`-resume` means continuing the training from the last
[checkpoint](checkpoint.html). The [MLP](mlp.html) and [CNN](cnn.html) models
submit their training jobs using built-in layers. Please read the corresponding
pages for their job configuration files; those pages explain the configuration
of every component in detail.
## Advanced user guide

If a user's model contains some self-defined components, e.g., an
[Updater](updater.html), the user must write his/her own main function to
register these components, similar to the main function of a Hadoop job. In
general, the main function should

  * initialize SINGA, e.g., set up the logging;
  * register the user-defined components;
  * create the job configuration and pass it to the SINGA driver.

An example main function looks like

    #include "singa.h"
    #include "user.h"  // header for user code

    int main(int argc, char** argv) {
      singa::Driver driver;
      driver.Init(argc, argv);
      bool resume;
      // parse resume option from argv.

      // register user defined layers
      driver.RegisterLayer<FooLayer>(kFooLayer);
      // register user defined updater
      driver.RegisterUpdater<FooUpdater>(kFooUpdater);
      ...
      auto jobConf = driver.job_conf();
      // update jobConf

      driver.Train(resume, jobConf);
      return 0;
    }

The driver's `Init` method loads the job configuration file provided by the
user through the command line argument (`-conf <job conf>`), which must contain
at least the cluster topology, and returns `jobConf` to the user, who can then
update or add configurations for the neural net, the Updater, etc. If
subclasses of Layer, Updater, Worker or Param are defined, users need to
register them through the driver. Finally, the job configuration is submitted
to the driver, which starts the training.

In the future we will provide helper tools similar to
[keras](https://github.com/fchollet/keras) to make the job configuration
simpler.

Users need to compile and link their own code (e.g., the layer implementations
and the main function) with the SINGA library (*.libs/libsinga.so*) to get an
executable file, e.g., named *mysinga*. To launch the program, users pass the
paths of both *mysinga* and the job configuration file to *./bin/singa-run.sh*:

    ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments]

The [RNN application](rnn.html) provides a complete example of implementing the
main function to train a specific RNN model.
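For reference, a user-defined component registered in the main function above
can be as small as the following skeleton. `FooLayer`, `kFooLayer` and the
empty function bodies are hypothetical and only indicate the kind of subclass
the driver expects; the virtual functions to override are described in the
[layer](layer.html) page.

    // Hypothetical user-defined layer; kFooLayer is a user-chosen integer ID
    // that must not clash with the built-in layer type IDs.
    const int kFooLayer = 200;

    class FooLayer : public singa::Layer {
     public:
      void Setup(const LayerProto& proto, const vector<Layer*>& srclayers) override {
        // allocate the feature/gradient blobs based on the source layers
      }
      void ComputeFeature(int flag, const vector<Layer*>& srclayers) override {
        // forward pass
      }
      void ComputeGradient(int flag, const vector<Layer*>& srclayers) override {
        // backward pass
      }
    };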
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/rnn.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/rnn.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/zh/rnn.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/zh/rnn.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,420 @@
# Recurrent Neural Networks for Language Modelling

---

Recurrent Neural Networks (RNN) are widely used for modelling sequential data,
such as music and sentences. In this example, we use SINGA to train an
[RNN model](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf)
proposed by Tomas Mikolov for [language modeling](https://en.wikipedia.org/wiki/Language_model).
The training objective (loss) is to minimize the
[perplexity per word](https://en.wikipedia.org/wiki/Perplexity), which is
equivalent to maximizing the probability of predicting the next word given the
current word in a sentence.

Different from the [CNN](cnn.html), [MLP](mlp.html) and [RBM](rbm.html)
examples, which use built-in [layers](layer.html) and [records](data.html),
none of the layers in this example are built-in. Hence users can learn how to
implement their own layers and data records through this example.

## Running instructions

In *SINGA_ROOT/examples/rnnlm/*, scripts are provided to run the training job.
First, the data is prepared by

    $ cp Makefile.example Makefile
    $ make download
    $ make create

Second, to compile the source code under *examples/rnnlm/*, run

    $ make rnnlm

An executable file *rnnlm.bin* will be generated.

Third, the training is started by passing *rnnlm.bin* and the job configuration
to *singa-run.sh*,

    # at SINGA_ROOT/
    # export LD_LIBRARY_PATH=.libs:$LD_LIBRARY_PATH
    $ ./bin/singa-run.sh -exec examples/rnnlm/rnnlm.bin -conf examples/rnnlm/job.conf

## Implementations

<img src="../images/rnnlm.png" align="center" width="400px"/>
<span><strong>Figure 1 - Net structure of the RNN model.</strong></span>

The neural net structure is shown in Figure 1. Word records are loaded by the
`DataLayer`. In every iteration, at most `max_window` word records are
processed. If a sentence-ending character is read, the `DataLayer` stops
loading immediately. The `EmbeddingLayer` looks up a word embedding matrix to
extract feature vectors for the words loaded by the `DataLayer`. These features
are transformed by the `HiddenLayer`, which propagates them from left to right:
the output feature for the word at position k is influenced by the words at
positions 0 to k-1. Finally, the `LossLayer` computes the cross-entropy loss
(see below) by predicting the next word of each word. The cross-entropy loss is
computed as

`$$L(w_t)=-log P(w_{t+1}|w_t)$$`

Given `$w_t$`, the above equation would be computed over all words in the
vocabulary, which is time consuming. The
[RNNLM Toolkit](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz)
accelerates the computation as

`$$P(w_{t+1}|w_t) = P(C_{w_{t+1}}|w_t) * P(w_{t+1}|C_{w_{t+1}})$$`

Words from the vocabulary are partitioned into a user-defined number of
classes. The first term on the right-hand side predicts the class of the next
word; the second term then predicts the next word given its class. Both the
number of classes and the number of words in each class are much smaller than
the vocabulary size, so the probabilities can be calculated much faster.

The perplexity per word is computed by,

`$$PPL = 10^{- avg_t log_{10} P(w_{t+1}|w_t)}$$`
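As a rough sense of the speed-up from the class factorization: with the dataset
used below (3720 unique words) and 100 classes, a direct softmax normalizes
over all 3720 words for every prediction, whereas the factorized form
normalizes over the 100 classes plus the words of a single class (about 37 on
average), i.e. roughly

`$$3720 \quad vs. \quad 100 + 3720/100 \approx 137$$`

terms per prediction. The exact per-class count varies, because the classes are
built from frequency buckets rather than equal-sized splits.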
### Data preparation

We use a small dataset provided by the
[RNNLM Toolkit](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz).
It has 10,000 training sentences, with 71,350 words in total and 3,720 unique
words. The subsequent steps follow the instructions in
[Data Preparation](data.html) to convert the raw data into records and insert
them into data stores.

#### Download source data

    # in SINGA_ROOT/examples/rnnlm/
    cp Makefile.example Makefile
    make download

#### Define record format

We define the word record as follows,

    # in SINGA_ROOT/examples/rnnlm/rnnlm.proto
    message WordRecord {
      optional string word = 1;
      optional int32 word_index = 2;
      optional int32 class_index = 3;
      optional int32 class_start = 4;
      optional int32 class_end = 5;
    }

It includes the word string and its index in the vocabulary. Words in the
vocabulary are sorted based on their frequency in the training dataset. The
sorted list is cut into 100 sublists such that each sublist accounts for 1/100
of the total word frequency. Each sublist is called a class. Hence each word
has a `class_index` (in [0, 100)). `class_start` is the index of the first word
in the same class as `word`. `class_end` is the index of the first word in the
next class.

#### Create data stores

We use code from the RNNLM Toolkit to read words and to sort them into classes.
The main function in *create_store.cc* first creates the word classes based on
the training dataset. Second, it calls the following function to create a data
store for each of the training, validation and test datasets.

    int create_data(const char *input_file, const char *output_file);

`input_file` is the path to a training/validation/test text file from the RNNLM
Toolkit, and `output_file` is the output store file. This function starts with

    singa::io::KVFile store;
    store.Open(output, singa::io::kCreate);

Then it reads the words one by one. For each word it creates a `WordRecord`
instance and inserts it into the store,

    int wcnt = 0; // word count
    WordRecord wordRecord;
    while(1) {
      readWord(wordstr, fin);
      if (feof(fin)) break;
      ...// fill in the wordRecord;
      string val;
      wordRecord.SerializeToString(&val);
      int length = snprintf(key, BUFFER_LEN, "%05d", wcnt++);
      store.Write(string(key, length), val);
    }

Compilation and running commands are provided in *Makefile.example*. After
executing

    make create

*train_data.bin*, *test_data.bin* and *valid_data.bin* will be created.

### Layer implementation

4 user-defined layers are implemented for this application. Following the guide
for implementing [new Layer subclasses](layer#implementing-a-new-layer-subclass),
we extend the [LayerProto](../api/classsinga_1_1LayerProto.html) to include the
configuration messages of the user-defined layers as shown below (3 of them
have specific configurations),

    import "job.proto";     // Layer message for SINGA is defined

    //For implementation of RNNLM application
    extend singa.LayerProto {
      optional EmbeddingProto embedding_conf = 101;
      optional LossProto loss_conf = 102;
      optional DataProto data_conf = 103;
    }

In the subsequent sections, we describe the implementation of each layer,
including its configuration message.

#### RNNLayer

This is the base layer of all other layers in this application. It is defined
as follows,

    class RNNLayer : virtual public Layer {
     public:
      inline int window() { return window_; }
     protected:
      int window_;
    };

Two iterations may process different numbers of words, because sentences have
different lengths. The `DataLayer` decides the effective window size; all other
layers query their source layers for the effective window size and reset
`window_` in their `ComputeFeature` functions.

#### DataLayer

DataLayer is for loading word records.

    class DataLayer : public RNNLayer, singa::InputLayer {
     public:
      void Setup(const LayerProto& proto, const vector<Layer*>& srclayers) override;
      void ComputeFeature(int flag, const vector<Layer*>& srclayers) override;
      int max_window() const {
        return max_window_;
      }
     private:
      int max_window_;
      singa::io::Store* store_;
    };

The `Setup` function gets the user-configured maximum window size,

    max_window_ = proto.GetExtension(input_conf).max_window();

The `ComputeFeature` function loads at most `max_window_` records. It stops
early when the sentence-ending character is encountered,

    ...// shift the last record to the first
    window_ = max_window_;
    for (int i = 1; i <= max_window_; i++) {
      // load record; break if it is the ending character
    }

The configuration of `DataLayer` is like

    name: "data"
    user_type: "kData"
    [data_conf] {
      path: "examples/rnnlm/train_data.bin"
      max_window: 10
    }
#### EmbeddingLayer

This layer gets records from the `DataLayer`. For each record, the word index
is parsed and used to get the corresponding word feature vector from the
embedding matrix.

The class is declared as follows,

    class EmbeddingLayer : public RNNLayer {
      ...
      const std::vector<Param*> GetParams() const override {
        std::vector<Param*> params{embed_};
        return params;
      }
     private:
      int word_dim_, vocab_size_;
      Param* embed_;
    }

The `embed_` field points to a matrix whose values are the parameters to be
learned. The matrix size is `vocab_size_` x `word_dim_`.

The `Setup` function reads the configuration for `word_dim_` and `vocab_size_`.
Then it allocates the feature Blob for `max_window` words and sets up `embed_`,

    int max_window = srclayers[0]->data(this).shape()[0];
    word_dim_ = proto.GetExtension(embedding_conf).word_dim();
    data_.Reshape(vector<int>{max_window, word_dim_});
    ...
    embed_->Setup(vector<int>{vocab_size_, word_dim_});

The `ComputeFeature` function simply copies the feature vector from the
`embed_` matrix into the feature Blob,

    # reset effective window size
    window_ = datalayer->window();
    auto records = datalayer->records();
    ...
    for (int t = 0; t < window_; t++) {
      int idx <- word index
      Copy(words[t], embed[idx]);
    }

The `ComputeGradient` function copies the gradients back to the `embed_`
matrix.

The configuration for `EmbeddingLayer` is like,

    user_type: "kEmbedding"
    [embedding_conf] {
      word_dim: 15
      vocab_size: 3720
    }
    srclayers: "data"
    param {
      name: "w1"
      init {
        type: kUniform
        low:-0.3
        high:0.3
      }
    }

#### HiddenLayer

This layer unrolls the recurrent connections for at most `max_window` times.
The feature at position k is computed from the feature produced by the
embedding layer at position k and the feature of this layer at position k-1.
The formula is

`$$f[k]=\sigma (f[k-1]*W+src[k])$$`

where `$W$` is a `word_dim_` x `word_dim_` parameter matrix. If you want to
implement a recurrent neural network following our design, this is the most
important layer to refer to.

    class HiddenLayer : public RNNLayer {
      ...
      const std::vector<Param*> GetParams() const override {
        std::vector<Param*> params{weight_};
        return params;
      }
     private:
      Param* weight_;
    };

The `Setup` function sets up the weight matrix as

    weight_->Setup(std::vector<int>{word_dim, word_dim});

The `ComputeFeature` function gets the effective window size (`window_`) from
its source layer, i.e., the embedding layer. Then it propagates the feature
from position 0 to position `window_` - 1, as sketched below.

    void HiddenLayer::ComputeFeature() {
      for (int k = 0; k < window_; k++) {
        if (k == 0)
          Copy(data[k], src[k]);
        else
          data[k] = sigmoid(data[k-1] * W + src[k]);
      }
    }

The `ComputeGradient` function computes the gradient of the loss w.r.t. W and
w.r.t. the source layer. In particular, for each position k, since data[k]
contributes to data[k+1] and to the feature at position k in its destination
layer (the loss layer), grad[k] should contain the gradients from both parts.
The destination layer has already accumulated the gradient from the loss layer
into grad[k]; in the `ComputeGradient` function, we need to add the gradient
from position k+1,

    void HiddenLayer::ComputeGradient() {
      ...
      for (int k = window_ - 1; k >= 0; k--) {
        if (k < window_ - 1) {
          grad[k] += dot(grad[k + 1], weight.T()); // add gradient from position k+1.
        }
        grad[k] = ... // compute gL/gy[k], where y[k] = data[k-1] * W + src[k]
      }
      gweight = dot(data.Slice(0, window_-1).T(), grad.Slice(1, window_));
      Copy(gsrc, grad);
    }

After the loop, we get the gradient of the loss w.r.t. y[k], which is used to
compute the gradients of W and src[k].
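For completeness, the backward recurrence that the loop above implements can be
written out explicitly. Let n be the effective window size (`window_`), write
`$y[k]=f[k-1]W+src[k]$` for `$k\geq 1$`, let `$g[k]$` denote
`$\partial L/\partial y[k]$`, and let `$g_{loss}[k]$` be the gradient already
filled into grad[k] by the loss layer; this is one consistent way to state it,
not a quote of SINGA's code.

`$$g[k] = \Big(g_{loss}[k] + g[k+1] W^T\Big)\odot \sigma'(y[k]), \qquad g[n]=0$$`

`$$\frac{\partial L}{\partial W} = \sum_{k=1}^{n-1} f[k-1]^T g[k]$$`

The sum matches the `gweight` line in the code: it multiplies the features at
positions 0..n-2 with the gradients at positions 1..n-1 (position 0 is a plain
copy and contributes no gradient to W).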
#### LossLayer

This layer computes the cross-entropy loss and `$log_{10}P(w_{t+1}|w_t)$`
(which can be averaged over all words by users to get the PPL value).

There are two configuration fields to be specified by users,

    message LossProto {
      optional int32 nclass = 1;
      optional int32 vocab_size = 2;
    }

and two weight matrices to be learned,

    class LossLayer : public RNNLayer {
      ...
     private:
      Param* word_weight_, *class_weight_;
    }

The `ComputeFeature` function computes the two probabilities,

`$$P(C_{w_{t+1}}|w_t) = Softmax(w_t * class\_weight_)$$`
`$$P(w_{t+1}|C_{w_{t+1}}) = Softmax(w_t * word\_weight[class\_start:class\_end])$$`

where `$w_t$` is the feature from the hidden layer for the t-th word and
`$w_{t+1}$` is its ground-truth next word. The first equation computes the
probability distribution over all classes for the next word. The second
equation computes the probability distribution over the words in the
ground-truth class of the next word.

The `ComputeGradient` function computes the gradients of the source layer
(i.e., the hidden layer) and of the two weight matrices.

### Updater Configuration

We use the kFixedStep type of learning rate change method, with the
configuration below. We decay the learning rate once the performance stops
improving on the validation dataset.

    updater{
      type: kSGD
      learning_rate {
        type: kFixedStep
        fixedstep_conf:{
          step:0
          step:48810
          step:56945
          step:65080
          step:73215
          step_lr:0.1
          step_lr:0.05
          step_lr:0.025
          step_lr:0.0125
          step_lr:0.00625
        }
      }
    }

### TrainOneBatch() Function

We use the BP (back-propagation) algorithm to train the RNN model here. The
corresponding configuration is shown below.

    # In job.conf file
    train_one_batch {
      alg: kBackPropagation
    }

### Cluster Configuration

The default cluster configuration can be used, i.e., a single worker and a
single server in a single process.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/train-one-batch.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/train-one-batch.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/zh/train-one-batch.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/zh/train-one-batch.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,179 @@
# Train-One-Batch

---

For each SGD iteration, every worker calls the `TrainOneBatch` function to
compute gradients of the parameters associated with its local layers (i.e., the
layers dispatched to it). SINGA has implemented two algorithms for the
`TrainOneBatch` function. Users select the corresponding algorithm for their
model in the configuration.

## Basic user guide

### Back-propagation

The [BP algorithm](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf) is used
for computing gradients of feed-forward models, e.g., [CNN](cnn.html) and
[MLP](mlp.html), and of [RNN](rnn.html) models in SINGA.

    # in job.conf
    alg: kBP

To use the BP algorithm for the `TrainOneBatch` function, users simply
configure the `alg` field with `kBP`. If a neural net contains user-defined
layers, these layers must be implemented properly to be consistent with the
implementation of the BP algorithm in SINGA (see below).

### Contrastive Divergence

The [CD algorithm](http://www.cs.toronto.edu/~fritz/absps/nccd.pdf) is used for
computing gradients of energy models like RBM.

    # job.conf
    alg: kCD
    cd_conf {
      cd_k: 2
    }

To use the CD algorithm for the `TrainOneBatch` function, users just configure
the `alg` field to `kCD`. Users can also configure the number of Gibbs sampling
steps in the CD algorithm through the `cd_k` field. By default, it is set to 1.
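As a reminder of what CD-k estimates (this is the standard formulation, not
SINGA-specific notation): for an RBM with visible units v, hidden units h and a
weight matrix W, the gradient of the log-likelihood is approximated by the
difference between statistics collected in the positive phase and after k Gibbs
steps,

`$$\Delta W \propto \langle v h^T\rangle_{data} - \langle v h^T\rangle_{k}$$`

where `$\langle\cdot\rangle_{k}$` denotes the expectation under the
reconstruction obtained after `cd_k` Gibbs sampling steps. A larger `cd_k`
gives a better approximation at a higher cost per iteration.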
## Advanced user guide

### Implementation of BP

The BP algorithm is implemented in SINGA following the pseudo code below,

    BPTrainOnebatch(step, net) {
      // forward propagate
      foreach layer in net.local_layers() {
        if IsBridgeDstLayer(layer)
          recv data from the src layer (i.e., BridgeSrcLayer)
        foreach param in layer.params()
          Collect(param) // recv response from servers for last update

        layer.ComputeFeature(kForward)

        if IsBridgeSrcLayer(layer)
          send layer.data_ to dst layer
      }
      // backward propagate
      foreach layer in reverse(net.local_layers) {
        if IsBridgeSrcLayer(layer)
          recv gradient from the dst layer (i.e., BridgeDstLayer)
          recv response from servers for last update

        layer.ComputeGradient()
        foreach param in layer.params()
          Update(step, param) // send param.grad_ to servers

        if IsBridgeDstLayer(layer)
          send layer.grad_ to src layer
      }
    }

It forwards features through all local layers (which can be identified by the
layer partition ID and the worker ID) and propagates gradients backwards in the
reverse order.
[BridgeSrcLayer](layer.html#bridgesrclayer--bridgedstlayer)
(resp. `BridgeDstLayer`) is blocked until the feature (resp. gradient) from the
source (resp. destination) layer arrives. Parameter gradients are sent to
servers via the `Update` function. Updated parameters are collected via the
`Collect` function, which blocks until the parameter has been updated.
[Param](param.html) objects have versions, which can be used to check whether a
`Param` object has been updated or not.

Since RNN models are unrolled into feed-forward models, users need to implement
the forward propagation in the recurrent layer's `ComputeFeature` function and
the backward propagation in the recurrent layer's `ComputeGradient` function.
As a result, the whole `TrainOneBatch` runs the
[back-propagation through time (BPTT)](https://en.wikipedia.org/wiki/Backpropagation_through_time)
algorithm.

### Implementation of CD

The CD algorithm is implemented in SINGA following the pseudo code below,

    CDTrainOneBatch(step, net) {
      # positive phase
      foreach layer in net.local_layers()
        if IsBridgeDstLayer(layer)
          recv positive phase data from the src layer (i.e., BridgeSrcLayer)
        foreach param in layer.params()
          Collect(param)  // recv response from servers for last update
        layer.ComputeFeature(kPositive)
        if IsBridgeSrcLayer(layer)
          send positive phase data to dst layer

      # negative phase
      foreach gibbs in [0...layer_proto_.cd_k]
        foreach layer in net.local_layers()
          if IsBridgeDstLayer(layer)
            recv negative phase data from the src layer (i.e., BridgeSrcLayer)
          layer.ComputeFeature(kNegative)
          if IsBridgeSrcLayer(layer)
            send negative phase data to dst layer

      foreach layer in net.local_layers()
        layer.ComputeGradient()
        foreach param in layer.params()
          Update(param)
    }

Parameter gradients are computed after the positive phase and the negative
phase.
### Implementing a new algorithm

SINGA implements BP and CD by creating two subclasses of the
[Worker](../api/classsinga_1_1Worker.html) class: the `TrainOneBatch` function
of [BPWorker](../api/classsinga_1_1BPWorker.html) implements the BP algorithm,
and the `TrainOneBatch` function of
[CDWorker](../api/classsinga_1_1CDWorker.html) implements the CD algorithm. To
implement a new algorithm for the `TrainOneBatch` function, users need to
create a new subclass of `Worker`, e.g.,

    class FooWorker : public Worker {
      void TrainOneBatch(int step, shared_ptr<NeuralNet> net, Metric* perf) override;
      void TestOneBatch(int step, Phase phase, shared_ptr<NeuralNet> net, Metric* perf) override;
    };

`FooWorker` must implement the above two functions for training one mini-batch
and testing one mini-batch respectively. The `perf` argument is for collecting
the training or testing performance, e.g., the objective loss or accuracy. It
is passed to the `ComputeFeature` function of each layer.

Users can define configuration fields for `FooWorker`,

    # in user.proto
    message FooWorkerProto {
      optional int32 b = 1;
    }

    extend JobProto {
      optional FooWorkerProto foo_conf = 101;
    }

    # in job.proto
    JobProto {
      ...
      extensions 101 to max;
    }

This is similar to
[adding configuration fields for a new layer](layer.html#implementing-a-new-layer-subclass).

To use `FooWorker`, users need to register it in the
[main.cc](programming-guide.html) and configure the `alg` and `foo_conf`
fields,

    # in main.cc
    const int kFoo = 3; // worker ID, must be different from those of CDWorker and BPWorker
    driver.RegisterWorker<FooWorker>(kFoo);

    # in job.conf
    ...
    alg: 3
    [foo_conf] {
      b = 4;
    }

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/updater.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/updater.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/zh/updater.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/zh/updater.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,284 @@
# Updater

---

Every server in SINGA has an [Updater](../api/classsinga_1_1Updater.html)
instance that updates parameters based on gradients. In this page, the *Basic
user guide* describes the configuration of an updater. The *Advanced user
guide* presents details on how to implement a new updater and a new learning
rate changing method.

## Basic user guide

There are many different parameter updating protocols (i.e., subclasses of
`Updater`). They share some configuration fields like

* `type`, an integer for identifying an updater;
* `learning_rate`, configuration for the
[LRGenerator](../api/classsinga_1_1LRGenerator.html) which controls the learning rate;
* `weight_decay`, the coefficient for
[L2 regularization](http://deeplearning.net/tutorial/gettingstarted.html#regularization);
* [momentum](http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/).

If you are not familiar with the above terms, you can find their meanings in
[this page provided by Karpathy](http://cs231n.github.io/neural-networks-3/#update).

### Configuration of built-in updater classes

#### Updater

The base `Updater` implements the
[vanilla SGD algorithm](http://cs231n.github.io/neural-networks-3/#sgd).
Its configuration type is `kSGD`. Users need to configure at least the
`learning_rate` field. `momentum` and `weight_decay` are optional fields.

    updater{
      type: kSGD
      momentum: float
      weight_decay: float
      learning_rate {
        ...
      }
    }
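For orientation, the update that the `kSGD` configuration above corresponds to
can be written as follows. This is the standard momentum/weight-decay
formulation rather than a quote of SINGA's exact code; `$\theta$` is a
parameter, `$g$` its gradient, `$v$` its momentum buffer, `$\mu$` the
`momentum`, `$\lambda$` the `weight_decay` and `$\eta_{step}$` the learning
rate produced by the configured `learning_rate` generator.

`$$v \leftarrow \mu v - \eta_{step}(g + \lambda\theta), \qquad \theta \leftarrow \theta + v$$`

With `momentum` and `weight_decay` omitted, this reduces to
`$\theta \leftarrow \theta - \eta_{step} g$`.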
#### AdaGradUpdater

It inherits the base `Updater` to implement the
[AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf) algorithm. Its
type is `kAdaGrad`. `AdaGradUpdater` is configured similarly to `Updater`
except that `momentum` is not used.

#### NesterovUpdater

It inherits the base `Updater` to implement the
[Nesterov](http://arxiv.org/pdf/1212.0901v2.pdf) (section 3.5) updating
protocol. Its type is `kNesterov`. `learning_rate` and `momentum` must be
configured. `weight_decay` is an optional configuration field.

#### RMSPropUpdater

It inherits the base `Updater` to implement the
[RMSProp algorithm](http://cs231n.github.io/neural-networks-3/#sgd) proposed by
[Hinton](http://www.cs.toronto.edu/%7Etijmen/csc321/slides/lecture_slides_lec6.pdf)
(slide 29). Its type is `kRMSProp`.

    updater {
      type: kRMSProp
      rmsprop_conf {
        rho: float # [0,1]
      }
    }

### Configuration of learning rate

The `learning_rate` field is configured as,

    learning_rate {
      type: ChangeMethod
      base_lr: float  # base/initial learning rate
      ... # fields of a specific changing method
    }

The common fields include `type` and `base_lr`. SINGA provides the following
`ChangeMethod`s.

#### kFixed

The `base_lr` is used for all steps.

#### kLinear

The updater should be configured like

    learning_rate {
      base_lr: float
      linear_conf {
        freq: int
        final_lr: float
      }
    }

Linear interpolation is used to change the learning rate,

    lr = (1 - step / freq) * base_lr + (step / freq) * final_lr

#### kExponential

The updater should be configured like

    learning_rate {
      base_lr: float
      exponential_conf {
        freq: int
      }
    }

The learning rate for `step` is

    lr = base_lr / 2^(step / freq)

#### kInverseT

The updater should be configured like

    learning_rate {
      base_lr: float
      inverset_conf {
        final_lr: float
      }
    }

The learning rate for `step` is

    lr = base_lr / (1 + step / final_lr)

#### kInverse

The updater should be configured like

    learning_rate {
      base_lr: float
      inverse_conf {
        gamma: float
        pow: float
      }
    }

The learning rate for `step` is

    lr = base_lr * (1 + gamma * step)^(-pow)

#### kStep

The updater should be configured like

    learning_rate {
      base_lr : float
      step_conf {
        change_freq: int
        gamma: float
      }
    }

The learning rate for `step` is

    lr = base_lr * gamma^(step / change_freq)

#### kFixedStep

The updater should be configured like

    learning_rate {
      fixedstep_conf {
        step: int
        step_lr: float

        step: int
        step_lr: float

        ...
      }
    }

Denote the i-th tuple as (step[i], step_lr[i]); then the learning rate for
`step` is

    step_lr[k]

where step[k] is the largest configured step that is not greater than `step`.
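To make the two rules that are easiest to get wrong concrete, here is a small
stand-alone sketch of `kStep` and `kFixedStep` as plain functions. The function
names and the vector-based interface are illustrative only and are not part of
the SINGA API.

    #include <cmath>
    #include <vector>

    // kStep: lr = base_lr * gamma^(step / change_freq), with integer division.
    float StepLR(float base_lr, float gamma, int change_freq, int step) {
      return base_lr * std::pow(gamma, step / change_freq);
    }

    // kFixedStep: return step_lr[k] where steps[k] is the largest configured
    // step that is not greater than `step` (steps must be sorted ascending).
    float FixedStepLR(const std::vector<int>& steps,
                      const std::vector<float>& step_lr, int step) {
      float lr = step_lr.front();
      for (size_t k = 0; k < steps.size(); ++k)
        if (steps[k] <= step) lr = step_lr[k];
      return lr;
    }

With the kFixedStep configuration from the [RNN example](rnn.html), this
returns 0.1 for steps below 48810, 0.05 up to 56944, and so on.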
## Advanced user guide

### Implementing a new Updater subclass

The base `Updater` class has one virtual function,

    class Updater{
     public:
      virtual void Update(int step, Param* param, float grad_scale = 1.0f) = 0;

     protected:
      UpdaterProto proto_;
      LRGenerator lr_gen_;
    };

It updates the values of the `param` based on its gradients. The `step`
argument is for deciding the learning rate, which may change over time (step).
`grad_scale` scales the original gradient values. This function is called by a
server once it has received all gradients for the same `Param` object.

To implement a new Updater subclass, users must override the `Update` function,

    class FooUpdater : public Updater {
      void Update(int step, Param* param, float grad_scale = 1.0f) override;
    };

Configuration of this new updater can be declared similarly to that of a new
layer,

    # in user.proto
    message FooUpdaterProto {
      optional int32 c = 1;
    }

    extend UpdaterProto {
      optional FooUpdaterProto fooupdater_conf = 101;
    }

The new updater should be registered in the
[main function](programming-guide.html),

    driver.RegisterUpdater<FooUpdater>("FooUpdater");

Users can then configure the job as

    # in job.conf
    updater {
      user_type: "FooUpdater" # must use user_type with the same string identifier as the one used for registration
      fooupdater_conf {
        c : 20;
      }
    }

### Implementing a new LRGenerator subclass

The base `LRGenerator` class declares one virtual function,

    virtual float Get(int step);

To implement a subclass, e.g., `FooLRGen`, users should declare it like

    class FooLRGen : public LRGenerator {
     public:
      float Get(int step) override;
    };

Configuration of `FooLRGen` can be defined using a protocol message,

    # in user.proto
    message FooLRProto {
      ...
    }

    extend LRGenProto {
      optional FooLRProto foolr_conf = 101;
    }

The configuration is then like,

    learning_rate {
      user_type : "FooLR" # must use user_type with the same string identifier as the one used for registration
      base_lr: float
      foolr_conf {
        ...
      }
    }

Users have to register this subclass in the main function,

    driver.RegisterLRGenerator<FooLRGen, std::string>("FooLR");
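To round this off, a sketch of what the body of `FooLRGen::Get` might look
like, implementing a simple inverse-decay schedule. The constants and the
`gamma` name are hypothetical; real code would read `base_lr` and the
`foolr_conf` fields from the extended `LRGenProto` configuration instead of
hard-coding them.

    // Illustrative only: inverse decay, lr = base_lr / (1 + gamma * step).
    float FooLRGen::Get(int step) {
      const float base_lr = 0.1f;   // assumed configured value
      const float gamma = 1e-4f;    // assumed configured value
      return base_lr / (1.0f + gamma * step);
    }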
