Added: incubator/singa/site/trunk/content/markdown/v0.3.0/rbm.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/rbm.md?rev=1740048&view=auto
==============================================================================

# RBM Example

---

This example uses SINGA to train 4 RBM models and one auto-encoder model over the
[MNIST dataset](http://yann.lecun.com/exdb/mnist/). The auto-encoder model is trained
to reduce the dimensionality of the MNIST image features. The RBM models are trained
to initialize the parameters of the auto-encoder model. This example application is
from [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf).

## Running instructions

Running scripts are provided in the *SINGA_ROOT/examples/rbm* folder.

The MNIST dataset has 70,000 handwritten digit images. The
[data preparation](data.html) page
has details on converting this dataset into a format that SINGA can recognize. Users can
simply run the following commands to download and convert the dataset.

    # at SINGA_ROOT/examples/mnist/
    $ cp Makefile.example Makefile
    $ make download
    $ make create

The training is separated into two phases, namely pre-training and fine-tuning.
The pre-training phase trains 4 RBMs in sequence,

    # at SINGA_ROOT/
    $ ./bin/singa-run.sh -conf examples/rbm/rbm1.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm2.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm3.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm4.conf

The fine-tuning phase trains the auto-encoder by,

    $ ./bin/singa-run.sh -conf examples/rbm/autoencoder.conf

## Training details

### RBM1

<img src="../images/example-rbm1.png" align="center" width="200px"/>
<span><strong>Figure 1 - RBM1.</strong></span>

The neural net structure for training RBM1 is shown in Figure 1.
The data layer and parser layer provide features for training RBM1.
The visible layer (connected with the parser layer) of RBM1 accepts the image features
(784 dimensions). The hidden layer is set to have 1000 neurons (units).
These two layers are configured as,

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"mnist"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 1000
      }
      param{
        name: "w1"
        init{
          type: kGaussian
          mean: 0.0
          std: 0.1
        }
      }
      param{
        name: "b11"
        init{
          type: kConstant
          value: 0.0
        }
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 1000
      }
      param{
        name: "w1_"
        share_from: "w1"
      }
      param{
        name: "b12"
        init{
          type: kConstant
          value: 0.0
        }
      }
    }

For an RBM, the weight matrix is shared by the visible and hidden layers. For instance,
`w1` is shared by the `vis` and `hid` layers shown in Figure 1. In SINGA, we can configure
the `share_from` field to enable [parameter sharing](param.html)
as shown above for the params `w1` and `w1_`.

[Contrastive Divergence](train-one-batch.html#contrastive-divergence)
is configured as the algorithm for [TrainOneBatch](train-one-batch.html).
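As a reference for what `kCD` samples from, the two layers above define the standard
RBM conditional distributions (textbook formulas in our own notation, not taken from
the SINGA source; `$a_i$` and `$b_j$` denote the visible/hidden biases, `$w_{ij}$` the shared weight):

`$$P(h_j=1|v) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big), \qquad P(v_i=1|h) = \sigma\Big(a_i + \sum_j h_j w_{ij}\Big)$$`

The hidden layer samples `$h$` given the visible features, and the visible layer
reconstructs `$v$` given the sampled `$h$`.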
Following Hinton's paper, we configure the [updating protocol](updater.html)
as follows,

    # Updater Configuration
    updater{
      type: kSGD
      momentum: 0.2
      weight_decay: 0.0002
      learning_rate{
        base_lr: 0.1
        type: kFixed
      }
    }

Since the parameters of RBM1 will be used to initialize the auto-encoder, we should
configure the `workspace` field to specify a path for the checkpoint folder.
For example, if we configure it as,

    cluster {
      workspace: "examples/rbm/rbm1/"
    }

then SINGA will [checkpoint the parameters](checkpoint.html) into *examples/rbm/rbm1/*.

### RBM2

<img src="../images/example-rbm2.png" align="center" width="200px"/>
<span><strong>Figure 2 - RBM2.</strong></span>

Figure 2 shows the net structure for training RBM2.
The visible units of RBM2 accept the output from the Sigmoid1 layer. The Inner1 layer
is an `InnerProductLayer` whose parameters are set to the `w1` and `b12` learned
from RBM1.
The neural net configuration is (with the data layer and parser layer omitted),

    layer{
      name: "Inner1"
      type: kInnerProduct
      srclayers:"mnist"
      innerproduct_conf{
        num_output: 1000
      }
      param{ name: "w1" }
      param{ name: "b12"}
    }

    layer{
      name: "Sigmoid1"
      type: kSigmoid
      srclayers:"Inner1"
    }

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"Sigmoid1"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 500
      }
      param{
        name: "w2"
        ...
      }
      param{
        name: "b21"
        ...
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 500
      }
      param{
        name: "w2_"
        share_from: "w2"
      }
      param{
        name: "b22"
        ...
      }
    }

To load `w1` and `b12` from RBM1's checkpoint file, we configure the `checkpoint_path` as,

    checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
    cluster{
      workspace: "examples/rbm/rbm2"
    }

The workspace is changed for checkpointing `w2`, `b21` and `b22` into
*examples/rbm/rbm2/*.

### RBM3

<img src="../images/example-rbm3.png" align="center" width="200px"/>
<span><strong>Figure 3 - RBM3.</strong></span>

Figure 3 shows the net structure for training RBM3. In this model, a layer with
250 units is added as the hidden layer of RBM3. The visible units of RBM3
accept the output from the Sigmoid2 layer. Parameters of Inner1 and Inner2 are set to
`w1, b12, w2, b22`, which can be loaded from the checkpoint file of RBM2,
i.e., *examples/rbm/rbm2/*.

### RBM4

<img src="../images/example-rbm4.png" align="center" width="200px"/>
<span><strong>Figure 4 - RBM4.</strong></span>

Figure 4 shows the net structure for training RBM4. It is similar to Figure 3,
but according to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf), the hidden units of the
top RBM (RBM4) have stochastic real-valued states drawn from a unit variance
Gaussian whose mean is determined by the input from the RBM's logistic visible
units. So we add a `gaussian` field in the RBMHid layer to control the
sampling distribution (Gaussian or Bernoulli). In addition, this
RBM has a much smaller learning rate (0.001). The neural net configuration for
RBM4 and the updating protocol is (with the data layer and parser
layer omitted),

    # Updater Configuration
    updater{
      type: kSGD
      momentum: 0.9
      weight_decay: 0.0002
      learning_rate{
        base_lr: 0.001
        type: kFixed
      }
    }

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"Sigmoid3"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 30
      }
      param{
        name: "w4"
        ...
      }
      param{
        name: "b41"
        ...
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 30
        gaussian: true
      }
      param{
        name: "w4_"
        share_from: "w4"
      }
      param{
        name: "b42"
        ...
      }
    }

### Auto-encoder
In the fine-tuning stage, the 4 RBMs are "unfolded" to form encoder and decoder
networks that are initialized using the parameters from the previous 4 RBMs.

<img src="../images/example-autoencoder.png" align="center" width="500px"/>
<span><strong>Figure 5 - Auto-Encoders.</strong></span>

Figure 5 shows the neural net structure for training the auto-encoder.
[Back propagation (kBP)](train-one-batch.html) is
configured as the algorithm for `TrainOneBatch`. We use the same cluster
configuration as the RBM models. For the updater, we use the [AdaGrad](updater.html#adagradupdater) algorithm with a
fixed learning rate.

    ### Updater Configuration
    updater{
      type: kAdaGrad
      learning_rate{
        base_lr: 0.01
        type: kFixed
      }
    }

According to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf),
we configure a EuclideanLoss layer to compute the reconstruction error. The neural net
configuration is (with some of the middle layers omitted),

    layer{ name: "data" }
    layer{ name:"mnist" }
    layer{
      name: "Inner1"
      param{ name: "w1" }
      param{ name: "b12" }
    }
    layer{ name: "Sigmoid1" }
    ...
    layer{
      name: "Inner8"
      innerproduct_conf{
        num_output: 784
        transpose: true
      }
      param{
        name: "w8"
        share_from: "w1"
      }
      param{ name: "b11" }
    }
    layer{ name: "Sigmoid8" }

    # Euclidean Loss Layer Configuration
    layer{
      name: "loss"
      type:kEuclideanLoss
      srclayers:"Sigmoid8"
      srclayers:"mnist"
    }

To load the pre-trained parameters from the 4 RBMs' checkpoint files, we configure `checkpoint_path` as

    ### Checkpoint Configuration
    checkpoint_path: "examples/rbm/checkpoint/rbm1/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/checkpoint/rbm2/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/checkpoint/rbm3/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/checkpoint/rbm4/checkpoint/step6000-worker0"

## Visualization Results

<div>
<img src="../images/rbm-weight.PNG" align="center" width="300px"/>
<img src="../images/rbm-feature.PNG" align="center" width="300px"/>
<br/>
<span><strong>Figure 6 - Bottom RBM weight matrix.</strong></span>
<span><strong>Figure 7 - Top layer features.</strong></span>
</div>

Figure 6 visualizes sample columns of the weight matrix of RBM1. We can see that
Gabor-like filters are learned. Figure 7 depicts the features extracted from
the top layer of the auto-encoder, wherein one point represents one image.
Different colors represent different digits. We can see that most images are
well clustered according to the ground truth.
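As a side note, the Euclidean loss reported during fine-tuning is the mean squared
reconstruction error between the decoder output and the raw input. A minimal sketch of
this computation in plain C++ (our own illustration, independent of SINGA's Blob and
Layer types; the 0.5 factor is a common convention):

    #include <cstddef>
    #include <vector>

    // 'recon' holds the Sigmoid8 output and 'input' the mnist features,
    // both flattened to batchsize x dim (dim = 784 for MNIST).
    float ReconstructionError(const std::vector<float>& recon,
                              const std::vector<float>& input,
                              size_t batchsize, size_t dim) {
      float sum = 0.f;
      for (size_t i = 0; i < batchsize * dim; ++i) {
        float diff = recon[i] - input[i];
        sum += diff * diff;  // squared error per feature
      }
      return 0.5f * sum / static_cast<float>(batchsize);  // average over the batch
    }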
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/rnn.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/rnn.md?rev=1740048&view=auto
==============================================================================

# Recurrent Neural Networks for Language Modelling

---

Recurrent Neural Networks (RNN) are widely used for modelling sequential data,
such as music and sentences. In this example, we use SINGA to train an
[RNN model](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf)
proposed by Tomas Mikolov for [language modeling](https://en.wikipedia.org/wiki/Language_model).
The training objective (loss) is
to minimize the [perplexity per word](https://en.wikipedia.org/wiki/Perplexity), which
is equivalent to maximizing the probability of predicting the next word given the current word in
a sentence.

Unlike the [CNN](cnn.html), [MLP](mlp.html)
and [RBM](rbm.html) examples, which use built-in
[layers](layer.html) and [records](data.html),
none of the layers in this example are built-in. Hence users can learn to
implement their own layers and data records through this example.

## Running instructions

In *SINGA_ROOT/examples/rnnlm/*, scripts are provided to run the training job.
First, the data is prepared by

    $ cp Makefile.example Makefile
    $ make download
    $ make create

Second, to compile the source code under *examples/rnnlm/*, run

    $ make rnnlm

An executable file *rnnlm.bin* will be generated.

Third, the training is started by passing *rnnlm.bin* and the job configuration
to *singa-run.sh*,

    # at SINGA_ROOT/
    # export LD_LIBRARY_PATH=.libs:$LD_LIBRARY_PATH
    $ ./bin/singa-run.sh -exec examples/rnnlm/rnnlm.bin -conf examples/rnnlm/job.conf

## Implementations

<img src="../images/rnnlm.png" align="center" width="400px"/>
<span><strong>Figure 1 - Net structure of the RNN model.</strong></span>

The neural net structure is shown in Figure 1. Word records are loaded by
`DataLayer`. For every iteration, at most `max_window` word records are
processed. If a sentence-ending character is read, the `DataLayer` stops
loading immediately. `EmbeddingLayer` looks up a word embedding matrix to extract
feature vectors for words loaded by the `DataLayer`. These features are transformed by the
`HiddenLayer`, which propagates the features from left to right. The
output feature for the word at position k is influenced by words from position 0 to
k-1. Finally, `LossLayer` computes the cross-entropy loss (see below)
by predicting the next word of each word.
The cross-entropy loss is computed as

`$$L(w_t)=-\log P(w_{t+1}|w_t)$$`

Given `$w_t$`, the above equation would be computed over all words in the vocabulary,
which is time consuming.
[RNNLM Toolkit](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz)
accelerates the computation as

`$$P(w_{t+1}|w_t) = P(C_{w_{t+1}}|w_t) * P(w_{t+1}|C_{w_{t+1}})$$`

Words from the vocabulary are partitioned into a user-defined number of classes.
The first term on the right-hand side predicts the class of the next word; the
second term predicts the next word given its class. Both the number of classes and
the number of words in one class are much smaller than the vocabulary size.
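To see why this helps, compare the costs of the two formulations (a back-of-the-envelope
estimate in our own notation, not a measurement from the paper): a direct softmax over a
vocabulary of size `$|V|$` costs

`$$\mathcal{O}(|V|) \quad \text{versus} \quad \mathcal{O}(|C| + |V|/|C|)$$`

for the class-based factorization with `$|C|$` classes of roughly equal total frequency.
For the 3,720-word vocabulary and 100 classes used below, this is on the order of a
hundred operations per prediction instead of a few thousand.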
Hence the probabilities
can be calculated much faster.

The perplexity per word is computed by,

`$$PPL = 10^{-\frac{1}{T} \sum_t \log_{10} P(w_{t+1}|w_t)}$$`

where `$T$` is the number of words.

### Data preparation

We use a small dataset provided by the [RNNLM Toolkit](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz).
It has 10,000 training sentences, with 71,350 words in total and 3,720 unique words.
The subsequent steps follow the instructions in
[Data Preparation](data.html) to convert the
raw data into records and insert them into data stores.

#### Download source data

    # in SINGA_ROOT/examples/rnnlm/
    cp Makefile.example Makefile
    make download

#### Define record format

We define the word record as follows,

    # in SINGA_ROOT/examples/rnnlm/rnnlm.proto
    message WordRecord {
      optional string word = 1;
      optional int32 word_index = 2;
      optional int32 class_index = 3;
      optional int32 class_start = 4;
      optional int32 class_end = 5;
    }

It includes the word string and its index in the vocabulary.
Words in the vocabulary are sorted based on their frequency in the training dataset.
The sorted list is cut into 100 sublists such that each sublist has 1/100 of the total
word frequency. Each sublist is called a class.
Hence each word has a `class_index` ([0,100)). The `class_start` is the index
of the first word in the same class as `word`. The `class_end` is the index of
the first word in the next class.

#### Create data stores

We use code from the RNNLM Toolkit to read words and sort them into classes.
The main function in *create_store.cc* first creates word classes based on the training
dataset. Second, it calls the following function to create a data store for each of the
training, validation and test datasets.

    int create_data(const char *input_file, const char *output_file);

`input_file` is the path to a training/validation/testing text file from the RNNLM Toolkit; `output_file` is the output store file.
This function starts with

    singa::io::KVFile store;
    store.Open(output_file, singa::io::kCreate);

Then it reads the words one by one. For each word it creates a `WordRecord` instance
and inserts it into the store,

    int wcnt = 0; // word count
    WordRecord wordRecord;
    while(1) {
      readWord(wordstr, fin);
      if (feof(fin)) break;
      ...// fill in the wordRecord;
      string val;
      wordRecord.SerializeToString(&val);
      int length = snprintf(key, BUFFER_LEN, "%05d", wcnt++);
      store.Write(string(key, length), val);
    }

Compilation and running commands are provided in the *Makefile.example*.
After executing

    make create

*train_data.bin*, *test_data.bin* and *valid_data.bin* will be created.

### Layer implementation

4 user-defined layers are implemented for this application.
Following the guide for implementing [new Layer subclasses](layer.html#implementing-a-new-layer-subclass),
we extend the [LayerProto](../api/classsinga_1_1LayerProto.html)
to include the configuration messages of user-defined layers as shown below
(3 out of the 7 layers have specific configurations),

    import "job.proto";  // Layer message for SINGA is defined

    //For implementation of RNNLM application
    extend singa.LayerProto {
      optional EmbeddingProto embedding_conf = 101;
      optional LossProto loss_conf = 102;
      optional DataProto data_conf = 103;
    }

In the subsequent sections, we describe the implementation of each layer,
including its configuration message.

#### RNNLayer

This is the base layer of all other layers for this application.
It is defined
as follows,

    class RNNLayer : virtual public Layer {
     public:
      inline int window() { return window_; }
     protected:
      int window_;
    };

For this application, two iterations may process different numbers of words,
because sentences have different lengths.
The `DataLayer` decides the effective window size. All other layers call their source layers to get the
effective window size and reset `window_` in the `ComputeFeature` function.

#### DataLayer

DataLayer is for loading Records.

    class DataLayer : public RNNLayer, singa::InputLayer {
     public:
      void Setup(const LayerProto& proto, const vector<Layer*>& srclayers) override;
      void ComputeFeature(int flag, const vector<Layer*>& srclayers) override;
      int max_window() const {
        return max_window_;
      }
     private:
      int max_window_;
      singa::io::Store* store_;
    };

The Setup function gets the user-configured max window size.

    max_window_ = proto.GetExtension(input_conf).max_window();

The `ComputeFeature` function loads at most max_window records. It stops
early when the sentence-ending character is encountered.

    ...// shift the last record to the first
    window_ = max_window_;
    for (int i = 1; i <= max_window_; i++) {
      // load record; break if it is the ending character
    }

The configuration of `DataLayer` is like

    name: "data"
    user_type: "kData"
    [data_conf] {
      path: "examples/rnnlm/train_data.bin"
      max_window: 10
    }

#### EmbeddingLayer

This layer gets records from `DataLayer`. For each record, the word index is
parsed and used to get the corresponding word feature vector from the embedding
matrix.

The class is declared as follows,

    class EmbeddingLayer : public RNNLayer {
      ...
      const std::vector<Param*> GetParams() const override {
        std::vector<Param*> params{embed_};
        return params;
      }
     private:
      int word_dim_, vocab_size_;
      Param* embed_;
    }

The `embed_` field is a matrix whose values are parameters to be learned.
The matrix size is `vocab_size_` x `word_dim_`.

The Setup function reads configurations for `word_dim_` and `vocab_size_`. Then
it allocates a feature Blob for `max_window` words and sets up `embed_`.

    int max_window = srclayers[0]->data(this).shape()[0];
    word_dim_ = proto.GetExtension(embedding_conf).word_dim();
    data_.Reshape(vector<int>{max_window, word_dim_});
    ...
    embed_->Setup(vector<int>{vocab_size_, word_dim_});

The `ComputeFeature` function simply copies the feature vector from the `embed_`
matrix into the feature Blob.

    // reset effective window size
    window_ = datalayer->window();
    auto records = datalayer->records();
    ...
    for (int t = 0; t < window_; t++) {
      int idx <- word index
      Copy(words[t], embed[idx]);
    }

The `ComputeGradient` function copies the gradients back to the `embed_` matrix.

The configuration for `EmbeddingLayer` is like,

    user_type: "kEmbedding"
    [embedding_conf] {
      word_dim: 15
      vocab_size: 3720
    }
    srclayers: "data"
    param {
      name: "w1"
      init {
        type: kUniform
        low:-0.3
        high:0.3
      }
    }

#### HiddenLayer

This layer unrolls the recurrent connections for at most max_window times.
The feature at position t is computed based on the feature from the embedding layer (position t)
and the feature at position t-1 of this layer. The formula is

`$$f[t]=\sigma (f[t-1]*W+src[t])$$`

where `$W$` is a matrix with `word_dim_` x `word_dim_` parameters.
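To make the recurrence concrete, below is a minimal sketch of the forward pass using
plain arrays instead of SINGA's Blob type (the function and variable names are ours,
not from the rnnlm source):

    #include <cmath>
    #include <vector>

    static float sigmoid(float x) { return 1.f / (1.f + std::exp(-x)); }

    // Forward pass of f[t] = sigmoid(f[t-1]*W + src[t]); f[0] copies src[0].
    // 'src' and 'f' are window x dim matrices (row-major); 'W' is dim x dim.
    void Forward(const std::vector<float>& src, const std::vector<float>& W,
                 std::vector<float>* f, int window, int dim) {
      for (int t = 0; t < window; ++t) {
        for (int j = 0; j < dim; ++j) {
          float v = src[t * dim + j];
          if (t > 0) {  // add the contribution propagated from position t-1
            for (int i = 0; i < dim; ++i)
              v += (*f)[(t - 1) * dim + i] * W[i * dim + j];
            v = sigmoid(v);
          }
          (*f)[t * dim + j] = v;  // position 0 is a plain copy of src[0]
        }
      }
    }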
If you want to implement a recurrent neural network following our
design, this layer is the most important one to refer to.

    class HiddenLayer : public RNNLayer {
      ...
      const std::vector<Param*> GetParams() const override {
        std::vector<Param*> params{weight_};
        return params;
      }
     private:
      Param* weight_;
    };

The `Setup` function sets up the weight matrix as

    weight_->Setup(std::vector<int>{word_dim, word_dim});

The `ComputeFeature` function gets the effective window size (`window_`) from its source layer,
i.e., the embedding layer. Then it propagates the feature from position 0 to position
`window_ - 1`. The process is illustrated as follows.

    void HiddenLayer::ComputeFeature() {
      for(int t = 0; t < window_; t++){
        if(t == 0)
          Copy(data[t], src[t]);
        else
          data[t]=sigmoid(data[t-1]*W + src[t]);
      }
    }

The `ComputeGradient` function computes the gradient of the loss w.r.t. W and the source layer.
Particularly, for each position k, since data[k] contributes to data[k+1] and the feature
at position k in its destination layer (the loss layer), grad[k] should contain the gradient
from two parts. The destination layer has already computed the gradient from the loss layer into
grad[k]; in the `ComputeGradient` function, we need to add the gradient from position k+1.

    void HiddenLayer::ComputeGradient(){
      ...
      for (int k = window_ - 1; k >= 0; k--) {
        if (k < window_ - 1) {
          grad[k] += dot(grad[k + 1], weight.T()); // add gradient from position k+1.
        }
        grad[k] =... // compute gL/gy[k], y[k]=data[k-1]*W+src[k]
      }
      gweight = dot(data.Slice(0, window_-1).T(), grad.Slice(1, window_));
      Copy(gsrc, grad);
    }

After the loop, we get the gradient of the loss w.r.t. y[k], which is used to
compute the gradients of W and src[k].

#### LossLayer

This layer computes the cross-entropy loss and the `$\log_{10}P(w_{t+1}|w_t)$` (which
could be averaged over all words by users to get the PPL value).

There are two configuration fields to be specified by users.

    message LossProto {
      optional int32 nclass = 1;
      optional int32 vocab_size = 2;
    }

There are two weight matrices to be learned

    class LossLayer : public RNNLayer {
      ...
     private:
      Param* word_weight_, *class_weight_;
    }

The ComputeFeature function computes the two probabilities respectively.

`$$P(C_{w_{t+1}}|w_t) = Softmax(w_t * class\_weight)$$`
`$$P(w_{t+1}|C_{w_{t+1}}) = Softmax(w_t * word\_weight[class\_start:class\_end])$$`

`$w_t$` is the feature from the hidden layer for the t-th word; its ground-truth
next word is `$w_{t+1}$`. The first equation computes the probability distribution over all
classes for the next word. The second equation computes the
probability distribution over the words in the ground-truth class for the next word.

The ComputeGradient function computes the gradients for the source layer
(i.e., the hidden layer) and the two weight matrices.

### Updater Configuration

We employ the kFixedStep method for changing the learning rate; the
configuration is as follows. We decay the learning rate once the performance stops
improving on the validation dataset.

    updater{
      type: kSGD
      learning_rate {
        type: kFixedStep
        fixedstep_conf:{
          step:0
          step:48810
          step:56945
          step:65080
          step:73215
          step_lr:0.1
          step_lr:0.05
          step_lr:0.025
          step_lr:0.0125
          step_lr:0.00625
        }
      }
    }

### TrainOneBatch() Function

We use the BP (back-propagation) algorithm to train the RNN model here.
The corresponding configuration can be seen below.

    # In job.conf file
    train_one_batch {
      alg: kBackPropagation
    }

### Cluster Configuration

The default cluster configuration can be used, i.e., a single worker and a single server
in a single process.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/test.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/test.md?rev=1740048&view=auto
==============================================================================

# Performance Test and Feature Extraction

----

Once SINGA finishes the training of a model, it checkpoints the model parameters
into disk files under the [checkpoint folder](checkpoint.html). Model parameters can also be dumped
into this folder periodically during training if the
[checkpoint configuration](checkpoint.html) fields are set. With the checkpoint
files, we can load the model parameters to conduct performance tests, feature extraction and prediction
against new data.

To load the model parameters from checkpoint files, we need to add the paths of
the checkpoint files in the job configuration file

    checkpoint_path: PATH_TO_CHECKPOINT_FILE1
    checkpoint_path: PATH_TO_CHECKPOINT_FILE2
    ...

The new dataset is configured by specifying the `test_steps` field and the data input
layer, e.g., the following configuration is for a dataset with 100x100=10,000 instances
(100 test steps with batch size 100).

    test_steps: 100
    net {
      layer {
        name: "input"
        store_conf {
          backend: "kvfile"
          path: PATH_TO_TEST_KVFILE
          batchsize: 100
        }
      }
      ...
    }

## Performance Test

This application is to test the performance, e.g., accuracy, of the previously
trained model. Depending on the application, the test data may or may not have ground truth
labels. For example, if the model is trained for image classification,
the test images must have ground truth labels to calculate the accuracy; if the
model is an auto-encoder, the performance could be measured by the reconstruction error, which
does not require extra labels. For both cases, there would be a layer that calculates
the performance, e.g., the `SoftmaxLossLayer`.

The job configuration file for the cifar10 example can be used directly for testing after
adding the checkpoint path. The running command is

    $ ./bin/singa-run.sh -conf examples/cifar10/job.conf -test

The performance would be output on the screen like,

    Load from checkpoint file examples/cifar10/checkpoint/step50000-worker0
    accuracy = 0.728000, loss = 0.807645

## Feature extraction

Since deep learning models are good at learning features, feature extraction
is a major functionality of deep learning models, e.g., we can extract features
from the fully connected layers of [AlexNet](http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf) as image features for image retrieval.
To extract the features from one layer, we simply add an output layer after that layer.
For instance, to extract features from the fully connected layer (with name `ip1`) of the cifar10 example model,
we replace the `SoftmaxLossLayer` with a `CSVOutputLayer`, which extracts the features into a CSV file,

    layer {
      name: "ip1"
    }
    layer {
      name: "output"
      type: kCSVOutput
      srclayers: "ip1"
      store_conf {
        backend: "textfile"
        path: OUTPUT_FILE_PATH
      }
    }

The input layer, the test steps and the running command are the same as in the *Performance Test* section.

## Label Prediction

If the output layer is connected to a layer that predicts labels of images,
the output layer would then write the prediction results into files.
SINGA provides two built-in layers for generating prediction results, namely,

* SoftmaxLayer, generates probabilities of the candidate labels.
* ArgSortLayer, sorts labels according to probabilities in descending order and keeps the top-k labels.

By connecting the two layers with the previous layer and the output layer, we can
extract the predictions for each instance. For example,

    layer {
      name: "feature"
      ...
    }
    layer {
      name: "softmax"
      type: kSoftmax
      srclayers: "feature"
    }
    layer {
      name: "prediction"
      type: kArgSort
      srclayers: "softmax"
      argsort_conf {
        topk: 5
      }
    }
    layer {
      name: "output"
      type: kCSVOutput
      srclayers: "prediction"
      store_conf {}
    }

The top-5 labels of each instance will be written as one line of the output CSV file.
Currently, the above layers cannot co-exist with the loss layers used for training.
Please comment out the loss layers when extracting prediction results.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/train-one-batch.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/train-one-batch.md?rev=1740048&view=auto
==============================================================================

# Train-One-Batch

---

For each SGD iteration, every worker calls the `TrainOneBatch` function to
compute gradients of parameters associated with local layers (i.e., layers
dispatched to it). SINGA has implemented two algorithms for the
`TrainOneBatch` function. Users select the corresponding algorithm for
their model in the configuration.

## Basic user guide

### Back-propagation

The [BP algorithm](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf) is used for
computing gradients of feed-forward models, e.g., [CNN](cnn.html)
and [MLP](mlp.html), and of [RNN](rnn.html) models in SINGA.

    # in job.conf
    alg: kBP

To use the BP algorithm for the `TrainOneBatch` function, users simply
configure the `alg` field with `kBP`. If a neural net contains user-defined
layers, these layers must be implemented properly to be consistent with the
implementation of the BP algorithm in SINGA (see below).

### Contrastive Divergence

The [CD algorithm](http://www.cs.toronto.edu/~fritz/absps/nccd.pdf) is used for
computing gradients of energy models like RBM.

    # job.conf
    alg: kCD
    cd_conf {
      cd_k: 2
    }

To use the CD algorithm for the `TrainOneBatch` function, users just configure
the `alg` field to `kCD`. Users can also configure the number of Gibbs sampling steps in
the CD algorithm through the `cd_k` field. By default, it is set to 1.
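For reference, setting `cd_k` to k runs a k-step Gibbs chain per iteration (a standard
description of CD-k in our own notation, not taken from the SINGA source),

`$$v^{(0)} \rightarrow h^{(0)} \rightarrow v^{(1)} \rightarrow \cdots \rightarrow v^{(k)} \rightarrow h^{(k)}$$`

and estimates the gradient from the two ends of the chain,

`$$\Delta w_{ij} \propto \langle v_i h_j \rangle^{(0)} - \langle v_i h_j \rangle^{(k)}$$`

A larger k gives a better approximation of the log-likelihood gradient at the cost of
more sampling steps per iteration.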
## Advanced user guide

### Implementation of BP

The BP algorithm is implemented in SINGA following the pseudo code below,

    BPTrainOneBatch(step, net) {
      // forward propagate
      foreach layer in net.local_layers() {
        if IsBridgeDstLayer(layer)
          recv data from the src layer (i.e., BridgeSrcLayer)
        foreach param in layer.params()
          Collect(param) // recv response from servers for last update

        layer.ComputeFeature(kForward)

        if IsBridgeSrcLayer(layer)
          send layer.data_ to dst layer
      }
      // backward propagate
      foreach layer in reverse(net.local_layers) {
        if IsBridgeSrcLayer(layer)
          recv gradient from the dst layer (i.e., BridgeDstLayer)
          recv response from servers for last update

        layer.ComputeGradient()
        foreach param in layer.params()
          Update(step, param) // send param.grad_ to servers

        if IsBridgeDstLayer(layer)
          send layer.grad_ to src layer
      }
    }

It forwards features through all local layers (which can be checked by layer
partition ID and worker ID) and propagates gradients backwards in the reverse order.
[BridgeSrcLayer](layer.html#bridgesrclayer--bridgedstlayer)
(resp. `BridgeDstLayer`) will be blocked until the feature (resp.
gradient) from the source (resp. destination) layer comes. Parameter gradients
are sent to servers via the `Update` function. Updated parameters are collected via
the `Collect` function, which will be blocked until the parameter is updated.
[Param](param.html) objects have versions, which can be used to
check whether a `Param` object has been updated or not.

Since RNN models are unrolled into feed-forward models, users need to implement
the forward propagation in the recurrent layer's `ComputeFeature` function,
and implement the backward propagation in the recurrent layer's `ComputeGradient`
function. As a result, the whole `TrainOneBatch` runs the
[back-propagation through time (BPTT)](https://en.wikipedia.org/wiki/Backpropagation_through_time) algorithm.

### Implementation of CD

The CD algorithm is implemented in SINGA following the pseudo code below,

    CDTrainOneBatch(step, net) {
      # positive phase
      foreach layer in net.local_layers()
        if IsBridgeDstLayer(layer)
          recv positive phase data from the src layer (i.e., BridgeSrcLayer)
        foreach param in layer.params()
          Collect(param) // recv response from servers for last update
        layer.ComputeFeature(kPositive)
        if IsBridgeSrcLayer(layer)
          send positive phase data to dst layer

      # negative phase
      foreach gibbs in [0...layer_proto_.cd_k]
        foreach layer in net.local_layers()
          if IsBridgeDstLayer(layer)
            recv negative phase data from the src layer (i.e., BridgeSrcLayer)
          layer.ComputeFeature(kNegative)
          if IsBridgeSrcLayer(layer)
            send negative phase data to dst layer

      foreach layer in net.local_layers()
        layer.ComputeGradient()
        foreach param in layer.params
          Update(param)
    }

Parameter gradients are computed after the positive phase and the negative phase.

### Implementing a new algorithm

SINGA implements BP and CD by creating two subclasses of
the [Worker](../api/classsinga_1_1Worker.html) class:
[BPWorker](../api/classsinga_1_1BPWorker.html)'s `TrainOneBatch` function implements the BP
algorithm; [CDWorker](../api/classsinga_1_1CDWorker.html)'s `TrainOneBatch` function implements the CD
algorithm.
To implement a new algorithm for the `TrainOneBatch` function, users
need to create a new subclass of the `Worker`, e.g.,

    class FooWorker : public Worker {
      void TrainOneBatch(int step, shared_ptr<NeuralNet> net, Metric* perf) override;
      void TestOneBatch(int step, Phase phase, shared_ptr<NeuralNet> net, Metric* perf) override;
    };

The `FooWorker` must implement the above two functions for training one
mini-batch and testing one mini-batch. The `perf` argument is for collecting
training or testing performance, e.g., the objective loss or accuracy. It is
passed to the `ComputeFeature` function of each layer.

Users can define configuration fields for `FooWorker`, e.g.,

    # in user.proto
    message FooWorkerProto {
      optional int32 b = 1;
    }

    extend JobProto {
      optional FooWorkerProto foo_conf = 101;
    }

    # in job.proto
    JobProto {
      ...
      extensions 101 to max;
    }

It is similar to [adding configuration fields for a new layer](layer.html#implementing-a-new-layer-subclass).

To use `FooWorker`, users need to register it in the [main.cc](programming-guide.html)
and configure the `alg` and `foo_conf` fields,

    # in main.cc
    const int kFoo = 3; // worker ID, must be different from that of CDWorker and BPWorker
    driver.RegisterWorker<FooWorker>(kFoo);

    # in job.conf
    ...
    alg: 3
    [foo_conf] {
      b: 4
    }

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/updater.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/updater.md?rev=1740048&view=auto
==============================================================================

# Updater

---

Every server in SINGA has an [Updater](../api/classsinga_1_1Updater.html)
instance that updates parameters based on gradients.
In this page, the *Basic user guide* describes the configuration of an updater.
The *Advanced user guide* presents details on how to implement a new updater and a new
learning rate changing method.

## Basic user guide

There are many different parameter updating protocols (i.e., subclasses of
`Updater`). They share some configuration fields like

* `type`, an integer for identifying an updater;
* `learning_rate`, configuration for the
[LRGenerator](../api/classsinga_1_1LRGenerator.html) which controls the learning rate.
* `weight_decay`, the coefficient for [L2 regularization](http://deeplearning.net/tutorial/gettingstarted.html#regularization).
* [momentum](http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/).

If you are not familiar with the above terms, you can find their meanings in
[this page provided by Karpathy](http://cs231n.github.io/neural-networks-3/#update).

### Configuration of built-in updater classes

#### Updater
The base `Updater` implements the [vanilla SGD algorithm](http://cs231n.github.io/neural-networks-3/#sgd).
Its configuration type is `kSGD`.
Users need to configure at least the `learning_rate` field.
`momentum` and `weight_decay` are optional fields.

    updater{
      type: kSGD
      momentum: float
      weight_decay: float
      learning_rate {
        ...
      }
    }

#### AdaGradUpdater

It inherits the base `Updater` to implement the
[AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf) algorithm.
Its type is `kAdaGrad`.
`AdaGradUpdater` is configured similarly to `Updater` except
that `momentum` is not used.

#### NesterovUpdater

It inherits the base `Updater` to implement the
[Nesterov](http://arxiv.org/pdf/1212.0901v2.pdf) (section 3.5) updating protocol.
Its type is `kNesterov`.
`learning_rate` and `momentum` must be configured. `weight_decay` is an
optional configuration field.

#### RMSPropUpdater

It inherits the base `Updater` to implement the
[RMSProp algorithm](http://cs231n.github.io/neural-networks-3/#sgd) proposed by
[Hinton](http://www.cs.toronto.edu/%7Etijmen/csc321/slides/lecture_slides_lec6.pdf) (slide 29).
Its type is `kRMSProp`.

    updater {
      type: kRMSProp
      rmsprop_conf {
        rho: float # [0,1]
      }
    }

#### AdaDeltaUpdater

It inherits the base `Updater` to implement the
[AdaDelta](http://arxiv.org/abs/1212.5701) updating algorithm.
Its type is `kAdaDelta`.

    updater {
      type: kAdaDelta
      adadelta_conf {
        rho: float # [0,1]
      }
    }

#### Adam

It inherits the base `Updater` to implement the
[Adam](http://arxiv.org/pdf/1412.6980.pdf) updating algorithm.
Its type is `kAdam`.
`beta1` and `beta2` are floats, 0 < `beta` < 1, generally close to 1.

    updater {
      type: kAdam
      adam_conf {
        beta1: float # [0,1]
        beta2: float # [0,1]
      }
    }

#### AdaMax

It inherits the base `Updater` to implement the
[AdaMax](http://arxiv.org/pdf/1412.6980.pdf) updating algorithm.
Its type is `kAdamMax`.
`beta1` and `beta2` are floats, 0 < `beta` < 1, generally close to 1.

    updater {
      type: kAdamMax
      adammax_conf {
        beta1: float # [0,1]
        beta2: float # [0,1]
      }
    }

### Configuration of learning rate

The `learning_rate` field is configured as,

    learning_rate {
      type: ChangeMethod
      base_lr: float  # base/initial learning rate
      ... # fields for a specific changing method
    }

The common fields include `type` and `base_lr`. SINGA provides the following
`ChangeMethod`s.

#### kFixed

The `base_lr` is used for all steps.

#### kLinear

The updater should be configured like

    learning_rate {
      base_lr: float
      linear_conf {
        freq: int
        final_lr: float
      }
    }

Linear interpolation is used to change the learning rate,

    lr = (1 - step / freq) * base_lr + (step / freq) * final_lr

#### kExponential

The updater should be configured like

    learning_rate {
      base_lr: float
      exponential_conf {
        freq: int
      }
    }

The learning rate for `step` is

    lr = base_lr / 2^(step / freq)

#### kInverseT

The updater should be configured like

    learning_rate {
      base_lr: float
      inverset_conf {
        final_lr: float
      }
    }

The learning rate for `step` is

    lr = base_lr / (1 + step / final_lr)

#### kInverse

The updater should be configured like

    learning_rate {
      base_lr: float
      inverse_conf {
        gamma: float
        pow: float
      }
    }

The learning rate for `step` is

    lr = base_lr * (1 + gamma * step)^(-pow)

#### kStep

The updater should be configured like

    learning_rate {
      base_lr : float
      step_conf {
        change_freq: int
        gamma: float
      }
    }

The learning rate for `step` is

    lr = base_lr * gamma^(step / change_freq)

#### kFixedStep

The updater should be configured like

    learning_rate {
      fixedstep_conf {
        step: int
        step_lr: float

        step: int
        step_lr: float

        ...
      }
    }

Denote the i-th tuple as (step[i], step_lr[i]). Then the learning rate for
`step` is,

    step_lr[k]

where step[k] is the largest step[i] that is not larger than `step`.
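A small sketch of this lookup in C++ (a hypothetical helper of our own, not
SINGA's `LRGenerator` code), assuming the tuples are sorted by step in increasing order:

    #include <vector>

    // Returns step_lr[k] for the largest step[k] that does not exceed 'step'.
    // 'steps' must be non-empty and sorted ascending; steps[0] is typically 0.
    float FixedStepLR(const std::vector<int>& steps,
                      const std::vector<float>& step_lr, int step) {
      float lr = step_lr.front();
      for (size_t k = 0; k < steps.size(); ++k) {
        if (steps[k] <= step)
          lr = step_lr[k];  // this tuple is still in effect
        else
          break;            // later tuples have not started yet
      }
      return lr;
    }

For example, with steps {0, 60000, 65000} and rates {0.001, 0.0001, 0.00001} (the
values used in the CNN example), step 62000 yields 0.0001.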
## Advanced user guide

### Implementing a new Updater subclass

The base Updater class has one virtual function,

    class Updater{
     public:
      virtual void Update(int step, Param* param, float grad_scale = 1.0f) = 0;

     protected:
      UpdaterProto proto_;
      LRGenerator lr_gen_;
    };

It updates the values of the `param` based on its gradients. The `step`
argument is for deciding the learning rate, which may change through time
(step). `grad_scale` scales the original gradient values. This function is
called by a server once it has received all gradients for the same `Param` object.

To implement a new Updater subclass, users must override the `Update` function.

    class FooUpdater : public Updater {
      void Update(int step, Param* param, float grad_scale = 1.0f) override;
    };

Configuration of this new updater can be declared similarly to that of a new
layer,

    # in user.proto
    message FooUpdaterProto {
      optional int32 c = 1;
    }

    extend UpdaterProto {
      optional FooUpdaterProto fooupdater_conf = 101;
    }

The new updater should be registered in the
[main function](programming-guide.html)

    driver.RegisterUpdater<FooUpdater>("FooUpdater");

Users can then configure the job as

    # in job.conf
    updater {
      user_type: "FooUpdater"  # must use user_type with the same string identifier as the one used for registration
      fooupdater_conf {
        c: 20
      }
    }

### Implementing a new LRGenerator subclass

The base `LRGenerator` is declared as,

    virtual float Get(int step);

To implement a subclass, e.g., `FooLRGen`, users should declare it like

    class FooLRGen : public LRGenerator {
     public:
      float Get(int step) override;
    };

Configuration of `FooLRGen` can be defined using a protocol message,

    # in user.proto
    message FooLRProto {
      ...
    }

    extend LRGenProto {
      optional FooLRProto foolr_conf = 101;
    }

The configuration is then like,

    learning_rate {
      user_type : "FooLR"  # must use user_type with the same string identifier as the one used for registration
      base_lr: float
      foolr_conf {
        ...
      }
    }

Users have to register this subclass in the main function,

    driver.RegisterLRGenerator<FooLRGen, std::string>("FooLR")

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/checkpoint.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/checkpoint.md?rev=1740048&view=auto
==============================================================================

# CheckPoint

---

SINGA checkpoints model parameters onto disk periodically according to a user-configured
frequency. By checkpointing model parameters, we can

  1. resume the training from the last checkpoint. For example, if
  the program crashes before finishing all training steps, we can continue
  the training using the checkpoint files.

  2. use them to initialize a similar model. For example, the
  parameters from training a RBM model can be used to initialize
  a [deep auto-encoder](rbm.html) model.

## Configuration

Checkpointing is controlled by two configuration fields:

* `checkpoint_after`, start checkpointing after this number of training steps,
* `checkpoint_freq`, frequency of doing checkpointing.

For example,

    # job.conf
    checkpoint_after: 100
    checkpoint_freq: 300
    ...
Checkpointing files are located at *WORKSPACE/checkpoint/stepSTEP-workerWORKERID*.
*WORKSPACE* is configured in

    cluster {
      workspace:
    }

For the above configuration, after training for 700 steps, there would be
two checkpointing files,

    step400-worker0
    step700-worker0

## Application - resuming training

We can resume the training from the last checkpoint (i.e., step 700) by,

    ./bin/singa-run.sh -conf JOB_CONF -resume

There is no change to the job configuration.

## Application - model initialization

We can also use the checkpointing file from step 400 to initialize
a new model by configuring the new job as,

    # job.conf
    checkpoint_path: "WORKSPACE/checkpoint/step400-worker0"
    ...

If there are multiple checkpointing files for the same snapshot due to model
partitioning, all the checkpointing files should be added,

    # job.conf
    checkpoint_path: "WORKSPACE/checkpoint/step400-worker0"
    checkpoint_path: "WORKSPACE/checkpoint/step400-worker1"
    ...

The training command is the same as starting a new job,

    ./bin/singa-run.sh -conf JOB_CONF

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/cnn.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/cnn.md?rev=1740048&view=auto
==============================================================================

# CNN Example

---

Convolutional neural network (CNN) is a type of feed-forward artificial neural
network widely used for image and video classification. In this example, we will
use a deep CNN model to do image classification for the
[CIFAR10 dataset](http://www.cs.toronto.edu/~kriz/cifar.html).

## Running instructions

Please refer to the [installation](installation.html) page for
instructions on building SINGA, and the [quick start](quick-start.html)
for instructions on starting zookeeper.

We have provided scripts for preparing the training and test datasets in *examples/cifar10/*.
    # in examples/cifar10
    $ cp Makefile.example Makefile
    $ make download
    $ make create

### Training on CPU

We can start the training by

    ./bin/singa-run.sh -conf examples/cifar10/job.conf

You should see output like

    Record job information to /tmp/singa-log/job-info/job-2-20150817-055601
    Executing : ./singa -conf /xxx/incubator-singa/examples/cifar10/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 2
    E0817 06:56:18.868259 33849 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 33849)
    E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start
    E0817 06:56:18.928469 33872 worker.cc:134] Worker (group = 0, id = 0) start
    E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588, accuracy : 0.077900
    E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578, accuracy : 0.062500
    E0817 06:57:24.142645 33849 trainer.cc:373] Train step-30, loss : 2.302404, accuracy : 0.131250
    E0817 06:57:30.813354 33849 trainer.cc:373] Train step-60, loss : 2.302248, accuracy : 0.156250
    E0817 06:57:37.556655 33849 trainer.cc:373] Train step-90, loss : 2.301849, accuracy : 0.175000
    E0817 06:57:44.971276 33849 trainer.cc:373] Train step-120, loss : 2.301077, accuracy : 0.137500
    E0817 06:57:51.801949 33849 trainer.cc:373] Train step-150, loss : 2.300410, accuracy : 0.135417
    E0817 06:57:58.682281 33849 trainer.cc:373] Train step-180, loss : 2.300067, accuracy : 0.127083
    E0817 06:58:05.578366 33849 trainer.cc:373] Train step-210, loss : 2.300143, accuracy : 0.154167
    E0817 06:58:12.518497 33849 trainer.cc:373] Train step-240, loss : 2.295912, accuracy : 0.185417

After training for some steps (depending on the setting) or when the job is
finished, SINGA will [checkpoint](checkpoint.html) the model parameters.

### Training on GPU

Since version 0.2, we can train CNN models on GPU using cuDNN. Please refer to
the [GPU page](gpu.html) for details on compiling SINGA with GPU and cuDNN.
The configuration file is similar to that for CPU training, except that the
cuDNN layers are used and the GPU device is configured.

    ./bin/singa-run.sh -conf examples/cifar10/cudnn.conf

### Training using Python script

The Python helpers coming with SINGA 0.2 make it easy to configure a training
job. For example, the *job.conf* is replaced with a simple Python script
(e.g., *cifar10_cnn.py*) of about 30 lines of code following the [Keras API](http://keras.io/).

    # on CPU
    ./bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py
    # on GPU
    ./bin/singa-run.sh -exec tool/python/examples/cifar10_cnn_cudnn.py

## Details

To train a model in SINGA, you need to prepare the datasets
and a job configuration which specifies the neural net structure, the training
algorithm (BP or CD), the SGD update algorithm (e.g. Adagrad), the
number of training/test steps, etc.

### Data preparation

Before using SINGA, you need to write a program to convert the dataset
into a format that SINGA can read. Please refer to the
[Data Preparation](data.html#example---cifar-dataset) page for details on
preparing this CIFAR10 dataset.

### Neural net

Figure 1 shows the net structure of the CNN model used in this example, which is
set following [Alex's setting](https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg).
The dashed circle represents one feature transformation stage, which generally
has four layers as shown in the figure.
Sometimes the rectifier layer and the normalization layer
are omitted or swapped in one stage. For this example, there are 3 such stages.

Next we follow the guide in the [neural net page](neural-net.html)
and the [layer page](layer.html) to write the neural net configuration.

<div style = "text-align: center">
<img src = "../images/example-cnn.png" style = "width: 200px"> <br/>
<strong>Figure 1 - Net structure of the CNN example.</strong>
</div>

* We configure an input layer to read the training/testing records from a disk file.

        layer{
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/cifar10/train_data.bin"
            mean_file: "examples/cifar10/image_mean.bin"
            batchsize: 64
            random_skip: 5000
            shape: 3
            shape: 32
            shape: 32
          }
          exclude: kTest  # exclude this layer for the testing net
        }
        layer{
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/cifar10/test_data.bin"
            mean_file: "examples/cifar10/image_mean.bin"
            batchsize: 100
            shape: 3
            shape: 32
            shape: 32
          }
          exclude: kTrain  # exclude this layer for the training net
        }

* We configure layers for the feature transformation as follows
(all layers are built-in layers in SINGA; hyper-parameters of these layers are set according to
[Alex's setting](https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg)).

        layer {
          name: "conv1"
          type: kConvolution
          srclayers: "data"
          convolution_conf {... }
          ...
        }
        layer {
          name: "pool1"
          type: kPooling
          srclayers: "conv1"
          pooling_conf {... }
        }
        layer {
          name: "relu1"
          type: kReLU
          srclayers:"pool1"
        }
        layer {
          name: "norm1"
          type: kLRN
          lrn_conf {... }
          srclayers:"relu1"
        }

    The configurations for the other 2 stages are omitted here.

* There is an [inner product layer](layer.html#innerproductlayer)
after the 3 transformation stages, which is
configured with 10 output units, i.e., the total number of labels. The weight
matrix Param is configured with a large weight decay scale to reduce over-fitting.

        layer {
          name: "ip1"
          type: kInnerProduct
          srclayers:"pool3"
          innerproduct_conf {
            num_output: 10
          }
          param {
            name: "w4"
            wd_scale:250
            ...
          }
          param {
            name: "b4"
            ...
          }
        }

* The last layer is a [Softmax loss layer](layer.html#softmaxloss)

        layer{
          name: "loss"
          type: kSoftmaxLoss
          softmaxloss_conf{ topk:1 }
          srclayers:"ip1"
          srclayers: "data"
        }

### Updater

The [normal SGD updater](updater.html#updater) is selected.
The learning rate decreases like going down stairs, and is configured using the
[kFixedStep](updater.html#kfixedstep) type.

    updater{
      type: kSGD
      weight_decay:0.004
      learning_rate {
        type: kFixedStep
        fixedstep_conf:{
          step:0      # lr for step 0-60000 is 0.001
          step:60000  # lr for step 60000-65000 is 0.0001
          step:65000  # lr for step 65000- is 0.00001
          step_lr:0.001
          step_lr:0.0001
          step_lr:0.00001
        }
      }
    }

### TrainOneBatch algorithm

The CNN model is a feed-forward model and thus should be configured to use the
[Back-propagation algorithm](train-one-batch.html#back-propagation).

    train_one_batch {
      alg: kBP
    }

### Cluster setting

The following configuration sets a single worker and server for training.
The [Training frameworks](frameworks.html) page introduces configurations of a couple of distributed
training frameworks.
    cluster {
      nworker_groups: 1
      nserver_groups: 1
    }

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/data.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/data.md?rev=1740048&view=auto
==============================================================================

# Data Preparation

---

SINGA uses input layers to load data.
Users can store their data in any format (e.g., CSV or binary) and at any place
(e.g., disk file or HDFS) as long as there are corresponding input layers that
can read the data records and parse them.

To make it easy for users, SINGA provides a [StoreInputLayer] to read data
in the format of (string:key, string:value) tuples from a couple of sources.
These sources are abstracted using a [Store] class which is a simple version of
the DB abstraction in Caffe. The base Store class provides the following operations
for reading and writing tuples,

    Open(string path, Mode mode); // open the store for kRead or kCreate or kAppend
    Close();

    Read(string* key, string* val); // read a tuple; return false if fail
    Write(string key, string val);  // write a tuple
    Flush();

Currently, two implementations are provided, namely

1. [KVFileStore] for storing tuples in a [KVFile] (a binary file).
The *create_data.cc* files in *examples/cifar10* and *examples/mnist* provide
examples of storing records using KVFileStore.

2. [TextFileStore] for storing tuples in a plain text file (one line per tuple).

The (key, value) tuples are parsed by subclasses of StoreInputLayer depending on the
format of the tuple,

* [ProtoRecordInputLayer] parses the value field from one
tuple into a [SingleLabelImageRecord], which is generated by Google Protobuf according
to [common.proto]. It can be used to store features for images (e.g., using the pixel field)
or other objects (using the data field). The key field is not used.

* [CSVRecordInputLayer] parses one tuple as a CSV line (fields separated by commas).

## Using built-in record format

SingleLabelImageRecord is a built-in record in SINGA for storing image features.
It is used in the cifar10 and mnist examples.

    message SingleLabelImageRecord {
      repeated int32 shape = 1;                 // e.g., 3 (rgb channels), 32 (row), 32 (col)
      optional int32 label = 2;                 // label
      optional bytes pixel = 3;                 // pixels
      repeated float data = 4 [packed = true];  // it is used for normalization
    }

The data preparation instructions for the [CIFAR-10 image dataset](http://www.cs.toronto.edu/~kriz/cifar.html)
will be elaborated here. This dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class.
There are 50,000 training images and 10,000 test images.
Each image has a single label. This dataset is stored in binary files with a specific format.
SINGA comes with [create_data.cc](https://github.com/apache/incubator-singa/blob/master/examples/cifar10/create_data.cc)
to convert images in the binary files into `SingleLabelImageRecord`s and insert them into training and test stores.

1. Download raw data. The following command will download the dataset into the *cifar-10-batches-bin* folder.

        # in SINGA_ROOT/examples/cifar10
        $ cp Makefile.example Makefile  # an example makefile is provided
        $ make download

2.
Fill one record for each image, and insert it into the store.

        KVFileStore store;
        store.Open(output_file_path, singa::io::kCreate);

        singa::SingleLabelImageRecord image;
        for (int image_id = 0; image_id < 50000; image_id ++) {
          // fill the record with the image feature and label from the downloaded binary files
          string str;
          image.SerializeToString(&str);
          store.Write(to_string(image_id), str);
        }
        store.Flush();
        store.Close();

    The data store for the testing data is created similarly.
    In addition, the program computes average values (not shown here) of the image pixels and
    inserts the mean values into a SingleLabelImageRecord, which is then written
    into another store.

3. Compile and run the program. SINGA provides an example Makefile that contains instructions
for compiling the source code and linking it with *libsinga.so*. Users just execute the following command.

        $ make create

## Using user-defined record formats

If users cannot use the SingleLabelImageRecord or CSV record for their data,
they can define their own record format, e.g., using Google Protobuf.
A record can be written into a data store as long as it can be converted
into a byte string. Correspondingly, subclasses of StoreInputLayer are required to
parse user-defined records.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/distributed-training.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/distributed-training.md?rev=1740048&view=auto
==============================================================================

# Distributed Training

---

SINGA is designed for distributed training of large deep learning models with huge amounts of training data.
We also provide high-level descriptions of the design behind SINGA's distributed architecture.

* [System Architecture](architecture.html)

* [Training Frameworks](frameworks.html)

* [System Communication](communication.html)

SINGA supports different options for training a model in parallel, including data parallelism, model parallelism and hybrid parallelism.

* [Hybrid Parallelism](hybrid.html)

SINGA is integrated with Mesos, so that distributed training can be started as a Mesos framework. Currently, the Mesos cluster can be set up from SINGA containers, i.e. we provide Docker images that bundle Mesos and SINGA together. Refer to the guide below for instructions on how to start and use the cluster.

* [Distributed training on Mesos](mesos.html)

SINGA can run on top of distributed storage systems to achieve scalability. The current version of SINGA supports HDFS.
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/distributed-training.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/distributed-training.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/zh/distributed-training.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/zh/distributed-training.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,25 @@
# Distributed Training

---

SINGA is designed for distributed training of large deep learning models with huge amounts of training data.
We provide high-level descriptions of the design behind SINGA's distributed architecture:

* [System Architecture](architecture.html)

* [Training Frameworks](frameworks.html)

* [System Communication](communication.html)

SINGA supports different options for training a model in parallel, including data parallelism, model parallelism and hybrid parallelism.

* [Hybrid Parallelism](hybrid.html)

SINGA is integrated with Mesos, so that distributed training can be started as a Mesos framework. Currently, the Mesos cluster can be set up from SINGA containers, i.e., we provide Docker images that bundle Mesos and SINGA together. Refer to the guide below for instructions on how to start and use the cluster.

* [Distributed training on Mesos](mesos.html)

SINGA can run on top of a distributed storage system to achieve scalability. The current version of SINGA supports HDFS.

* [Running SINGA on HDFS](hdfs.html)

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/index.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/index.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/zh/index.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/zh/index.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,8 @@
SINGA Documentation (Chinese)

---

* [Overview](overview.html)
* [Installation](installation_source.html)
* [Programming Guide](programming-guide.html)

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/installation_source.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/installation_source.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/zh/installation_source.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/zh/installation_source.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,228 @@
# Installing SINGA from source

---

## Dependencies

SINGA is developed and tested on Linux platforms. Installing SINGA requires the following dependencies:

  * glog version 0.3.3

  * google-protobuf version 2.6.0

  * openblas version >= 0.2.10

  * zeromq version >= 3.2

  * czmq version >= 3

  * zookeeper version 3.4.6

Optional dependencies include:

  * lmdb version 0.9.10

You can install all dependencies into the $PREFIX folder with the following commands:

    # make sure you are in the thirdparty folder
    cd thirdparty
    ./install.sh all $PREFIX

If $PREFIX is not a system path (e.g., /usr/local/), please export the following variables before continuing with the installation:

    export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH
    export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
    export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
    export PATH=$PREFIX/bin:$PATH

Details on using this script are given later in this page.

## Building SINGA from source

SINGA is built with GNU autotools and requires GCC (version >= 4.8).
There are two ways to install SINGA.

  * To use the latest code, clone it from [Github](https://github.com/apache/incubator-singa.git) and run the following commands:

        $ git clone git@github.com:apache/incubator-singa.git
        $ cd incubator-singa
        $ ./autogen.sh
        $ ./configure
        $ make

    Note: due to an oversight on our part, the SINGA repo under the [nusinga](https://github.com/orgs/nusinga) account was not deleted after the project joined the Apache Incubator, but it is no longer updated. We apologize for any inconvenience.

  * If you downloaded a release package, run the following commands to install:

        $ tar xvf singa-xxx
        $ cd singa-xxx
        $ ./configure
        $ make

    Some features of SINGA depend on external libraries; these features can be compiled with `--enable-<feature>`.
    For example, to build SINGA with lmdb support, run:

        $ ./configure --enable-lmdb

<!---
Zhongle: please update the code to use the follow command

    $ make test

After compilation, you will find the binary file singatest. Just run it!
More details about configure script can be found by running:

    $ ./configure -h
-->

After SINGA compiles successfully, *libsinga.so* and the executable *singa* are generated under the *.libs/* folder.

If some dependencies are missing (or not detected), the following script can be used to download and install them:

<!---
to be updated after zhongle changes the code to use

    ./install.sh libname \-\-prefix=

-->

    # must goto thirdparty folder
    $ cd thirdparty
    $ ./install.sh LIB_NAME PREFIX

If no installation path is specified, the libraries are installed in the default paths used by their own build systems. For example, to install `zeromq` in the default system folder, run:

    $ ./install.sh zeromq

Or, to install it into another folder:

    $ ./install.sh zeromq PREFIX

You can also install all dependencies into */usr/local*:

    $ ./install.sh all /usr/local

The table below shows the first argument for each dependency:

    LIB_NAME  LIBRARY
    czmq*     czmq lib
    glog      glog lib
    lmdb      lmdb lib
    OpenBLAS  OpenBLAS lib
    protobuf  Google protobuf
    zeromq    zeromq lib
    zookeeper Apache zookeeper

*: since `czmq` depends on `zeromq`, the script takes one extra argument giving the location of `zeromq`.
The install command for `czmq` is:

<!---
to be updated to

    $./install.sh czmq \-\-prefix=/usr/local \-\-zeromq=/usr/local/zeromq
-->

    $ ./install.sh czmq /usr/local -f=/usr/local/zeromq

After it executes, `czmq` is installed in */usr/local*; the last path above specifies the location of zeromq.

### FAQ

* Q1: I get the error `./configure --> cannot find blas_segmm() function` even though I have installed OpenBLAS.

  A1: This error means the compiler cannot find `OpenBLAS`. If you installed it under $PREFIX (e.g., /opt/OpenBLAS), you need to export its path as follows:

      $ export LIBRARY_PATH=$PREFIX/lib:$LIBRARY_PATH
      # e.g.,
      $ export LIBRARY_PATH=/opt/OpenBLAS/lib:$LIBRARY_PATH

* Q2: I get the error `cblas.h no such file or directory exists`.

  A2: You need to include the folder containing cblas.h in CPLUS_INCLUDE_PATH, e.g.:

      $ export CPLUS_INCLUDE_PATH=$PREFIX/include:$CPLUS_INCLUDE_PATH
      # e.g.,
      $ export CPLUS_INCLUDE_PATH=/opt/OpenBLAS/include:$CPLUS_INCLUDE_PATH
      # then reconfigure and make SINGA
      $ ./configure
      $ make

* Q3: While compiling SINGA, I get the error `SSE2 instruction set not enabled`.

  A3: You can try the following command:

      $ make CFLAGS='-msse2' CXXFLAGS='-msse2'

* Q4: When I try to import a .py file, I get the error `ImportError: cannot import name enum_type_wrapper` from google.protobuf.internal.

  A4: After installing google protobuf via `make install`, the python runtime must be installed as well. Run the following in the protobuf source folder:

      $ cd /PROTOBUF/SOURCE/FOLDER
      $ cd python
      $ python setup.py build
      $ python setup.py install

  You may need `sudo` to install the python runtime into a system folder.

* Q5: I get a linking error caused by gflags.

  A5: SINGA does not depend on gflags, but you may have installed gflags together with glog. In that case, use *thirdparty/install.sh* to reinstall glog into another folder, and export that folder's path into LDFLAGS and CPPFLAGS.

* Q6: While building SINGA on Mac OS X and installing `glog`, I hit the fatal error `'ext/slist' file not found`.

  A6: Please install `glog` separately and then try the following command:

      $ make CFLAGS='-stdlib=libstdc++' CXXFLAGS='-stdlib=libstdc++'

* Q7: When I start a training job, the program reports the error "ZOO_ERROR...zk retcode=-4...".

  A7: This happens because zookeeper is not running. Please start the zookeeper service:

      $ ./bin/zk-service start

  If the error persists, you probably do not have java installed; you can check with:

      $ java --version

* Q8: When installing OpenBLAS from source, I am told that a fortran compiler is needed.

  A8: Compile OpenBLAS as follows:

      $ make ONLY_CBLAS=1

  or install it with apt-get:

      $ sudo apt-get install openblas-dev

  or:

      $ sudo yum install openblas-devel

  The last two commands require root privileges. Note that after installing OpenBLAS, you should set the environment variables to include its header and library paths (see the Dependencies section).
* Q9: When installing protocol buffer, I am told that GLIBC++_3.4.20 is not found in /usr/lib64/libstdc++.so.6.

  A9: This means the linker found libstdc++.so.6, but that file belongs to an older GCC than the one used to compile and link the program. The program requires the libstdc++ of the newer GCC, so the linker must be told where to find the newer libstdc++ shared library. The simplest fix is to locate the correct libstdc++ and export it into LD_LIBRARY_PATH. For example, if GLIBC++_3.4.20 is listed by the following command,

      $ strings /usr/local/lib64/libstdc++.so.6 | grep GLIBC++

  then you only need to set your environment variable as:

      $ export LD_LIBRARY_PATH=/usr/local/lib64:$LD_LIBRARY_PATH

* Q10: While compiling glog, I get the error "src/logging_unittest.cc:83:20: error: 'gflags' is not a namespace-name".

  A10: The version of gflags you have installed probably uses a namespace other than gflags, e.g., 'google', so glog cannot find the 'gflags' namespace.

  gflags is not required to compile glog, so you can modify the configure.ac file to ignore gflags:

  1. cd to the glog source directory
  2. change line 125 of configure.ac to "AC_CHECK_LIB(gflags, main, ac_cv_have_libgflags=0, ac_cv_have_libgflags=0)"
  3. autoreconf

  Then rebuild glog.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/zh/mlp.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/zh/mlp.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/zh/mlp.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/zh/mlp.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,215 @@
# MLP Example

---

The multilayer perceptron (MLP) is a subclass of feed-forward neural networks.
An MLP typically consists of multiple stacked layers, with each layer fully
connected to the next one. In this example, we use SINGA to train a
[simple MLP model proposed by Ciresan](http://arxiv.org/abs/1003.0358)
for classifying handwritten digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

## Running instructions

Please refer to the [installation](installation.html) page for
instructions on building SINGA, and to the [quick start](quick-start.html)
for instructions on starting zookeeper.

We have provided scripts for preparing the training and test datasets in *examples/mnist/*.
    # in examples/mnist
    $ cp Makefile.example Makefile
    $ make download
    $ make create

### Training on CPU

After the datasets are prepared, we start the training by

    ./bin/singa-run.sh -conf examples/mnist/job.conf

Once it has started, you should see output like

    Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
    Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
    E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073)
    E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
    E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
    E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100
    E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000
    E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
    E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200
    E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100
    E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800
    E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100
    E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100
    E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600
    E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000
    E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500
    E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500
    E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000
    E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500
    E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900

After training for a number of steps (depending on the setting), or when the job
finishes, SINGA will [checkpoint](checkpoint.html) the model parameters.

### Training on GPU

To train this model on GPU, just add a field in the configuration file to specify the GPU device,

    # job.conf
    gpu: 0

### Training using a Python script

The Python helpers that come with SINGA 0.2 make it easy to configure the job. For example,
job.conf is replaced with a simple Python script mnist_mlp.py,
which has about 30 lines of code following the [Keras API](http://keras.io/).

    ./bin/singa-run.sh -exec tool/python/examples/mnist_mlp.py

## Details

To train a model in SINGA, you need to prepare the datasets
and a job configuration which specifies the neural net structure, the training
algorithm (BP or CD), the SGD update algorithm (e.g., AdaGrad),
the number of training/test steps, etc.

### Data preparation

Before using SINGA, you need to write a program that pre-processes the dataset
into a format that SINGA can read. Please refer to the
[Data Preparation](data.html) page for details on preparing
the MNIST dataset.

### Neural net

<div style = "text-align: center">
<img src = "../images/example-mlp.png" style = "width: 230px">
<br/><strong>Figure 1 - Net structure of the MLP example.</strong>
</div>

Figure 1 shows the structure of the simple MLP model, which is constructed following
[Ciresan's paper](http://arxiv.org/abs/1003.0358). The dashed circle contains
two layers which represent one feature transformation stage. There are 6 such
stages in total. The sizes of the [InnerProductLayer](layer.html#innerproductlayer)s in these stages
decrease from 2500 to 2000, 1500, 1000, 500 and finally 10.

Next, we follow the guides in the [neural net page](neural-net.html)
and the [layer page](layer.html) to write the neural net configuration.

* We configure input layers to read the training/testing records from a disk file.

        layer {
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/mnist/train_data.bin"
            random_skip: 5000
            batchsize: 64
            shape: 784
            std_value: 127.5
            mean_value: 127.5
          }
          exclude: kTest
        }

        layer {
          name: "data"
          type: kRecordInput
          store_conf {
            backend: "kvfile"
            path: "examples/mnist/test_data.bin"
            batchsize: 100
            shape: 784
            std_value: 127.5
            mean_value: 127.5
          }
          exclude: kTrain
        }

* All [InnerProductLayer](layer.html#innerproductlayer)s are configured similarly,

        layer{
          name: "fc1"
          type: kInnerProduct
          srclayers:"data"
          innerproduct_conf{
            num_output: 2500
          }
          param{
            name: "w1"
            ...
          }
          param{
            name: "b1"
            ...
          }
        }

    with `num_output` decreasing from 2500 to 10.

* A [STanhLayer](layer.html#stanhlayer) is connected to every InnerProductLayer
except the last one. It transforms the features via the scaled tanh function.

        layer{
          name: "tanh1"
          type: kSTanh
          srclayers:"fc1"
        }

* The final [Softmax loss layer](layer.html#softmaxloss) connects
to the last InnerProductLayer and to the data layer, which provides the labels.

        layer{
          name: "loss"
          type:kSoftmaxLoss
          softmaxloss_conf{ topk:1 }
          srclayers:"fc6"
          srclayers:"data"
        }

### Updater

The [normal SGD updater](updater.html#updater) is selected.
The learning rate shrinks by a factor of 0.997 every 60 steps (i.e., one epoch).

    updater{
      type: kSGD
      learning_rate{
        base_lr: 0.001
        type : kStep
        step_conf{
          change_freq: 60
          gamma: 0.997
        }
      }
    }
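For reference, if the kStep policy follows the usual step-decay rule (our assumption from the field names above; this page does not spell it out), the learning rate at step $t$ would be

    \eta(t) = \mathrm{base\_lr} \times \gamma^{\lfloor t / \mathrm{change\_freq} \rfloor}
            = 0.001 \times 0.997^{\lfloor t / 60 \rfloor}

so that, for example, after 600 steps the learning rate would have decayed to about 0.001 x 0.997^10 = 0.00097.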
### TrainOneBatch algorithm

The MLP model is a feed-forward model, hence the
[back-propagation algorithm](train-one-batch.html#back-propagation)
is selected.

    train_one_batch {
      alg: kBP
    }

### Cluster setting

The following configuration sets a single worker and a single server for training.
The [training frameworks](frameworks.html) page introduces the configuration of several distributed
training frameworks.

    cluster {
      nworker_groups: 1
      nserver_groups: 1
    }
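For comparison, a data-parallel run with two worker groups sharing one server group might be configured as sketched below. This is only a sketch based on the cluster fields described on the [frameworks](frameworks.html) page; the `nworkers_per_group` field is an assumption not shown in this example, and the setting has not been tested here.

    # hypothetical data-parallel setting; see frameworks.html for details
    cluster {
      nworker_groups: 2       # two groups compute over different data partitions
      nworkers_per_group: 1   # one worker per group
      nserver_groups: 1       # a single server group maintains the shared parameters
    }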
