Modified: incubator/singa/site/trunk/content/markdown/docs/rnn.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/rnn.md?rev=1703880&r1=1703879&r2=1703880&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/rnn.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/rnn.md Fri Sep 18 15:10:58 
2015
@@ -1,157 +1,148 @@
-# RNN Example
+Recurrent Neural Networks for Language Modelling
 
+---
 
-Recurrent Neural Networks (RNN) are widely used for modeling sequential data,
-such as music, videos and sentences.  In this example, we use SINGA to train a
+Recurrent Neural Networks (RNN) are widely used for modelling sequential data,
+such as music and sentences.  In this example, we use SINGA to train a
 [RNN 
model](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf)
 proposed by Tomas Mikolov for [language 
modeling](https://en.wikipedia.org/wiki/Language_model).
 The training objective (loss) is
-minimize the [perplexity per word](https://en.wikipedia.org/wiki/Perplexity), 
which
+to minimize the [perplexity per 
word](https://en.wikipedia.org/wiki/Perplexity), which
 is equivalent to maximizing the probability of predicting the next word given 
the current word in
 a sentence.
 
-Different to the [CNN](http://singa.incubator.apache.org/docs/cnn), 
[MLP](http://singa.incubator.apache.org/docs/mlp)
-and [RBM](http://singa.incubator.apache.org/docs/rbm) examples which use 
built-in
-[Layer](http://singa.incubator.apache.org/docs/layer)s and 
[Record](http://singa.incubator.apache.org/docs/data)s,
-none of the layers in this model is built-in. Hence users can get examples of
-implementing their own Layers and data Records in this page.
+Different from the [CNN](cnn.html), [MLP](mlp.html)
+and [RBM](rbm.html) examples, which use built-in
+[layers](layer.html) and [records](data.html),
+none of the layers in this example are built-in. Hence users can learn how to
+implement their own layers and data records through this example.
 
 ## Running instructions
 
-In *SINGA_ROOT/examples/rnn/*, scripts are provided to run the training job.
+In *SINGA_ROOT/examples/rnnlm/*, scripts are provided to run the training job.
 First, the data is prepared by
 
     $ cp Makefile.example Makefile
     $ make download
     $ make create
 
-Second, the training is started by passing the job configuration as,
+Second, to compile the source code under *examples/rnnlm/*, run
 
-    # in SINGA_ROOT
-    $ ./bin/singa-run.sh -conf SINGA_ROOT/examples/rnn/job.conf
+    $ make rnnlm
 
+An executable file *rnnlm.bin* will be generated.
 
+Third, the training is started by passing *rnnlm.bin* and the job configuration
+to *singa-run.sh*,
+
+    # at SINGA_ROOT/
+    # export LD_LIBRARY_PATH=.libs:$LD_LIBRARY_PATH
+    $ ./bin/singa-run.sh -exec examples/rnnlm/rnnlm.bin -conf examples/rnnlm/job.conf
 
 ## Implementations
 
-<img src="http://singa.incubator.apache.org/images/rnn-refine.png"; 
align="center" width="300px"/>
+<img src="../images/rnnlm.png" align="center" width="400px"/>
 <span><strong>Figure 1 - Net structure of the RNN model.</strong></span>
 
-The neural net structure is shown Figure 1.
-Word records are loaded by `RnnlmDataLayer` from `WordShard`. 
`RnnlmWordparserLayer`
-parses word records to get word indexes (in the vocabulary). For every 
iteration,
-`window_size` words are processed. `RnnlmWordinputLayer` looks up a word
-embedding matrix to extract feature vectors for words in the window.
-These features are transformed by `RnnlmInnerproductLayer` layer and 
`RnnlmSigmoidLayer`.
-`RnnlmSigmoidLayer` is a recurrent layer that forwards features from previous 
words
-to next words.  Finally, `RnnlmComputationLayer` computes the perplexity loss 
with
-word class information from `RnnlmClassparserLayer`. The word class is a 
cluster ID.
-Words are clustered based on their frequency in the dataset, e.g., frequent 
words
-are clustered together and less frequent words are clustered together. 
Clustering
-is to improve the efficiency of the final prediction process.
+The neural net structure is shown in Figure 1.  Word records are loaded by
+`DataLayer`. For every iteration, at most `max_window` word records are
+processed. If a sentence-ending character is read, the `DataLayer` stops
+loading immediately. `EmbeddingLayer` looks up a word embedding matrix to
+extract feature vectors for the words loaded by the `DataLayer`.  These
+features are transformed by the `HiddenLayer`, which propagates them from left
+to right. The output feature for the word at position k is influenced by the
+words at positions 0 to k-1.  Finally, `LossLayer` computes the cross-entropy
+loss (see below) by predicting the next word for each input word.
+`LabelLayer` reads the same number of word records as the embedding layer but
+starts from position 1. Consequently, the word record at position k in
+`LabelLayer` is the ground truth for the word at position k in `LossLayer`.
+
+The cross-entropy loss is computed as
+
+`$$L(w_t)=-log P(w_{t+1}|w_t)$$`
+
+Given `$w_t$`, the above equation requires computing probabilities over all
+words in the vocabulary, which is time consuming.
+The [RNNLM Toolkit](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz)
+accelerates the computation as
+
+`$$P(w_{t+1}|w_t) = P(C_{w_{t+1}}|w_t) * P(w_{t+1}|C_{w_{t+1}})$$`
+
+Words from the vocabulary are partitioned into a user-defined number of
+classes. The first term on the right-hand side predicts the class of the next
+word; the second term predicts the next word given its class. Both the number
+of classes and the number of words in one class are much smaller than the
+vocabulary size, so the two probabilities can be computed much faster. For
+example, with a vocabulary of 3720 words split into 100 classes, each softmax
+covers the 100 classes or only the words within one class, instead of all
+3720 words.
+
+The perplexity per word over `$T$` words is computed by
+
+`$$PPL = 10^{-\frac{1}{T}\sum_{t=1}^{T} \log_{10} P(w_{t+1}|w_t)}$$`
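+
+For reference, a minimal sketch (not part of the example code) of how the
+factorized probability and the PPL could be accumulated; `class_prob[t]` and
+`word_prob[t]` stand for the two softmax outputs for the ground-truth class
+and word at position t, and are assumptions used only for illustration.
+
+    // Hypothetical helper: accumulate PPL from the factorized probabilities.
+    #include <cmath>
+    #include <vector>
+
+    double ComputePPL(const std::vector<double>& class_prob,
+                      const std::vector<double>& word_prob) {
+      // P(w_{t+1}|w_t) = P(C_{w_{t+1}}|w_t) * P(w_{t+1}|C_{w_{t+1}})
+      double sum_log10 = 0.0;
+      for (size_t t = 0; t < class_prob.size(); ++t)
+        sum_log10 += std::log10(class_prob[t] * word_prob[t]);
+      return std::pow(10.0, -sum_log10 / class_prob.size());
+    }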
 
 ### Data preparation
 
-We use a small dataset in this example. In this dataset, [dataset description, 
e.g., format].
+We use a small dataset provided by the [RNNLM 
Toolkit](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz).
+It has 10,000 training sentences, with 71350 words in total and 3720 unique 
words.
 The subsequent steps follow the instructions in
-[Data Preparation](http://singa.incubator.apache.org/docs/data) to convert the
-raw data into `Record`s and insert them into `DataShard`s.
+[Data Preparation](data.html) to convert the
+raw data into records and insert them into `DataShard`s.
 
 #### Download source data
 
-    # in SINGA_ROOT/examples/rnn/
-    wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz
-    xxx
-
+    # in SINGA_ROOT/examples/rnnlm/
+    cp Makefile.example Makefile
+    make download
 
-#### Define your own record.
+#### Define your own record
 
-Since this dataset has different format as the built-in 
`SingleLabelImageRecord`,
-we need to extend the base `Record` to add new fields,
+We define the word record as follows,
 
-    # in SINGA_ROOT/examples/rnn/user.proto
-    package singa;
-
-    import "common.proto";  // import SINGA Record
-
-    extend Record {  // extend base Record to include users' records
-        optional WordClassRecord wordclass = 101;
-        optional SingleWordRecord singleword = 102;
+    # in SINGA_ROOT/examples/rnnlm/rnnlm.proto
+    message WordRecord {
+      optional string word = 1;
+      optional int32 word_index = 2;
+      optional int32 class_index = 3;
+      optional int32 class_start = 4;
+      optional int32 class_end = 5;
     }
 
-    message WordClassRecord {
-        optional int32 class_index = 1; // the index of this class
-        optional int32 start = 2; // the index of the start word in this class;
-        optional int32 end = 3; // the index of the end word in this class
+    extend singa.Record {
+      optional WordRecord word = 101;
     }
 
-    message SingleWordRecord {
-        optional string word = 1;
-        optional int32 word_index = 2;   // the index of this word in the 
vocabulary
-        optional int32 class_index = 3;   // the index of the class 
corresponding to this word
-    }
-
-
-#### Create data shard for training and testing
-
-{% comment %}
-As the vocabulary size is very large, the original perplexity calculation 
method
-is time consuming. Because it has to calculate the probabilities of all 
possible
-words for
-
-    p(wt|w0, w1, ... wt-1).
-
-
-Tomas proposed to divide all
-words into different classes according to the word frequency, and compute the
-perplexity according to
+It includes the word string and its index in the vocabulary.
+Words in the vocabulary are sorted based on their frequency in the training 
dataset.
+The sorted list is cut into 100 sublists such that each sublist accounts for
+1/100 of the total word frequency. Each sublist is called a class.
+Hence each word has a `class_index` ([0,100)). The `class_start` is the index
+of the first word in the same class as `word`. The `class_end` is the index of
+the first word in the next class.
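+
+For illustration only, a record in protobuf text format might look like the
+following (all field values here are made up):
+
+    # hypothetical WordRecord, shown for illustration
+    word: "apple"
+    word_index: 1025
+    class_index: 60
+    class_start: 1000
+    class_end: 1080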
 
-    p(wt|w0, w1, ... wt-1) = p(c|w0,w1,..wt-1) p(w|c)
+#### Create DataShards
 
-where `c` is the word class, `w0, w1...wt-1` are the previous words before 
`wt`.
-The probabilities on the right side can be computed faster than
+We use code from the RNNLM Toolkit to read words and sort them into classes.
+The main function in *create_shard.cc* first creates word classes based on the
+training dataset. It then calls the following function to create data shards
+for the training, validation and test datasets.
 
+    int create_shard(const char *input_file, const char *output_file);
 
-[Makefile](https://github.com/kaiping/incubator-singa/blob/rnnlm/examples/rnnlm/Makefile)
-for creating the shards (see in
-  
[create_shard.cc](https://github.com/kaiping/incubator-singa/blob/rnnlm/examples/rnnlm/create_shard.cc)),
-  we need to specify where to download the source data, number of classes we
-  want to divide all occurring words into, and all the shards together with
-  their names, directories we want to create.
-{% endcomment %}
-
-*SINGA_ROOT/examples/rnn/create_shard.cc* defines the following function for 
creating data shards,
-
-    void create_shard(const char *input, int nclass) {
-
-`input` is the path to [the text file], `nclass` is user specified cluster 
size.
+`input_file` is the path to the training/validation/test text file from the
+RNNLM Toolkit, and `output_file` is the output shard folder.
 This function starts with
 
-      using StrIntMap = std::map<std::string, int>;
-      StrIntMap *wordIdxMapPtr;        //      Mapping word string to a word 
index
-      StrIntMap *wordClassIdxMapPtr;   //      Mapping word string to a word 
class index
-      if (-1 == nclass) {
-          loadClusterForNonTrainMode(input, nclass, &wordIdxMap, 
&wordClassIdxMap); // non-training phase
-      } else {
-          doClusterForTrainMode(input, nclass, &wordIdxMap, &wordClassIdxMap); 
// training phase
-      }
+    DataShard dataShard(output_file, DataShard::kCreate);
 
+Then it reads the words one by one. For each word, it fills in the
+`WordRecord` extension of a `singa.Record` and inserts the record into the
+`dataShard`.
 
-  * If `-1 == nclass`, `path` points to the training data file.  
`doClusterForTrainMode`
-  reads all the words in the file to create the two maps. [The two maps are 
stored in xxx]
-  * otherwise, `path` points to either test or validation data file. 
`loadClusterForNonTrainMode`
-  loads the two maps from [xxx].
-
-Words from training/text/validation files are converted into `Record`s by
-
-      singa::SingleWordRecord *wordRecord = 
record.MutableExtension(singa::singleword);
-      while (in >> word) {
-        wordRecord->set_word(word);
-        wordRecord->set_word_index(wordIdxMap[word]);
-        wordRecord->set_class_index(wordClassIdxMap[word]);
-        snprintf(key, kMaxKeyLength, "%08d", wordIdxMap[word]);
-        wordShard.Insert(std::string(key), record);
-      }
+    int wcnt = 0; // word count
+    singa::Record record;
+    WordRecord* wordRecord = record.MutableExtension(word);
+    while(1) {
+      readWord(wordstr, fin);
+      if (feof(fin)) break;
+      ...// fill in the wordRecord;
+      int length = snprintf(key, BUFFER_LEN, "%05d", wcnt++);
+      dataShard.Insert(string(key, length), record);
     }
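+
+The elided part fills in the record using the vocabulary and class maps built
+earlier. A rough sketch, assuming hypothetical lookup tables `word2idx`,
+`word2class`, `class_start` and `class_end`, could be:
+
+    // Hypothetical sketch of the elided step; the lookup tables are assumed
+    // to be built while creating the word classes.
+    wordRecord->set_word(wordstr);
+    wordRecord->set_word_index(word2idx[wordstr]);
+    int c = word2class[wordstr];
+    wordRecord->set_class_index(c);
+    wordRecord->set_class_start(class_start[c]);
+    wordRecord->set_class_end(class_end[c]);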
 
 Compilation and running commands are provided in the *Makefile.example*.
@@ -159,403 +150,299 @@ After executing
 
     make create
 
-, three data shards will created using the `create_shard.cc`, namely,
-*rnnlm_word_shard_train*, *rnnlm_word_shard_test* and *rnnlm_word_shard_valid*.
+, three data shards will be created, namely,
+*train_shard*, *test_shard* and *valid_shard*.
 
 
 ### Layer implementation
 
-7 layers (i.e., Layer subclasses) are implemented for this application,
-including 1 [data 
layer](http://singa.incubator.apache.org/docs/layer#data-layers) which fetches 
data records from data
-shards, 2 [parser 
layers](http://singa.incubator.apache.org/docs/layer#parser-layers) which 
parses the input records, 3 neuron layers
-which transforms the word features and 1 loss layer which computes the
-objective loss.
-
-First, we illustrate the data shard and how to create it for this application. 
Then, we
-discuss the configuration and functionality of layers. Finally, we introduce 
how
-to configure a job and then run the training for your own model.
-
-Following the guide for implementing [new Layer 
subclasses](http://singa.incubator.apache.org/docs/layer#implementing-a-new-layer-subclass),
-we extend the 
[LayerProto](http://singa.incubator.apache.org/api/classsinga_1_1LayerProto.html)
-to include the configuration message of each user-defined layer as shown below
-(5 out of the 7 layers have specific configurations),
+6 user-defined layers are implemented for this application.
+Following the guide for implementing
+[new Layer subclasses](layer.html#implementing-a-new-layer-subclass),
+we extend the [LayerProto](../api/classsinga_1_1LayerProto.html)
+to include the configuration messages of user-defined layers as shown below
+(3 out of the 6 layers have specific configurations),
 
-    package singa;
 
-    import "common.proto";  // Record message for SINGA is defined
     import "job.proto";     // Layer message for SINGA is defined
 
     //For implementation of RNNLM application
-    extend LayerProto {
-        optional RnnlmComputationProto rnnlmcomputation_conf = 201;
-        optional RnnlmSigmoidProto rnnlmsigmoid_conf = 202;
-        optional RnnlmInnerproductProto rnnlminnerproduct_conf = 203;
-        optional RnnlmWordinputProto rnnlmwordinput_conf = 204;
-        optional RnnlmDataProto rnnlmdata_conf = 207;
-    }
-
-
-In the subsequent sections, we describe the implementation of each layer, 
including
-it configuration message.
-
-### RnnlmDataLayer
-
-It inherits [DataLayer](/api/classsinga_1_1DataLayer.html) for loading word and
-class `Record`s from `DataShard`s into memory.
-
-#### Functionality
-
-    void RnnlmDataLayer::Setup() {
-      read records from ClassShard to construct mapping from word string to 
class index
-      Resize length of records_ as window_size + 1
-      Read 1st word record to the last position
-    }
-
-
-    void RnnlmDataLayer::ComputeFeature() {
-           records_[0] = records_[windowsize_];        //Copy the last record 
to 1st position in the record vector
-      Assign values to records_;       //Read window_size new word records 
from WordShard
-    }
-
-
-The `Steup` function load the mapping (from word string to class index) from
-*ClassShard*.
-
-Every time the `ComputeFeature` function is called, it loads `windowsize_` 
records
-from `WordShard`.
-
-
-[For the consistency
-of operations at each training iteration, it maintains a record vector (length
-of window_size + 1). It reads the 1st record from the WordShard and puts it in
-the last position of record vector].
-
-
-#### Configuration
-
-    message RnnlmDataProto {
-        required string class_path = 1;   // path to the class data 
file/folder, absolute or relative to the workspace
-        required string word_path = 2;    // path to the word data 
file/folder, absolute or relative to the workspace
-        required int32 window_size = 3;   // window size.
-    }
-
-[class_path to file or folder?]
-
-[There two paths, `class_path` for ...; `word_path` for..
-The `window_size` is set to ...]
-
-
-### RnnlmWordParserLayer
-
-This layer gets `window_size` word strings from the `RnnlmDataLayer` and looks
-up the word string to word index map to get word indexes.
-
-#### Functionality
-
-    void RnnlmWordparserLayer::Setup(){
-        Obtain window size from src layer;
-        Obtain vocabulary size from src layer;
-        Reshape data_ as {window_size};
-    }
-
-    void RnnlmWordparserLayer::ParseRecords(Blob* blob){
-      for each word record in the window, get its word index and insert the 
index into blob
-    }
-
-
-#### Configuration
-
-This layer does not have specific configuration fields.
-
-
-### RnnlmClassParserLayer
-
-It maps each word in the processing window into a class index.
-
-#### Functionality
-
-    void RnnlmClassparserLayer::Setup(){
-      Obtain window size from src layer;
-      Obtain vocaubulary size from src layer;
-      Obtain class size from src layer;
-      Reshape data_ as {windowsize_, 4};
-    }
-
-    void RnnlmClassparserLayer::ParseRecords(){
-      for(int i = 1; i < records.size(); i++){
-          Copy starting word index in this class to data[i]'s 1st position;
-          Copy ending word index in this class to data[i]'s 2nd position;
-          Copy index of input word to data[i]'s 3rd position;
-          Copy class index of input word to data[i]'s 4th position;
+    extend singa.LayerProto {
+      optional EmbeddingProto embedding_conf = 101;
+      optional LossProto loss_conf = 102;
+      optional InputProto input_conf = 103;
+    }
+
+In the subsequent sections, we describe the implementation of each layer,
+including its configuration message.
+
+#### RNNLayer
+
+This is the base layer of all other layers for this application. It is defined
+as follows,
+
+    class RNNLayer : virtual public Layer {
+    public:
+      inline int window() { return window_; }
+    protected:
+      int window_;
+    };
+
+For this application, two iterations may process different numbers of words
+because sentences have different lengths. The `DataLayer` decides the
+effective window size. Every other layer queries its source layer for the
+effective window size and resets its own `window_` in the `ComputeFeature`
+function.
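+
+For example, a downstream layer can reset its window as follows (a sketch
+based on the `EmbeddingLayer` code shown later; the cast is an assumption):
+
+    // Sketch: inside ComputeFeature of a downstream layer.
+    RNNLayer* src = dynamic_cast<RNNLayer*>(srclayers_[0]);
+    window_ = src->window();  // effective window for this iteration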
+
+#### DataLayer
+
+DataLayer loads word records from the `DataShard`.
+
+    class DataLayer : public RNNLayer, singa::DataLayer {
+     public:
+      void Setup(const LayerProto& proto, int npartitions) override;
+      void ComputeFeature(int flag, Metric *perf) override;
+      int max_window() const {
+        return max_window_;
       }
-    }
-
-The setup function read
-
-
-#### Configuration
-This layer fetches the class information (the mapping information between
-classes and words) from RnnlmDataLayer and maintains this information as data
-in this layer.
-
-
-
-Next, this layer parses the last "window_size" number of word records from
-RnnlmDataLayer and stores them as data. Then, it retrieves the corresponding
-class for each input word. It stores the starting word index of this class,
-ending word index of this class, word index and class index respectively.
+     private:
+      int max_window_;
+      singa::DataShard* shard_;
+    };
 
+The Setup function gets the user-configured max window size. Since this
+application predicts the next word for each input word, the record vector is
+resized to hold max_window+1 records, where the k-th record serves as the
+ground-truth label for the (k-1)-th record.
 
-### RnnlmWordInputLayer
+    max_window_ = proto.GetExtension(input_conf).max_window();
+    records_.resize(max_window_ + 1);
 
-Using the input word records, this layer obtains corresponding word vectors as
-its data. Then, it passes the data to RnnlmInnerProductLayer above for further
-processing.
+The `ComputeFeature` function loads at most max_window records. It stops
+early if a sentence-ending character is encountered.
 
-#### Configuration
-In this layer, the length of each word vector needs to be configured. Besides,
-whether to use bias term during the training process should also be configured
-(See more in
-[job.proto](https://github.com/kaiping/incubator-singa/blob/rnnlm/src/proto/job.proto)).
-
-    message RnnlmWordinputProto {
-        required int32 word_length = 1;  // vector length for each input word
-        optional bool bias_term = 30 [default = true];  // use bias vector or 
not
+    records_[0] = records_[window_]; // shift the last record to the first
+    window_ = max_window_;
+    for (int i = 1; i <= max_window_; i++) {
+      // load record; break if it is the ending character
     }
 
-#### Functionality
-In setup phase, this layer first reshapes its members such as "data", "grad",
-and "weight" matrix. Then, it obtains the vocabulary size from its source layer
-(i.e., RnnlmWordParserLayer).
-
-In the forward phase, using the "window_size" number of input word indices, the
-"window_size" number of word vectors are selected from this layer's weight
-matrix, each word index corresponding to one row.
+The configuration of `DataLayer` is like
 
-    void RnnlmWordinputLayer::ComputeFeature() {
-        for(int t = 0; t < windowsize_; t++){
-            data[t] = weight[src[t]];
-        }
+    name: "data"
+    user_type: "kData"
+    [input_conf] {
+      path: "examples/rnnlm/train_shard"
+      max_window: 10
     }
 
-In the backward phase, after computing this layer's gradient in its destination
-layer (i.e., RnnlmInnerProductLayer), here the gradient of the weight matrix in
-this layer is copied (by row corresponding to word indices) from this layer's
-gradient.
-
-    void RnnlmWordinputLayer::ComputeGradient() {
-        for(int t = 0; t < windowsize_; t++){
-            gweight[src[t]] = grad[t];
-        }
-    }
-
-
-### RnnlmInnerProductLayer
+#### EmbeddingLayer
 
-This is a neuron layer which receives the data from RnnlmWordInputLayer and
-sends the computation results to RnnlmSigmoidLayer.
+This layer gets records from `DataLayer`. For each record, the word index is
+parsed and used to get the corresponding word feature vector from the embedding
+matrix.
 
-#### Configuration
-In this layer, the number of neurons needs to be specified. Besides, whether to
-use a bias term should also be configured.
+The class is declared as follows,
 
-    message RnnlmInnerproductProto {
-        required int32 num_output = 1; //Number of outputs for the layer
-        optional bool bias_term = 30 [default = true]; //Use bias vector or not
+    class EmbeddingLayer : public RNNLayer {
+      ...
+      const std::vector<Param*> GetParams() const override {
+        std::vector<Param*> params{embed_};
+        return params;
+      }
+     private:
+      int word_dim_, vocab_size_;
+      Param* embed_;
+    };
+
+The `embed_` field is a matrix whose values are parameters to be learned.
+The matrix size is `vocab_size_` x `word_dim_`.
+
+The Setup function reads the configured `word_dim_` and `vocab_size_`. Then
+it allocates the feature Blob for `max_window` words and sets up `embed_`.
+
+    int max_window = srclayers_[0]->data(this).shape()[0];
+    word_dim_ = proto.GetExtension(embedding_conf).word_dim();
+    data_.Reshape(vector<int>{max_window, word_dim_});
+    ...
+    embed_->Setup(vector<int>{vocab_size_, word_dim_});
+
+The `ComputeFeature` function simply copies the feature vector from the 
`embed_`
+matrix into the feature Blob.
+
+    // reset the effective window size
+    window_ = datalayer->window();
+    auto records = datalayer->records();
+    ...
+    for (int t = 0; t < window_; t++) {
+      int idx = static_cast<int>(records[t].GetExtension(word).word_index());
+      Copy(words[t], embed[idx]);
+    }
+
+The `ComputeGradient` function copies back the gradients to the `embed_` 
matrix.
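+
+A rough sketch of that step, mirroring the copy above (the gradient Blob
+names `gembed` and `grad` are assumptions):
+
+    // Sketch: scatter the word-feature gradients back into the rows of
+    // embed_ that were read in ComputeFeature.
+    for (int t = 0; t < window_; t++) {
+      int idx = static_cast<int>(records[t].GetExtension(word).word_index());
+      Copy(gembed[idx], grad[t]);  // gradient w.r.t. the embedding row idx
+    }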
+
+The configuration for `EmbeddingLayer` is like,
+
+    user_type: "kEmbedding"
+    [embedding_conf] {
+      word_dim: 15
+      vocab_size: 3720
+    }
+    srclayers: "data"
+    param {
+      name: "w1"
+      init {
+        type: kUniform
+        low:-0.3
+        high:0.3
+      }
     }
 
-#### Functionality
-
-In the forward phase, this layer is in charge of executing the dot
-multiplication between its weight matrix and the data in its source layer
-(i.e., RnnlmWordInputLayer).
+#### LabelLayer
 
-    void RnnlmInnerproductLayer::ComputeFeature() {
-        data = dot(src, weight);       //Dot multiplication operation
-    }
+Since the label of records[i] is records[i+1], this layer fetches the
+effective window of records starting from position 1. It converts each record
+into a tuple (class_start, class_end, word_index, class_index).
 
-In the backward phase, this layer needs to first compute the gradient of its
-source layer (i.e., RnnlmWordInputLayer). Then, it needs to compute the
-gradient of its weight matrix by aggregating computation results for each
-timestamp. The details can be seen as follows.
 
-    void RnnlmInnerproductLayer::ComputeGradient() {
-        for (int t = 0; t < windowsize_; t++) {
-            Add the dot product of src[t] and grad[t] to gweight;
-        }
-        Copy the dot product of grad and weight to gsrc;
+    for (int i = 0; i < window_; i++) {
+      WordRecord wordrecord = records[i + 1].GetExtension(word);
+      label[4 * i + 0] = wordrecord.class_start();
+      label[4 * i + 1] = wordrecord.class_end();
+      label[4 * i + 2] = wordrecord.word_index();
+      label[4 * i + 3] = wordrecord.class_index();
     }
 
-### RnnlmSigmoidLayer
+There is no special configuration for this layer.
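+
+A minimal configuration would only name the layer, its (assumed) registered
+type and its source layer, e.g.,
+
+    name: "label"
+    user_type: "kLabel"   # assumed registration name
+    srclayers: "data"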
 
-This is a neuron layer for computation. During the computation in this layer,
-each component of the member data specific to one timestamp uses its previous
-timestamp's data component as part of the input. This is how the time-order
-information is utilized in this language model application.
+#### HiddenLayer
 
-Besides, if you want to implement a recurrent neural network following our
-design, this layer is of vital importance for you to refer to. Also, you can
-always think of other design methods to make use of information from past
-timestamps.
+This layer unrolls the recurrent connections for at most max_window steps.
+The feature for position k is computed based on the feature from the embedding 
layer (position k)
+and the feature at position k-1 of this layer. The formula is
 
-#### Configuration
+`$$f[k]=\sigma (f[k-1]*W+src[k])$$`
 
-In this layer, whether to use a bias term needs to be specified.
+where `$W$` is a matrix with `word_dim_` x `word_dim_` parameters.
 
-    message RnnlmSigmoidProto {
-        optional bool bias_term = 1 [default = true];  // use bias vector or 
not
-    }
+If you want to implement a recurrent neural network following our design,
+this layer is the most important one to refer to.
 
-#### Functionality
-
-In the forward phase, this layer first receives data from its source layer
-(i.e., RnnlmInnerProductLayer) which is used as one part input for computation.
-Then, for each timestampe this layer executes a dot multiplication between its
-previous timestamp information and its own weight matrix. The results are the
-other part for computation. This layer sums these two parts together and
-executes an activation operation. The detailed descriptions for this process
-are illustrated as follows.
-
-    void RnnlmSigmoidLayer::ComputeFeature() {
-        for(int t = 0; t < window_size; t++){
-            if(t == 0) Copy the sigmoid results of src[t] to data[t];
-            else Compute the dot product of data[t - 1] and weight, and add 
sigmoid results of src[t] to be data[t];
-       }
+    class HiddenLayer : public RNNLayer {
+      ...
+      const std::vector<Param*> GetParams() const override {
+        std::vector<Param*> params{weight_};
+        return params;
+      }
+    private:
+      Param* weight_;
+    };
+
+The `Setup` function sets up the weight matrix as
+
+    weight_->Setup(std::vector<int>{word_dim, word_dim});
+
+The `ComputeFeature` function gets the effective window size (`window_`) from
+its source layer, i.e., the embedding layer. Then it propagates the features
+from position 0 to position `window_`-1, as sketched below.
+
+    void HiddenLayer::ComputeFeature() {
+      for(int t = 0; t < window_size; t++){
+        if(t == 0)
+          Copy(data[t], src[t]);
+        else
+          data[t]=sigmoid(data[t-1]*W + src[t]);
+      }
     }
 
-In the backward phase, this RnnlmSigmoidLayer first updates this layer's member
-grad utilizing the information from current timestamp's next timestamp. Then
-respectively, this layer computes the gradient for its weight matrix and its
-source layer RnnlmInnerProductLayer by iterating different timestamps. The
-process can be seen below.
-
-    void RnnlmSigmoidLayer::ComputeGradient(){
-        Update grad[t];        // Update the gradient for the current layer, 
add a new term from next timestamp
-        for (int t = 0; t < windowsize_; t++) {
-                Update gweight;        // Compute the gradient for the weight 
matrix
-                Compute gsrc[t];       // Compute the gradient for src layer
+The `ComputeGradient` function computes the gradient of the loss w.r.t. W and
+the source layer. In particular, for each position k, since data[k]
+contributes to both data[k+1] and the feature at position k in the destination
+layer (the loss layer), grad[k] should contain gradients from both parts. The
+destination layer has already accumulated its part into grad[k]; in
+`ComputeGradient`, we add the gradient coming from position k+1.
+
+    void HiddenLayer::ComputeGradient(){
+      ...
+      for (int k = window_ - 1; k >= 0; k--) {
+        if (k < window_ - 1) {
+          grad[k] += dot(grad[k + 1], weight.T()); // add gradient from position k+1
+        }
+        grad[k] = ...; // compute gL/gy[k], where y[k]=data[k-1]*W+src[k]
+      }
+      gweight = dot(data.Slice(0, window_-1).T(), grad.Slice(1, window_));
+      Copy(gsrc, grad);
     }
 
+After the loop, we have the gradient of the loss w.r.t. y[k], which is then
+used to compute the gradients of W and src[k].
 
+#### LossLayer
 
-### RnnlmComputationLayer
-
-This layer is a loss layer in which the performance metrics, both the
-probability of predicting the next word correctly, and perplexity (PPL in
-short) are computed. To be specific, this layer is composed of the class
-information part and the word information part. Therefore, the computation can
-be essentially divided into two parts by slicing this layer's weight matrix.
+This layer computes the cross-entropy loss and `$\log_{10}P(w_{t+1}|w_t)$`
+(which users can average over all words to get the PPL value).
 
-#### Configuration
+There are two configuration fields to be specified by users.
 
-In this layer, it is needed to specify whether to use a bias term during
-training.
-
-    message RnnlmComputationProto {
-        optional bool bias_term = 1 [default = true];  // use bias vector or 
not
+    message LossProto {
+      optional int32 nclass = 1;
+      optional int32 vocab_size = 2;
     }
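+
+A corresponding job.conf snippet could look like the following; the
+registered type name "kLoss" and the source layer names are assumptions,
+while `nclass` and `vocab_size` match the dataset statistics above (param
+blocks for the two weight matrices are omitted):
+
+    name: "loss"
+    user_type: "kLoss"    # assumed registration name
+    [loss_conf] {
+      nclass: 100
+      vocab_size: 3720
+    }
+    srclayers: "hidden"   # assumed layer names
+    srclayers: "label"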
 
+There are two weight matrices to be learned
 
-#### Functionality
-
-In the forward phase, by using the two sliced weight matrices (one is for class
-information, another is for the words in this class), this
-RnnlmComputationLayer calculates the dot product between the source layer's
-input and the sliced matrices. The results can be denoted as "y1" and "y2".
-Then after a softmax function, for each input word, the probability
-distribution of classes and the words in this classes are computed. The
-activated results can be denoted as p1 and p2. Next, using the probability
-distribution, the PPL value is computed.
-
-    void RnnlmComputationLayer::ComputeFeature() {
-        Compute y1 and y2;
-        p1 = Softmax(y1);
-        p2 = Softmax(y2);
-        Compute perplexity value PPL;
+    class LossLayer : public RNNLayer {
+      ...
+     private:
+      Param* word_weight_, *class_weight_;
     }
 
+The ComputeFeature function computes the two probabilities as follows.
 
-In the backward phase, this layer executes the following three computation
-operations. First, it computes the member gradient of the current layer by each
-timestamp. Second, this layer computes the gradient of its own weight matrix by
-aggregating calculated results from all timestamps. Third, it computes the
-gradient of its source layer, RnnlmSigmoidLayer, timestamp-wise.
+`$$P(C_{w_{t+1}}|w_t) = Softmax(w_t * class\_weight)$$`
+`$$P(w_{t+1}|C_{w_{t+1}}) = Softmax(w_t * word\_weight[class\_start:class\_end])$$`
 
-    void RnnlmComputationLayer::ComputeGradient(){
-       Compute grad[t] for all timestamps;
-        Compute gweight by aggregating results computed in different 
timestamps;
-        Compute gsrc[t] for all timestamps;
-    }
+`$w_t$` denotes the feature from the hidden layer for the t-th word; its
+ground-truth next word is `$w_{t+1}$`.  The first equation computes the
+probability distribution over all classes for the next word. The second
+equation computes the probability distribution over the words in the ground
+truth class of the next word.
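+
+A simplified sketch of `ComputeFeature` for one position t is given below; it
+reuses the pseudo operations (`dot`, `Softmax`, `Slice`) from the excerpts
+above, and the accumulator names `loss_` and `ppl_` are assumptions:
+
+    // Sketch for position t; label layout follows the LabelLayer above:
+    // class_start, class_end, word_index, class_index.
+    int start = label[4 * t + 0], end = label[4 * t + 1];
+    int word_idx = label[4 * t + 2], class_idx = label[4 * t + 3];
+    auto pclass = Softmax(dot(src[t], class_weight));
+    auto pword = Softmax(dot(src[t], word_weight.Slice(start, end).T()));
+    float p = pclass[class_idx] * pword[word_idx - start];
+    loss_ += -log(p);   // cross-entropy loss
+    ppl_ += log10(p);   // summed log10 prob, averaged later for PPL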
 
+The ComputeGradient function computes the gradients for the source layer
+(i.e., the hidden layer) and for the two weight matrices.
 
-## Updater Configuration
+### Updater Configuration
 
 We employ the kFixedStep learning rate change method, and the
-configuration is as follows. We use different learning rate values in different
-step ranges. [Here](http://wangwei-pc.d1.comp.nus.edu.sg:4000/docs/updater/) is
-more information about choosing updaters.
+configuration is as follows. We decay the learning rate once the performance
+stops improving on the validation dataset.
 
     updater{
-        #weight_decay:0.0000001
-        lr_change: kFixedStep
-        type: kSGD
+      type: kSGD
+      learning_rate {
+        type: kFixedStep
         fixedstep_conf:{
           step:0
-          step:42810
-          step:49945
-          step:57080
-          step:64215
+          step:48810
+          step:56945
+          step:65080
+          step:73215
           step_lr:0.1
           step_lr:0.05
           step_lr:0.025
           step_lr:0.0125
           step_lr:0.00625
         }
+      }
     }
 
-
-## TrainOneBatch() Function
+### TrainOneBatch() Function
 
 We use the BP (BackPropagation) algorithm to train the RNN model here. The
 corresponding configuration can be seen below.
 
     # In job.conf file
-    alg: kBackPropagation
-
-Refer to
-[here](http://wangwei-pc.d1.comp.nus.edu.sg:4000/docs/train-one-batch/) for
-more information on different TrainOneBatch() functions.
-
-## Cluster Configuration
-
-In this RNN language model, we configure the cluster topology as follows.
-
-    cluster {
-      nworker_groups: 1
-      nserver_groups: 1
-      nservers_per_group: 1
-      nworkers_per_group: 1
-      nservers_per_procs: 1
-      nworkers_per_procs: 1
-      workspace: "examples/rnnlm/"
+    train_one_batch {
+      alg: kBackPropagation
     }
 
-This is to train the model in one node. For other configuration choices, please
-refer to [here](http://wangwei-pc.d1.comp.nus.edu.sg:4000/docs/frameworks/).
-
-
-## Configure Job
-
-Job configuration is written in "job.conf".
-
-Note: Extended field names should be embraced with square-parenthesis [], 
e.g., [singa.rnnlmdata_conf].
-
-
-## Run Training
-
-Start training by the following commands
-
-    cd SINGA_ROOT
-    ./bin/singa-run.sh -workspace=examples/rnnlm
+### Cluster Configuration
 
+The default cluster configuration can be used, i.e., single worker and single 
server
+in a single process.

Modified: incubator/singa/site/trunk/content/markdown/docs/updater.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/updater.md?rev=1703880&r1=1703879&r2=1703880&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/updater.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/updater.md Fri Sep 18 
15:10:58 2015
@@ -1,5 +1,7 @@
 # Updater
 
+---
+
 Every server in SINGA has an [Updater](api/classsinga_1_1Updater.html)
 instance that updates parameters based on gradients.
 In this page, the *Basic user guide* describes the configuration of an updater.
@@ -33,7 +35,7 @@ Users need to configure at least the `le
       momentum: float
       weight_decay: float
       learning_rate {
-
+        ...
       }
     }
 
@@ -192,7 +194,7 @@ where step[k] is the smallest number tha
 
 ## Advanced user guide
 
-### Implementing a new Update subclass
+### Implementing a new Updater subclass
 
 The base Updater class has one virtual function,
 
@@ -279,4 +281,4 @@ The configuration is then like,
 
 Users have to register this subclass in the main function,
 
-      driver.RegisterLRGenerator<FooLRGen>("FooLR")
+      driver.RegisterLRGenerator<FooLRGen, std::string>("FooLR")

Modified: incubator/singa/site/trunk/content/markdown/index.md
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/index.md?rev=1703880&r1=1703879&r2=1703880&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/index.md (original)
+++ incubator/singa/site/trunk/content/markdown/index.md Fri Sep 18 15:10:58 
2015
@@ -1,28 +1,29 @@
 ### Getting Started
-* The [Introduction](http://singa.incubator.apache.org/docs/overview.html) 
page gives an overview of SINGA.
+* The [Introduction](docs/overview.html) page gives an overview of SINGA.
 
-* The [Installation](http://singa.incubator.apache.org/docs/installation.html)
+* The [Installation](docs/installation.html)
 guide describes details on downloading and installing SINGA.
 
-* Please follow the [Quick 
Start](http://singa.incubator.apache.org/docs/quick-start.html)
+* Please follow the [Quick Start](docs/quick-start.html)
 guide to run simple applications on SINGA.
 
 ### Documentation
-* Documentations are listed 
[here](http://singa.incubator.apache.org/docs.html).
-* Code API can be found 
[here](http://singa.incubator.apache.org/api/index.html).
-* Research publication list is available 
[here](http://singa.incubator.apache.org/research/publication).
+* Documentations are listed [here](docs.html).
+* Code API can be found [here](api/index.html).
+* Research publication list is available 
[here](http://www.comp.nus.edu.sg/~dbsystem/singa//research/publication/).
 
 ### How to contribute
 
 * Please subscribe to our development mailing list 
[email protected].
 * If you find any issues using SINGA, please report it to the
 [Issue Tracker](https://issues.apache.org/jira/browse/singa).
-* You can also contact with [SINGA 
committers](http://singa.incubator.apache.org/dev/community) directly.
+* You can also contact [SINGA committers](dev/community) directly.
 
 More details on contributing to SINGA are described [here](dev/contribute).
 
 
 ### Recent News
+* SINGA was presented in a [workshop on deep 
learning](http://www.comp.nus.edu.sg/~dbsystem/singa/workshop) held on 16 Sep, 
2015
 * SINGA will be presented at [BOSS](http://boss.dima.tu-berlin.de/) of
 [VLDB 2015](http://www.vldb.org/2015/) at Hawai'i, 4 Sep, 2015.
 (slides: [overview](files/singa-vldb-boss.pptx),
@@ -51,12 +52,11 @@ Please cite the following two papers if
 * B. C. Ooi, K.-L. Tan, S. Wang, W. Wang, Q. Cai, G. Chen, J. Gao, Z. Luo,
 A. K. H. Tung, Y. Wang, Z. Xie, M. Zhang, and K. Zheng. [SINGA: A distributed
 deep learning platform](http://www.comp.nus.edu.sg/~ooibc/singaopen-mm15.pdf). 
ACM Multimedia
- (Open Source Software Competition) 2015 
([BibTex](http://singa.incubator.apache.org/assets/file/bib-oss.txt)).
+ (Open Source Software Competition) 2015 
([BibTex](http://www.comp.nus.edu.sg/~dbsystem/singa//assets/file/bib-oss.txt)).
 
 * W. Wang, G. Chen, T. T. A. Dinh, B. C. Ooi, K.-L.Tan, J. Gao, and S. Wang.
 [SINGA:putting deep learning in the hands of multimedia 
users](http://www.comp.nus.edu.sg/~ooibc/singa-mm15.pdf).
-ACM Multimedia 2015 
([BibTex](http://singa.incubator.apache.org/assets/file/bib-singa.txt)).
+ACM Multimedia 2015 
([BibTex](http://www.comp.nus.edu.sg/~dbsystem/singa//assets/file/bib-singa.txt)).
 
 ### License
 SINGA is released under [Apache License Version 
2.0](http://www.apache.org/licenses/LICENSE-2.0).
-

Modified: incubator/singa/site/trunk/content/site.xml
URL: 
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/site.xml?rev=1703880&r1=1703879&r2=1703880&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/site.xml (original)
+++ incubator/singa/site/trunk/content/site.xml Fri Sep 18 15:10:58 2015
@@ -51,6 +51,15 @@
       <item name="Welcome" href="index.html"/>
     </menu>
 
+    <head>
+      <script type="text/javascript"
+        
src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";>
+      </script>
+      <script type="text/x-mathjax-config">
+        MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], 
['\\(','\\)']]}});
+      </script>
+    </head>
+
     <menu name="Documentaion">
       <item name="Introduction" href="docs/overview.html"/>
       <item name="Installation" href="docs/installation.html"/>


