Modified: websites/staging/singa/trunk/content/docs/rnn.html ============================================================================== --- websites/staging/singa/trunk/content/docs/rnn.html (original) +++ websites/staging/singa/trunk/content/docs/rnn.html Wed Sep 2 10:31:57 2015 @@ -1,15 +1,15 @@ <!DOCTYPE html> <!-- - | Generated by Apache Maven Doxia at 2015-08-17 + | Generated by Apache Maven Doxia at 2015-09-02 | Rendered using Apache Maven Fluido Skin 1.4 --> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> - <meta name="Date-Revision-yyyymmdd" content="20150817" /> + <meta name="Date-Revision-yyyymmdd" content="20150902" /> <meta http-equiv="Content-Language" content="en" /> - <title>Apache SINGA – Recurrent neural networks (RNN)</title> + <title>Apache SINGA – RNN Example</title> <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" /> <link rel="stylesheet" href="../css/site.css" /> <link rel="stylesheet" href="../css/print.css" media="print" /> @@ -189,7 +189,7 @@ Apache SINGA</a> <span class="divider">/</span> </li> - <li class="active ">Recurrent neural networks (RNN)</li> + <li class="active ">RNN Example</li> @@ -425,21 +425,50 @@ <div id="bodyColumn" class="span10" > - <div class="section"> -<h2><a name="Recurrent_neural_networks_RNN"></a>Recurrent neural networks (RNN)</h2> -<p>Example files for RNN can be found in “SINGA_ROOT/examples/rnnlm”, which we assume to be WORKSPACE.</p> -<div class="section"> -<h3><a name="Create_DataShard"></a>Create DataShard</h3> -<p>(a) Define your own record. Please refer to <a class="externalLink" href="http://singa.incubator.apache.org/docs/data.html">Data Preparation</a> for details.</p> -<p>Records for RNN example are defined in “user.proto” as an extension.</p> + <h1>RNN Example</h1> +<p>Recurrent Neural Networks (RNN) are widely used for modeling sequential data, such as music, videos and sentences. In this example, we use SINGA to train an <a class="externalLink" href="http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf">RNN model</a> proposed by Tomas Mikolov for <a class="externalLink" href="https://en.wikipedia.org/wiki/Language_model">language modeling</a>. The training objective (loss) is to minimize the <a class="externalLink" href="https://en.wikipedia.org/wiki/Perplexity">perplexity per word</a>, which is equivalent to maximizing the probability of predicting the next word given the current word in a sentence.</p> +<p>Different from the <a class="externalLink" href="http://singa.incubator.apache.org/docs/cnn">CNN</a>, <a class="externalLink" href="http://singa.incubator.apache.org/docs/mlp">MLP</a> and <a class="externalLink" href="http://singa.incubator.apache.org/docs/rbm">RBM</a> examples, which use built-in <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer">Layer</a>s and <a class="externalLink" href="http://singa.incubator.apache.org/docs/data">Record</a>s, none of the layers in this model is built-in. Hence this page shows how users can implement their own Layers and data Records.</p> +<div class="section"> +<h2><a name="Running_instructions"></a>Running instructions</h2> +<p>In <i>SINGA_ROOT/examples/rnnlm/</i>, scripts are provided to run the training job.
First, the data is prepared by</p> <div class="source"> -<div class="source"><pre class="prettyprint">package singa; +<div class="source"><pre class="prettyprint">$ cp Makefile.example Makefile +$ make download +$ make create +</pre></div></div> +<p>Second, the training is started by passing the job configuration as,</p> -import "common.proto"; // Record message for SINGA is defined -import "job.proto"; // Layer message for SINGA is defined +<div class="source"> +<div class="source"><pre class="prettyprint"># in SINGA_ROOT
+$ ./bin/singa-run.sh -conf SINGA_ROOT/examples/rnnlm/job.conf +</pre></div></div></div> +<div class="section"> +<h2><a name="Implementations"></a>Implementations</h2> +<p><img src="http://singa.incubator.apache.org/assets/image/rnn-refine.png" align="center" width="300px" alt="" /> <span><b>Figure 1 - Net structure of the RNN model.</b></span></p> +<p>The neural net structure is shown in Figure 1. Word records are loaded by <tt>RnnlmDataLayer</tt> from <tt>WordShard</tt>. <tt>RnnlmWordparserLayer</tt> parses word records to get word indexes (in the vocabulary). For every iteration, <tt>window_size</tt> words are processed. <tt>RnnlmWordinputLayer</tt> looks up a word embedding matrix to extract feature vectors for words in the window. These features are transformed by <tt>RnnlmInnerproductLayer</tt> and <tt>RnnlmSigmoidLayer</tt>. <tt>RnnlmSigmoidLayer</tt> is a recurrent layer that forwards features from previous words to next words. Finally, <tt>RnnlmComputationLayer</tt> computes the perplexity loss with word class information from <tt>RnnlmClassparserLayer</tt>. The word class is a cluster ID: words are clustered based on their frequency in the dataset, so that words of similar frequency fall into the same class. Clustering improves the efficiency of the final prediction process.</p> +<div class="section"> +<h3><a name="Data_preparation"></a>Data preparation</h3> +<p>We use a small dataset in this example. In this dataset, [dataset description, e.g., format].
The subsequent steps follow the instructions in <a class="externalLink" href="http://singa.incubator.apache.org/docs/data">Data Preparation</a> to convert the raw data into <tt>Record</tt>s and insert them into <tt>DataShard</tt>s.</p> +<div class="section"> +<h4><a name="Download_source_data"></a>Download source data</h4> + +<div class="source"> +<div class="source"><pre class="prettyprint"># in SINGA_ROOT/examples/rnnlm/ +wget http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz +xxx +</pre></div></div></div> +<div class="section"> +<h4><a name="Define_your_own_record."></a>Define your own record.</h4> +<p>Since this dataset has a different format from the built-in <tt>SingleLabelImageRecord</tt>, we need to extend the base <tt>Record</tt> with new fields,</p> -extend Record { +<div class="source"> +<div class="source"><pre class="prettyprint"># in SINGA_ROOT/examples/rnnlm/user.proto +package singa; + +import "common.proto"; // import SINGA Record + +extend Record { // extend base Record to include users' records optional WordClassRecord wordclass = 101; optional SingleWordRecord singleword = 102; } @@ -455,23 +484,69 @@ message SingleWordRecord { optional int32 word_index = 2; // the index of this word in the vocabulary optional int32 class_index = 3; // the index of the class corresponding to this word } +</pre></div></div></div> +<div class="section"> +<h4><a name="Create_data_shard_for_training_and_testing"></a>Create data shard for training and testing</h4> +<p>As the vocabulary size is very large, the original perplexity calculation method is time consuming, because it has to calculate the probabilities of all possible words for</p> +
+<div class="source"> +<div class="source"><pre class="prettyprint">p(wt|w0, w1, ... wt-1). </pre></div></div> -<p>(b) Download raw data</p> -<p>This example downloads rnnlm-0.4b from <a href="www.rnnlm.org">www.rnnlm.org</a> by a command </p> +<p>Mikolov proposed to divide all words into different classes according to the word frequency, and to compute the perplexity according to</p> <div class="source"> -<div class="source"><pre class="prettyprint">make download +<div class="source"><pre class="prettyprint">p(wt|w0, w1, ... wt-1) = p(c|w0,w1,..wt-1) p(w|c) </pre></div></div> -<p>The raw data is stored in a folder “rnnlm-0.4b/train” and “rnnlm-0.4b/test”.</p> -<p>(c) Create data shard for training and testing</p> -<p>Data shards (e.g., “shard.dat”) will be created in “rnnlm_class_shard”, “rnnlm_vocab_shard”, “rnnlm_word_shard_train” and “rnnlm_word_shard_test” by a command</p> +<p>where <tt>c</tt> is the word class and <tt>w0, w1...wt-1</tt> are the previous words before <tt>wt</tt>. The probabilities on the right side can be computed much faster than the one on the left, since the number of classes and the number of words within one class are both much smaller than the vocabulary size.</p> +<p>In the <a class="externalLink" href="https://github.com/kaiping/incubator-singa/blob/rnnlm/examples/rnnlm/Makefile">Makefile</a> for creating the shards (see <a class="externalLink" href="https://github.com/kaiping/incubator-singa/blob/rnnlm/examples/rnnlm/create_shard.cc">create_shard.cc</a>), we need to specify where to download the source data, the number of classes to divide all occurring words into, and the names and directories of the shards to create.</p>
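+<p>To see why the factorization is faster, consider an illustrative (assumed) vocabulary of 10,000 words divided into 100 classes of roughly 100 words each. Predicting a word directly requires normalizing over the whole vocabulary, while the class-based factorization normalizes only over the classes and then over the words within one class,</p> +
+<div class="source"> +<div class="source"><pre class="prettyprint">direct:      p(wt|w0..wt-1)           ~ 10000 outputs to normalize
+factorized:  p(c|w0..wt-1) * p(wt|c)  ~ 100 + 100 = 200 outputs to normalize
+</pre></div></div>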
+<p><i>SINGA_ROOT/examples/rnnlm/create_shard.cc</i> defines the following function for creating data shards,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">void create_shard(const char *input, int nclass) { +</pre></div></div> +<p><tt>input</tt> is the path to [the text file], <tt>nclass</tt> is the user-specified number of classes. This function starts with</p> + +<div class="source"> +<div class="source"><pre class="prettyprint"> using StrIntMap = std::map<std::string, int>; + StrIntMap wordIdxMap;       // maps a word string to its word index + StrIntMap wordClassIdxMap;  // maps a word string to its class index + if (-1 == nclass) { + loadClusterForNonTrainMode(input, nclass, &wordIdxMap, &wordClassIdxMap); // non-training phase + } else { + doClusterForTrainMode(input, nclass, &wordIdxMap, &wordClassIdxMap); // training phase + } +</pre></div></div> + +<ul> + +<li>If <tt>nclass != -1</tt>, <tt>input</tt> points to the training data file. <tt>doClusterForTrainMode</tt> reads all the words in the file to create the two maps. [The two maps are stored in xxx]</li> + +<li>Otherwise, <tt>input</tt> points to either the test or the validation data file. <tt>loadClusterForNonTrainMode</tt> loads the two maps from [xxx].</li> +</ul> +<p>Words from the training/test/validation files are converted into <tt>Record</tt>s by</p> + +<div class="source"> +<div class="source"><pre class="prettyprint"> singa::SingleWordRecord *wordRecord = record.MutableExtension(singa::singleword); + while (in >> word) { + wordRecord->set_word(word); + wordRecord->set_word_index(wordIdxMap[word]); + wordRecord->set_class_index(wordClassIdxMap[word]); + snprintf(key, kMaxKeyLength, "%08d", wordIdxMap[word]); + wordShard.Insert(std::string(key), record); + } +} +</pre></div></div> +<p>Compilation and running commands are provided in the <i>Makefile.example</i>. After executing</p> <div class="source"> <div class="source"><pre class="prettyprint">make create -</pre></div></div></div> +</pre></div></div> +<p>three data shards will be created by <tt>create_shard.cc</tt>, namely, <i>rnnlm_word_shard_train</i>, <i>rnnlm_word_shard_test</i> and <i>rnnlm_word_shard_valid</i>.</p></div></div> <div class="section"> -<h3><a name="Define_Layers"></a>Define Layers</h3> -<p>Similar to records, layers are also defined in “user.proto” as an extension.</p> +<h3><a name="Layer_implementation"></a>Layer implementation</h3> +<p>7 layers (i.e., Layer subclasses) are implemented for this application: 1 <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#data-layers">data layer</a> which fetches data records from data shards, 2 <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#parser-layers">parser layers</a> which parse the input records, 3 neuron layers which transform the word features, and 1 loss layer which computes the objective loss.</p> +<p>The subsections below discuss the configuration and functionality of each layer, and finally introduce how to configure a job and run the training for your own model.</p>
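+<p>To make the net structure concrete, the following sketch shows how such a layer stack could be declared in <i>job.conf</i>, following Figure 1. The layer names here are assumptions for illustration only, not the exact configuration shipped with the example,</p> +
+<div class="source"> +<div class="source"><pre class="prettyprint"># hypothetical sketch of the net in job.conf (layer names assumed)
+neuralnet {
+  layer { name: "data" }                                          # RnnlmDataLayer
+  layer { name: "word_parser"   srclayers: "data" }               # RnnlmWordparserLayer
+  layer { name: "class_parser"  srclayers: "data" }               # RnnlmClassparserLayer
+  layer { name: "word_input"    srclayers: "word_parser" }        # RnnlmWordinputLayer
+  layer { name: "inner_product" srclayers: "word_input" }         # RnnlmInnerproductLayer
+  layer { name: "sigmoid"       srclayers: "inner_product" }      # RnnlmSigmoidLayer
+  layer { name: "loss"          srclayers: "sigmoid"
+          srclayers: "class_parser" }                             # RnnlmComputationLayer
+}
+</pre></div></div>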
+<p>Following the guide for implementing <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#implementing-a-new-layer-subclass">new Layer subclasses</a>, we extend the <a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1LayerProto.html">LayerProto</a> to include the configuration message of each user-defined layer as shown below (5 out of the 7 layers have specific configurations),</p> <div class="source"> <div class="source"><pre class="prettyprint">package singa; @@ -487,56 +562,285 @@ extend LayerProto { optional RnnlmWordinputProto rnnlmwordinput_conf = 204; optional RnnlmDataProto rnnlmdata_conf = 207; } +</pre></div></div> +<p>In the subsequent sections, we describe the implementation of each layer, including its configuration message.</p></div> +<div class="section"> +<h3><a name="RnnlmDataLayer"></a>RnnlmDataLayer</h3> +<p>It inherits <a href="/api/classsinga_1_1DataLayer.html">DataLayer</a> for loading word and class <tt>Record</tt>s from <tt>DataShard</tt>s into memory.</p> +<div class="section"> +<h4><a name="Functionality"></a>Functionality</h4> -// 1-Message that stores parameters used by RnnlmComputationLayer -message RnnlmComputationProto { - optional bool bias_term = 1 [default = true]; // use bias vector or not +<div class="source"> +<div class="source"><pre class="prettyprint">void RnnlmDataLayer::Setup() { + read records from ClassShard to construct mapping from word string to class index + Resize length of records_ as window_size + 1 + Read 1st word record to the last position } -// 2-Message that stores parameters used by RnnlmSigmoidLayer -message RnnlmSigmoidProto { - optional bool bias_term = 1 [default = true]; // use bias vector or not + +void RnnlmDataLayer::ComputeFeature() { + records_[0] = records_[windowsize_]; //Copy the last record to 1st position in the record vector + Assign values to records_; //Read window_size new word records from WordShard } +</pre></div></div> +<p>The <tt>Setup</tt> function loads the mapping (from word string to class index) from <i>ClassShard</i>.</p> +<p>Every time the <tt>ComputeFeature</tt> function is called, it loads <tt>windowsize_</tt> records from <tt>WordShard</tt>.</p> +<p>For consistency of operations at each training iteration, the layer maintains a record vector of length window_size + 1: in <tt>Setup</tt> the first word record is read into the last position of the vector, and each <tt>ComputeFeature</tt> call copies the last record to the first position before reading window_size new records.</p></div> +<div class="section"> +<h4><a name="Configuration"></a>Configuration</h4> -// 3-Message that stores parameters used by RnnlmInnerproductLayer -message RnnlmInnerproductProto { - required int32 num_output = 1; // number of outputs for the layer - optional bool bias_term = 30 [default = true]; // use bias vector or not +<div class="source"> +<div class="source"><pre class="prettyprint">message RnnlmDataProto { + required string class_path = 1; // path to the class data file/folder, absolute or relative to the workspace + required string word_path = 2; // path to the word data file/folder, absolute or relative to the workspace + required int32 window_size = 3; // window size. } +</pre></div></div> +<p>[class_path to file or folder?]</p> +<p>[There are two paths, <tt>class_path</tt> for …; <tt>word_path</tt> for… The <tt>window_size</tt> is set to …]</p>
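+<p>For illustration, this layer could then be configured in <i>job.conf</i> as below. The shard paths follow the shard names used in this example; the layer name and the window size value are assumptions,</p> +
+<div class="source"> +<div class="source"><pre class="prettyprint"># hypothetical excerpt from job.conf (name and window_size assumed)
+layer {
+  name: "data"
+  [singa.rnnlmdata_conf] {
+    class_path: "examples/rnnlm/rnnlm_class_shard"
+    word_path: "examples/rnnlm/rnnlm_word_shard_train"
+    window_size: 5
+  }
+}
+</pre></div></div>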
</div></div> +<div class="section"> +<h3><a name="RnnlmWordParserLayer"></a>RnnlmWordParserLayer</h3> +<p>This layer gets <tt>window_size</tt> word strings from the <tt>RnnlmDataLayer</tt> and looks them up in the word-to-index map to get the word indexes.</p> +<div class="section"> +<h4><a name="Functionality"></a>Functionality</h4> -// 4-Message that stores parameters used by RnnlmWordparserLayer - nothing needs to be configured +<div class="source"> +<div class="source"><pre class="prettyprint">void RnnlmWordparserLayer::Setup(){ + Obtain window size from src layer; + Obtain vocabulary size from src layer; + Reshape data_ as {window_size}; +} + +void RnnlmWordparserLayer::ParseRecords(Blob* blob){ + for each word record in the window, get its word index and insert the index into blob +} +</pre></div></div></div> +<div class="section"> +<h4><a name="Configuration"></a>Configuration</h4> +<p>This layer does not have specific configuration fields.</p></div></div> +<div class="section"> +<h3><a name="RnnlmClassParserLayer"></a>RnnlmClassParserLayer</h3> +<p>It maps each word in the processing window into a class index.</p> +<div class="section"> +<h4><a name="Functionality"></a>Functionality</h4> + +<div class="source"> +<div class="source"><pre class="prettyprint">void RnnlmClassparserLayer::Setup(){ + Obtain window size from src layer; + Obtain vocabulary size from src layer; + Obtain class size from src layer; + Reshape data_ as {windowsize_, 4}; +} + +void RnnlmClassparserLayer::ParseRecords(){ + for(int i = 1; i < records.size(); i++){ + Copy starting word index in this class to data[i]'s 1st position; + Copy ending word index in this class to data[i]'s 2nd position; + Copy index of input word to data[i]'s 3rd position; + Copy class index of input word to data[i]'s 4th position; + } +} +</pre></div></div> +<p>The setup function reads the window size, vocabulary size and class size from its source layers. This layer fetches the class information (the mapping between classes and words) from <tt>RnnlmDataLayer</tt> and maintains it as data in this layer. It then parses the last “window_size” word records from <tt>RnnlmDataLayer</tt>, retrieves the class of each input word, and stores the starting word index of that class, the ending word index of that class, the word index and the class index, respectively; the resulting layout is sketched after this section.</p></div> +<div class="section"> +<h4><a name="Configuration"></a>Configuration</h4> +<p>This layer does not have specific configuration fields.</p></div></div>
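+<p>The 4-column layout produced by <tt>RnnlmClassparserLayer::ParseRecords</tt> can be pictured with the following plain C++ sketch (illustrative only, not the actual SINGA data structure),</p> +
+<div class="source"> +<div class="source"><pre class="prettyprint">// one row of the parsed blob per word in the window (names assumed)
+struct ParsedWord {
+  int class_start;  // index of the first word in this word's class
+  int class_end;    // index of the last word in this word's class
+  int word_index;   // index of the word in the vocabulary
+  int class_index;  // index of the word's class
+};
+</pre></div></div>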
<div class="section"> +<h3><a name="RnnlmWordInputLayer"></a>RnnlmWordInputLayer</h3> +<p>Using the input word indexes, this layer looks up the corresponding word vectors as its data. Then, it passes the data to the RnnlmInnerProductLayer above for further processing.</p> +<div class="section"> +<h4><a name="Configuration"></a>Configuration</h4> +<p>In this layer, the length of each word vector needs to be configured. Besides, whether to use a bias term during the training process should also be configured (see more in <a class="externalLink" href="https://github.com/kaiping/incubator-singa/blob/rnnlm/src/proto/job.proto">job.proto</a>).</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">message RnnlmWordinputProto { required int32 word_length = 1; // vector length for each input word optional bool bias_term = 30 [default = true]; // use bias vector or not } +</pre></div></div></div> +<div class="section"> +<h4><a name="Functionality"></a>Functionality</h4> +<p>In the setup phase, this layer reshapes its members, i.e., the “data”, “grad” and “weight” matrices, and obtains the vocabulary size from its source layer (i.e., RnnlmWordParserLayer).</p> +<p>In the forward phase, the “window_size” input word indexes are used to select “window_size” word vectors from this layer’s weight matrix, each word index corresponding to one row.</p> -// 5-Message that stores parameters used by RnnlmWordparserLayer - nothing needs to be configured -//message RnnlmWordparserProto { -//} - -// 6-Message that stores parameters used by RnnlmClassparserLayer - nothing needs to be configured -//message RnnlmClassparserProto { -//} - -// 7-Message that stores parameters used by RnnlmDataLayer -message RnnlmDataProto { - required string class_path = 1; // path to the data file/folder, absolute or relative to the workspace - required string word_path = 2; - required int32 window_size = 3; // window size. +<div class="source"> +<div class="source"><pre class="prettyprint">void RnnlmWordinputLayer::ComputeFeature() { + for(int t = 0; t < windowsize_; t++){ + data[t] = weight[src[t]]; + } +} +</pre></div></div> +<p>In the backward phase, after this layer’s gradient has been computed by its destination layer (i.e., RnnlmInnerProductLayer), the gradient of the weight matrix is copied row by row (according to the word indexes) from this layer’s gradient.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">void RnnlmWordinputLayer::ComputeGradient() { + for(int t = 0; t < windowsize_; t++){ + gweight[src[t]] = grad[t]; + } +} +</pre></div></div></div></div> +<div class="section"> +<h3><a name="RnnlmInnerProductLayer"></a>RnnlmInnerProductLayer</h3> +<p>This is a neuron layer which receives the data from RnnlmWordInputLayer and sends the computation results to RnnlmSigmoidLayer.</p> +<div class="section"> +<h4><a name="Configuration"></a>Configuration</h4> +<p>In this layer, the number of neurons needs to be specified. Besides, whether to use a bias term should also be configured.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">message RnnlmInnerproductProto { + required int32 num_output = 1; // number of outputs for the layer + optional bool bias_term = 30 [default = true]; // use bias vector or not +} +</pre></div></div></div> +<div class="section"> +<h4><a name="Functionality"></a>Functionality</h4> +<p>In the forward phase, this layer multiplies the data from its source layer (i.e., RnnlmWordInputLayer) with its own weight matrix.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">void RnnlmInnerproductLayer::ComputeFeature() { + data = dot(src, weight); // matrix multiplication +} +</pre></div></div>
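+<p>As a plain C++ sketch (shapes assumed; this is not the SINGA implementation), the forward step above amounts to,</p> +
+<div class="source"> +<div class="source"><pre class="prettyprint">#include <vector>
+
+// data[t] = src[t] * weight for each position t in the window.
+// src: window_size x in_dim; weight: in_dim x out_dim;
+// data must be pre-sized to window_size x out_dim.
+void InnerProductForward(const std::vector<std::vector<float>> &src,
+                         const std::vector<std::vector<float>> &weight,
+                         std::vector<std::vector<float>> *data) {
+  const size_t in_dim = weight.size(), out_dim = weight[0].size();
+  for (size_t t = 0; t < src.size(); ++t)
+    for (size_t j = 0; j < out_dim; ++j) {
+      float sum = 0.f;
+      for (size_t i = 0; i < in_dim; ++i) sum += src[t][i] * weight[i][j];
+      (*data)[t][j] = sum;
+    }
+}
+</pre></div></div>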
+<p>In the backward phase, this layer computes the gradient of its source layer (i.e., RnnlmWordInputLayer), and computes the gradient of its own weight matrix by aggregating the results from each time step, as follows,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">void RnnlmInnerproductLayer::ComputeGradient() { + for (int t = 0; t < windowsize_; t++) { + Add the dot product of src[t] and grad[t] to gweight; + } + Copy the dot product of grad and weight to gsrc; +} +</pre></div></div></div></div> +<div class="section"> +<h3><a name="RnnlmSigmoidLayer"></a>RnnlmSigmoidLayer</h3> +<p>This is a neuron layer in which the data at each time step is computed using the data of the previous time step as part of the input. This is how time-order information is exploited in this language model.</p> +<p>If you want to implement a recurrent neural network following our design, this layer is a useful reference; other designs for using information from past time steps are of course also possible.</p> +<div class="section"> +<h4><a name="Configuration"></a>Configuration</h4> +<p>In this layer, whether to use a bias term needs to be specified.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">message RnnlmSigmoidProto { + optional bool bias_term = 1 [default = true]; // use bias vector or not +} +</pre></div></div></div> +<div class="section"> +<h4><a name="Functionality"></a>Functionality</h4> +<p>In the forward phase, this layer receives data from its source layer (i.e., RnnlmInnerProductLayer) as one part of the input. Then, for each time step, it multiplies the data of the previous time step with its own weight matrix to obtain the other part. It sums the two parts and applies the sigmoid activation,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">void RnnlmSigmoidLayer::ComputeFeature() { + for(int t = 0; t < window_size; t++){ + if(t == 0) data[t] = Sigmoid(src[t]); + else data[t] = Sigmoid(src[t] + dot(data[t - 1], weight)); + } +} +</pre></div></div> +<p>In the backward phase, this layer first updates its member grad[t], adding the term propagated back from the next time step. Then it computes the gradients for its own weight matrix and for its source layer RnnlmInnerProductLayer by iterating over the time steps,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">void RnnlmSigmoidLayer::ComputeGradient(){ + for (int t = 0; t < windowsize_; t++) { + Update grad[t];  // add the term propagated back from the next time step + Update gweight;  // accumulate the gradient for the weight matrix + Compute gsrc[t]; // compute the gradient for the src layer + } +} +</pre></div></div>
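+<p>Putting the forward computation together, the recurrence can be written as the following self-contained C++ sketch (shapes assumed; illustrative only, not the SINGA API),</p> +
+<div class="source"> +<div class="source"><pre class="prettyprint">#include <cmath>
+#include <vector>
+
+// data[t] = sigmoid(src[t] + data[t-1] * weight), applied element-wise.
+// src: window_size x dim; weight: dim x dim;
+// data must be pre-sized to window_size x dim.
+void SigmoidForward(const std::vector<std::vector<float>> &src,
+                    const std::vector<std::vector<float>> &weight,
+                    std::vector<std::vector<float>> *data) {
+  const size_t dim = weight.size();
+  for (size_t t = 0; t < src.size(); ++t)
+    for (size_t j = 0; j < dim; ++j) {
+      float sum = src[t][j];
+      if (t > 0)  // recurrent term from the previous time step
+        for (size_t i = 0; i < dim; ++i) sum += (*data)[t - 1][i] * weight[i][j];
+      (*data)[t][j] = 1.f / (1.f + std::exp(-sum));
+    }
+}
+</pre></div></div>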
</div></div> +<div class="section"> +<h3><a name="RnnlmComputationLayer"></a>RnnlmComputationLayer</h3> +<p>This layer is a loss layer in which the performance metrics, i.e., the probability of predicting the next word correctly and the perplexity (PPL for short), are computed. Specifically, this layer is composed of a class information part and a word information part; the computation can thus be divided into two parts by slicing this layer’s weight matrix.</p> +<div class="section"> +<h4><a name="Configuration"></a>Configuration</h4> +<p>In this layer, only whether to use a bias term during training needs to be specified.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">message RnnlmComputationProto { + optional bool bias_term = 1 [default = true]; // use bias vector or not } </pre></div></div></div> <div class="section"> -<h3><a name="Configure_Job"></a>Configure Job</h3> +<h4><a name="Functionality"></a>Functionality</h4> +<p>In the forward phase, using the two sliced weight matrices (one for the class information, the other for the words in a class), this RnnlmComputationLayer calculates the dot products between the source layer’s data and the sliced matrices. The results can be denoted as “y1” and “y2”. Then, after a softmax function, for each input word the probability distribution over the classes and over the words in its class are computed, denoted as p1 and p2. Next, using these probability distributions, the PPL value is computed.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">void RnnlmComputationLayer::ComputeFeature() { + Compute y1 and y2; + p1 = Softmax(y1); + p2 = Softmax(y2); + Compute perplexity value PPL; +} +</pre></div></div> +<p>In the backward phase, this layer executes three computations. First, it computes its member gradient grad[t] for each time step. Second, it computes the gradient of its own weight matrix by aggregating the results from all time steps. Third, it computes the gradient of its source layer, RnnlmSigmoidLayer, time step by time step.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">void RnnlmComputationLayer::ComputeGradient(){ + Compute grad[t] for all time steps; + Compute gweight by aggregating results computed in different time steps; + Compute gsrc[t] for all time steps; +} +</pre></div></div></div></div></div> +<div class="section"> +<h2><a name="Updater_Configuration"></a>Updater Configuration</h2> +<p>We employ the kFixedStep method for changing the learning rate, configured as follows: a different learning rate value is used in each step range. See <a class="externalLink" href="http://singa.incubator.apache.org/docs/updater.html">Updater</a> for more information about choosing updaters.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">updater{ + #weight_decay:0.0000001 + lr_change: kFixedStep + type: kSGD + fixedstep_conf:{ + step:0 + step:42810 + step:49945 + step:57080 + step:64215 + step_lr:0.1 + step_lr:0.05 + step_lr:0.025 + step_lr:0.0125 + step_lr:0.00625 + } +} +</pre></div></div>
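+<p>The semantics of <tt>kFixedStep</tt> can be sketched in C++ as a simple lookup: the i-th learning rate applies from <tt>step[i]</tt> onwards, so steps [0, 42810) use 0.1, steps [42810, 49945) use 0.05, and so on (sketch only, not the SINGA implementation),</p> +
+<div class="source"> +<div class="source"><pre class="prettyprint">#include <vector>
+
+// Returns the learning rate for a training step under a kFixedStep schedule.
+float FixedStepLR(const std::vector<int> &steps,       // {0, 42810, 49945, 57080, 64215}
+                  const std::vector<float> &step_lrs,  // {0.1, 0.05, 0.025, 0.0125, 0.00625}
+                  int step) {
+  float lr = step_lrs[0];
+  for (size_t i = 0; i < steps.size(); ++i)
+    if (step >= steps[i]) lr = step_lrs[i];
+  return lr;
+}
+</pre></div></div>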
</div> +<div class="section"> +<h2><a name="TrainOneBatch_Function"></a>TrainOneBatch() Function</h2> +<p>We use the BP (back-propagation) algorithm to train the RNN model here. The corresponding configuration is shown below.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint"># In job.conf file +alg: kBackPropagation +</pre></div></div> +<p>Refer to <a class="externalLink" href="http://singa.incubator.apache.org/docs/train-one-batch.html">Train-One-Batch</a> for more information on different TrainOneBatch() functions.</p></div> +<div class="section"> +<h2><a name="Cluster_Configuration"></a>Cluster Configuration</h2> +<p>In this RNN language model, we configure the cluster topology as follows.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">cluster { + nworker_groups: 1 + nserver_groups: 1 + nservers_per_group: 1 + nworkers_per_group: 1 + nservers_per_procs: 1 + nworkers_per_procs: 1 + workspace: "examples/rnnlm/" +} +</pre></div></div> +<p>This configuration trains the model on a single node. For other configuration choices, please refer to <a class="externalLink" href="http://singa.incubator.apache.org/docs/frameworks.html">Frameworks</a>.</p></div> +<div class="section"> +<h2><a name="Configure_Job"></a>Configure Job</h2> <p>Job configuration is written in “job.conf”.</p> <p>Note: Extended field names should be enclosed in square brackets [], e.g., [singa.rnnlmdata_conf].</p></div> <div class="section"> -<h3><a name="Run_Training"></a>Run Training</h3> +<h2><a name="Run_Training"></a>Run Training</h2> <p>Start training with the following commands,</p> <div class="source"> <div class="source"><pre class="prettyprint">cd SINGA_ROOT ./bin/singa-run.sh -workspace=examples/rnnlm -</pre></div></div></div></div> +</pre></div></div></div> </div> </div> </div>
Added: websites/staging/singa/trunk/content/docs/train-one-batch.html ============================================================================== --- websites/staging/singa/trunk/content/docs/train-one-batch.html (added) +++ websites/staging/singa/trunk/content/docs/train-one-batch.html Wed Sep 2 10:31:57 2015 @@ -0,0 +1,583 @@ +<!DOCTYPE html> +<!-- + | Generated by Apache Maven Doxia at 2015-09-02 + | Rendered using Apache Maven Fluido Skin 1.4 +--> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta charset="UTF-8" /> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + <meta name="Date-Revision-yyyymmdd" content="20150902" /> + <meta http-equiv="Content-Language" content="en" /> + <title>Apache SINGA – Train-One-Batch</title> + <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" /> + <link rel="stylesheet" href="../css/site.css" /> + <link rel="stylesheet" href="../css/print.css" media="print" /> + <script type="text/javascript" src="../js/apache-maven-fluido-1.4.min.js"></script> + </head> + <body class="topBarEnabled"> + <div id="bodyColumn" class="span10" > + <h1>Train-One-Batch</h1> +<p>For each SGD iteration, every worker calls the <tt>TrainOneBatch</tt> function to compute gradients of parameters associated with local layers (i.e., layers dispatched to it).
SINGA has implemented two algorithms for the <tt>TrainOneBatch</tt> function. Users select the algorithm for their model in the configuration.</p> +<div class="section"> +<h2><a name="Basic_user_guide"></a>Basic user guide</h2> +<div class="section"> +<h3><a name="Back-propagation"></a>Back-propagation</h3> +<p>The <a class="externalLink" href="http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf">BP algorithm</a> is used for computing gradients of feed-forward models, e.g., <a class="externalLink" href="http://singa.incubator.apache.org/docs/cnn">CNN</a> and <a class="externalLink" href="http://singa.incubator.apache.org/docs/mlp">MLP</a>, and of <a class="externalLink" href="http://singa.incubator.apache.org/docs/rnn">RNN</a> models in SINGA.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint"># in job.conf +alg: kBP +</pre></div></div> +<p>To use the BP algorithm for the <tt>TrainOneBatch</tt> function, users simply configure the <tt>alg</tt> field with <tt>kBP</tt>. If a neural net contains user-defined layers, these layers must be implemented properly to be consistent with the implementation of the BP algorithm in SINGA (see below).</p></div> +<div class="section"> +<h3><a name="Contrastive_Divergence"></a>Contrastive Divergence</h3> +<p>The <a class="externalLink" href="http://www.cs.toronto.edu/~fritz/absps/nccd.pdf">CD algorithm</a> is used for computing gradients of energy models like RBM.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint"># job.conf +alg: kCD +cd_conf { + cd_k: 2 +} +</pre></div></div> +<p>To use the CD algorithm for the <tt>TrainOneBatch</tt> function, users simply configure the <tt>alg</tt> field to <tt>kCD</tt>. Users can also configure the number of Gibbs sampling steps in the CD algorithm through the <tt>cd_k</tt> field. By default, it is set to 1.</p></div></div> +<div class="section"> +<h2><a name="Advanced_user_guide"></a>Advanced user guide</h2> +<div class="section"> +<h3><a name="Implementation_of_BP"></a>Implementation of BP</h3> +<p>The BP algorithm is implemented in SINGA following the pseudo code below,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">BPTrainOnebatch(step, net) { + // forward propagate + foreach layer in net.local_layers() { + if IsBridgeDstLayer(layer) + recv data from the src layer (i.e., BridgeSrcLayer) + foreach param in layer.params() + Collect(param) // recv response from servers for last update + + layer.ComputeFeature(kForward) + + if IsBridgeSrcLayer(layer) + send layer.data_ to dst layer + } + // backward propagate + foreach layer in reverse(net.local_layers) { + if IsBridgeSrcLayer(layer) + recv gradient from the dst layer (i.e., BridgeDstLayer) + recv response from servers for last update + + layer.ComputeGradient() + foreach param in layer.params() + Update(step, param) // send param.grad_ to servers + + if IsBridgeDstLayer(layer) + send layer.grad_ to src layer + } +} +</pre></div></div> +<p>It forwards features through all local layers (which can be checked by layer partition ID and worker ID) and back-propagates gradients in the reverse order. <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer/#bridgesrclayer--bridgedstlayer">BridgeSrcLayer</a> (resp. <tt>BridgeDstLayer</tt>) will be blocked until the feature (resp. gradient) from the source (resp. destination) layer comes. Parameter gradients are sent to servers via the <tt>Update</tt> function.
Updated parameters are collected via the <tt>Collect</tt> function, which blocks until the parameter has been updated. <a class="externalLink" href="http://singa.incubator.apache.org/docs/param">Param</a> objects have versions, which can be used to check whether a <tt>Param</tt> object has been updated or not.</p> +<p>Since RNN models are unrolled into feed-forward models, users need to implement the forward propagation in the recurrent layer’s <tt>ComputeFeature</tt> function, and implement the backward propagation in the recurrent layer’s <tt>ComputeGradient</tt> function. As a result, the whole <tt>TrainOneBatch</tt> runs the <a class="externalLink" href="https://en.wikipedia.org/wiki/Backpropagation_through_time">back-propagation through time (BPTT)</a> algorithm.</p></div> +<div class="section"> +<h3><a name="Implementation_of_CD"></a>Implementation of CD</h3> +<p>The CD algorithm is implemented in SINGA following the pseudo code below,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">CDTrainOneBatch(step, net) { + # positive phase + foreach layer in net.local_layers() + if IsBridgeDstLayer(layer) + recv positive phase data from the src layer (i.e., BridgeSrcLayer) + foreach param in layer.params() + Collect(param) // recv response from servers for last update + layer.ComputeFeature(kPositive) + if IsBridgeSrcLayer(layer) + send positive phase data to dst layer + + # negative phase + foreach gibbs in [0...layer_proto_.cd_k] + foreach layer in net.local_layers() + if IsBridgeDstLayer(layer) + recv negative phase data from the src layer (i.e., BridgeSrcLayer) + layer.ComputeFeature(kNegative) + if IsBridgeSrcLayer(layer) + send negative phase data to dst layer + + foreach layer in net.local_layers() + layer.ComputeGradient() + foreach param in layer.params + Update(param) +} +</pre></div></div> +<p>Parameter gradients are computed after the positive phase and negative phase.</p></div> +<div class="section"> +<h3><a name="Implementing_a_new_algorithm"></a>Implementing a new algorithm</h3> +<p>SINGA implements BP and CD by creating two subclasses of the <a href="api/classsinga_1_1Worker.html">Worker</a> class: <a href="api/classsinga_1_1BPWorker.html">BPWorker</a>’s <tt>TrainOneBatch</tt> function implements the BP algorithm; <a href="api/classsinga_1_1CDWorker.html">CDWorker</a>’s <tt>TrainOneBatch</tt> function implements the CD algorithm. To implement a new algorithm for the <tt>TrainOneBatch</tt> function, users need to create a new subclass of the <tt>Worker</tt>, e.g.,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">class FooWorker : public Worker { + void TrainOneBatch(int step, shared_ptr<NeuralNet> net, Metric* perf) override; + void TestOneBatch(int step, Phase phase, shared_ptr<NeuralNet> net, Metric* perf) override; +}; +</pre></div></div> +<p>The <tt>FooWorker</tt> must implement the above two functions for training one mini-batch and testing one mini-batch. The <tt>perf</tt> argument is for collecting training or testing performance, e.g., the objective loss or accuracy. It is passed to the <tt>ComputeFeature</tt> function of each layer.</p> +<p>Users can define configuration fields for the new worker, e.g.,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint"># in user.proto +message FooWorkerProto { + optional int32 b = 1; +} + +extend JobProto { + optional FooWorkerProto foo_conf = 101; +} + +# in job.proto +message JobProto { + ...
+ extensions 101 to max; +} +</pre></div></div> +<p>This is similar to <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer/#implementing-a-new-layer-subclass">adding configuration fields for a new layer</a>.</p> +<p>To use <tt>FooWorker</tt>, users need to register it in <a class="externalLink" href="http://singa.incubator.apache.org/docs/programming-guide">main.cc</a> and configure the <tt>alg</tt> and <tt>foo_conf</tt> fields,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint"># in main.cc +const int kFoo = 3; // worker ID, must be different from those of CDWorker and BPWorker +driver.RegisterWorker<FooWorker>(kFoo); + +# in job.conf +... +alg: 3 +[foo_conf] { + b: 4 +} +</pre></div></div>
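+<p>A minimal sketch of what <tt>FooWorker::TrainOneBatch</tt> could look like for a BP-style pass is given below. It simply mirrors the pseudo code above; the accessor names (e.g., <tt>layers()</tt>, <tt>GetParams()</tt>) are assumptions for illustration, not the definitive SINGA API,</p> +
+<div class="source"> +<div class="source"><pre class="prettyprint">// hypothetical sketch; mirrors the BP pseudo code above
+void FooWorker::TrainOneBatch(int step, shared_ptr<NeuralNet> net, Metric* perf) {
+  for (auto layer : net->layers()) {            // forward pass
+    for (auto param : layer->GetParams())
+      Collect(param);                           // wait for parameters updated by servers
+    layer->ComputeFeature(kForward, perf);      // perf collects loss/accuracy
+  }
+  for (auto it = net->layers().rbegin(); it != net->layers().rend(); ++it) {  // backward pass
+    (*it)->ComputeGradient();
+    for (auto param : (*it)->GetParams())
+      Update(step, param);                      // send gradients to servers
+  }
+}
+</pre></div></div>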
+</div></div> + </div> + </div> + </div> + + <hr/> + + <footer> + <div class="container-fluid"> + <div class="row-fluid"> + +<p>Copyright © 2015 The Apache Software Foundation. All rights reserved. Apache Singa, Apache, the Apache feather logo, and the Apache Singa project logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> + </div> + </div> + </footer> + </body> +</html> Added: websites/staging/singa/trunk/content/docs/updater.html ============================================================================== --- websites/staging/singa/trunk/content/docs/updater.html (added) +++ websites/staging/singa/trunk/content/docs/updater.html Wed Sep 2 10:31:57 2015 @@ -0,0 +1,717 @@ +<!DOCTYPE html> +<!-- + | Generated by Apache Maven Doxia at 2015-09-02 + | Rendered using Apache Maven Fluido Skin 1.4 +--> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> + <head> + <meta charset="UTF-8" /> + <meta name="viewport" content="width=device-width, initial-scale=1.0" /> + <meta name="Date-Revision-yyyymmdd" content="20150902" /> + <meta http-equiv="Content-Language" content="en" /> + <title>Apache SINGA – Updater</title> + <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" /> + <link rel="stylesheet" href="../css/site.css" /> + <link rel="stylesheet" href="../css/print.css" media="print" /> + <script type="text/javascript" src="../js/apache-maven-fluido-1.4.min.js"></script> + </head> + <body class="topBarEnabled">
href="../community/team-list.html" title="SINGA Team"> + <span class="none"></span> + SINGA Team</a> + </li> + <li class="nav-header">External Links</li> + + <li> + + <a href="http://www.apache.org/" class="externalLink" title="Apache Software Foundation"> + <span class="none"></span> + Apache Software Foundation</a> + </li> + + <li> + + <a href="http://www.comp.nus.edu.sg/~dbsystem/singa/" class="externalLink" title="NUS School of Computing"> + <span class="none"></span> + NUS School of Computing</a> + </li> + </ul> + + + + <hr /> + + <div id="poweredBy"> + <div class="clear"></div> + <div class="clear"></div> + <div class="clear"></div> + <div class="clear"></div> + <a href="http://incubator.apache.org" title="apache-incubator" class="builtBy"> + <img class="builtBy" alt="Apache Incubator" src="http://incubator.apache.org/images/egg-logo.png" /> + </a> + </div> + </div> + </div> + + + <div id="bodyColumn" class="span10" > + + <h1>Updater</h1> +<p>Every server in SINGA has an <a href="api/classsinga_1_1Updater.html">Updater</a> instance that updates parameters based on gradients. In this page, the <i>Basic user guide</i> describes the configuration of an updater. The <i>Advanced user guide</i> present details on how to implement a new updater and a new learning rate changing method.</p> +<div class="section"> +<h2><a name="Basic_user_guide"></a>Basic user guide</h2> +<p>There are many different parameter updating protocols (i.e., subclasses of <tt>Updater</tt>). They share some configuration fields like</p> + +<ul> + +<li><tt>type</tt>, an integer for identifying an updater;</li> + +<li><tt>learning_rate</tt>, configuration for the <a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1LRGenerator.html">LRGenerator</a> which controls the learning rate.</li> + +<li><tt>weight_decay</tt>, the co-efficient for <a class="externalLink" href="http://deeplearning.net/tutorial/gettingstarted.html#regularization">L2 * regularization</a>.</li> + +<li><a class="externalLink" href="http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/">momentum</a>.</li> +</ul> +<p>If you are not familiar with the above terms, you can get their meanings in <a class="externalLink" href="http://cs231n.github.io/neural-networks-3/#update">this page provided by Karpathy</a>.</p> +<div class="section"> +<h3><a name="Configuration_of_built-in_updater_classes"></a>Configuration of built-in updater classes</h3> +<div class="section"> +<h4><a name="Updater"></a>Updater</h4> +<p>The base <tt>Updater</tt> implements the <a class="externalLink" href="http://cs231n.github.io/neural-networks-3/#sgd">vanilla SGD algorithm</a>. Its configuration type is <tt>kSGD</tt>. Users need to configure at least the <tt>learning_rate</tt> field. <tt>momentum</tt> and <tt>weight_decay</tt> are optional fields.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">updater{ + type: kSGD + momentum: float + weight_decay: float + learning_rate { + + } +} +</pre></div></div></div> +<div class="section"> +<h4><a name="AdaGradUpdater"></a>AdaGradUpdater</h4> +<p>It inherits the base <tt>Updater</tt> to implement the <a class="externalLink" href="http://www.magicbroom.info/Papers/DuchiHaSi10.pdf">AdaGrad</a> algorithm. Its type is <tt>kAdaGrad</tt>. 
+</div>
+<div class="section">
+<h4><a name="AdaGradUpdater"></a>AdaGradUpdater</h4>
+<p>It inherits the base <tt>Updater</tt> to implement the <a class="externalLink" href="http://www.magicbroom.info/Papers/DuchiHaSi10.pdf">AdaGrad</a> algorithm. Its type is <tt>kAdaGrad</tt>. <tt>AdaGradUpdater</tt> is configured similarly to <tt>Updater</tt> except that <tt>momentum</tt> is not used.</p></div>
+<div class="section">
+<h4><a name="NesterovUpdater"></a>NesterovUpdater</h4>
+<p>It inherits the base <tt>Updater</tt> to implement the <a class="externalLink" href="http://arxiv.org/pdf/1212.0901v2.pdf">Nesterov</a> (section 3.5) updating protocol. Its type is <tt>kNesterov</tt>. <tt>learning_rate</tt> and <tt>momentum</tt> must be configured. <tt>weight_decay</tt> is an optional configuration field.</p></div>
+<div class="section">
+<h4><a name="RMSPropUpdater"></a>RMSPropUpdater</h4>
+<p>It inherits the base <tt>Updater</tt> to implement the <a class="externalLink" href="http://cs231n.github.io/neural-networks-3/#sgd">RMSProp algorithm</a> proposed by <a class="externalLink" href="http://www.cs.toronto.edu/%7Etijmen/csc321/slides/lecture_slides_lec6.pdf">Hinton</a> (slide 29). Its type is <tt>kRMSProp</tt>.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">updater {
+  type: kRMSProp
+  rmsprop_conf {
+    rho: float  # [0,1]
+  }
+}
+</pre></div></div></div></div>
+<div class="section">
+<h3><a name="Configuration_of_learning_rate"></a>Configuration of learning rate</h3>
+<p>The <tt>learning_rate</tt> field is configured as,</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">learning_rate {
+  type: ChangeMethod
+  base_lr: float  # base/initial learning rate
+  ...             # fields for a specific changing method
+}
+</pre></div></div>
+<p>The common fields include <tt>type</tt> and <tt>base_lr</tt>. SINGA provides the following <tt>ChangeMethod</tt>s.</p>
+<div class="section">
+<h4><a name="kFixed"></a>kFixed</h4>
+<p>The <tt>base_lr</tt> is used for all steps.</p></div>
+<div class="section">
+<h4><a name="kLinear"></a>kLinear</h4>
+<p>The updater should be configured like</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">learning_rate {
+  base_lr: float
+  linear_conf {
+    freq: int
+    final_lr: float
+  }
+}
+</pre></div></div>
+<p>Linear interpolation is used to change the learning rate,</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">lr = (1 - step / freq) * base_lr + (step / freq) * final_lr
+</pre></div></div></div>
+<div class="section">
+<h4><a name="kExponential"></a>kExponential</h4>
+<p>The updater should be configured like</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">learning_rate {
+  base_lr: float
+  exponential_conf {
+    freq: int
+  }
+}
+</pre></div></div>
+<p>The learning rate for <tt>step</tt> is</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">lr = base_lr / 2^(step / freq)
+</pre></div></div>
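+<p>As a worked example of this formula, assume <tt>base_lr</tt> is 0.1 and <tt>freq</tt> is 1000 (both values are made up for illustration); the learning rate is then halved every 1000 steps,</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">step = 0     -&gt; lr = 0.1 / 2^0 = 0.1
+step = 1000  -&gt; lr = 0.1 / 2^1 = 0.05
+step = 2000  -&gt; lr = 0.1 / 2^2 = 0.025
+</pre></div></div>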
class="source"><pre class="prettyprint">lr = base_lr * (1 + gamma * setp)^(-pow) +</pre></div></div></div> +<div class="section"> +<h4><a name="kStep"></a>kStep</h4> +<p>The updater should be configured like</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">learning_rate { + base_lr : float + step_conf { + change_freq: int + gamma: float + } +} +</pre></div></div> +<p>The learning rate for <tt>step</tt> is</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">lr = base_lr * gamma^ (step / change_freq) +</pre></div></div></div> +<div class="section"> +<h4><a name="kFixedStep"></a>kFixedStep</h4> +<p>The updater should be configured like</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">learning_rate { + fixedstep_conf { + step: int + step_lr: float + + step: int + step_lr: float + + ... + } +} +</pre></div></div> +<p>Denote the i-th tuple as (step[i], step_lr[i]), then the learning rate for <tt>step</tt> is,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">step_lr[k] +</pre></div></div> +<p>where step[k] is the smallest number that is larger than <tt>step</tt>.</p></div></div></div> +<div class="section"> +<h2><a name="Advanced_user_guide"></a>Advanced user guide</h2> +<div class="section"> +<h3><a name="Implementing_a_new_Update_subclass"></a>Implementing a new Update subclass</h3> +<p>The base Updater class has one virtual function,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">class Updater{ + public: + virtual void Update(int step, Param* param, float grad_scale = 1.0f) = 0; + + protected: + UpdaterProto proto_; + LRGenerator lr_gen_; +}; +</pre></div></div> +<p>It updates the values of the <tt>param</tt> based on its gradients. The <tt>step</tt> argument is for deciding the learning rate which may change through time (step). <tt>grad_scale</tt> scales the original gradient values. 
+</div></div></div>
+<div class="section">
+<h2><a name="Advanced_user_guide"></a>Advanced user guide</h2>
+<div class="section">
+<h3><a name="Implementing_a_new_Update_subclass"></a>Implementing a new Updater subclass</h3>
+<p>The base Updater class has one virtual function,</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">class Updater {
+ public:
+  virtual void Update(int step, Param* param, float grad_scale = 1.0f) = 0;
+
+ protected:
+  UpdaterProto proto_;
+  LRGenerator* lr_gen_;
+};
+</pre></div></div>
+<p>It updates the values of the <tt>param</tt> based on its gradients. The <tt>step</tt> argument is for deciding the learning rate, which may change through time (step). <tt>grad_scale</tt> scales the original gradient values. This function is called by a server once it has received all gradients for the same <tt>Param</tt> object.</p>
+<p>To implement a new Updater subclass, users must override the <tt>Update</tt> function.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">class FooUpdater : public Updater {
+  void Update(int step, Param* param, float grad_scale = 1.0f) override;
+};
+</pre></div></div>
+<p>Configuration of this new updater can be declared similarly to that of a new layer,</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint"># in user.proto
+message FooUpdaterProto {
+  optional int32 c = 1;
+}
+
+extend UpdaterProto {
+  optional FooUpdaterProto fooupdater_conf = 101;
+}
+</pre></div></div>
+<p>The new updater should be registered in the <a class="externalLink" href="http://singa.incubator.apache.org/docs/programming-guide">main function</a>,</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">driver.RegisterUpdater<FooUpdater>("FooUpdater");
+</pre></div></div>
+<p>Users can then configure the job as</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint"># in job.conf
+updater {
+  user_type: "FooUpdater"  # must match the string identifier used for registration
+  fooupdater_conf {
+    c: 20
+  }
+}
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Implementing_a_new_LRGenerator_subclass"></a>Implementing a new LRGenerator subclass</h3>
+<p>The base <tt>LRGenerator</tt> has one virtual function,</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">virtual float Get(int step);
+</pre></div></div>
+<p>To implement a subclass, e.g., <tt>FooLRGen</tt>, users should declare it like</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">class FooLRGen : public LRGenerator {
+ public:
+  float Get(int step) override;
+};
+</pre></div></div>
+<p>Configuration of <tt>FooLRGen</tt> can be defined using a protocol message,</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint"># in user.proto
+message FooLRProto {
+  ...
+}
+
+extend LRGenProto {
+  optional FooLRProto foolr_conf = 101;
+}
+</pre></div></div>
+<p>The configuration is then like,</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">learning_rate {
+  user_type: "FooLR"  # must match the string identifier used for registration
+  base_lr: float
+  foolr_conf {
+    ...
+  }
+}
+</pre></div></div>
+<p>Users have to register this subclass in the main function,</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">driver.RegisterLRGenerator<FooLRGen>("FooLR");
+</pre></div></div>
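+<p>For illustration, a minimal <tt>Get</tt> implementation is sketched below. It assumes <tt>FooLRProto</tt> declares an <tt>int32 freq</tt> field and that the base class stores its <tt>LRGenProto</tt> configuration in a <tt>proto_</tt> member; both are assumptions of this sketch, not guarantees of the API, and the decay rule itself is made up,</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">// A sketch only. Assumes FooLRProto has an 'int32 freq' field (freq > 0)
+// and that LRGenerator keeps its LRGenProto configuration in proto_.
+float FooLRGen::Get(int step) {
+  const FooLRProto& conf = proto_.GetExtension(foolr_conf);
+  // made-up rule: halve the base learning rate every conf.freq() steps
+  float lr = proto_.base_lr();
+  for (int i = 0; i < step / conf.freq(); ++i)
+    lr *= 0.5f;
+  return lr;
+}
+</pre></div></div>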
+</div></div>
+          </div>
+        </div>
+      </div>
+
+    <hr/>
+
+    <footer>
+      <div class="container-fluid">
+        <div class="row-fluid">
+<p>Copyright © 2015 The Apache Software Foundation. All rights reserved. Apache Singa, Apache, the Apache feather logo, and the Apache Singa project logos are trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
+        </div>
+      </div>
+    </footer>
+  </body>
+</html>

Modified: websites/staging/singa/trunk/content/index.html
==============================================================================
--- websites/staging/singa/trunk/content/index.html (original)
+++ websites/staging/singa/trunk/content/index.html Wed Sep 2 10:31:57 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
 | Rendered using Apache Maven Fluido Skin 1.4
-->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA – Welcome to Apache SINGA</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/introduction.html
==============================================================================
--- websites/staging/singa/trunk/content/introduction.html (original)
+++ websites/staging/singa/trunk/content/introduction.html Wed Sep 2 10:31:57 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
 | Rendered using Apache Maven Fluido Skin 1.4
-->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA – Introduction</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/quick-start.html
==============================================================================
--- websites/staging/singa/trunk/content/quick-start.html (original)
+++ websites/staging/singa/trunk/content/quick-start.html Wed Sep 2 10:31:57 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
 | Rendered using Apache Maven Fluido Skin 1.4
-->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA – Quick Start</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />
