Author: buildbot
Date: Tue Jul 28 12:16:16 2015
New Revision: 959888
Log:
Staging update by buildbot for singa
Modified:
websites/staging/singa/trunk/content/ (props changed)
websites/staging/singa/trunk/content/docs/data.html
websites/staging/singa/trunk/content/docs/neuralnet-partition.html
Propchange: websites/staging/singa/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue Jul 28 12:16:16 2015
@@ -1 +1 @@
-1693074
+1693077
Modified: websites/staging/singa/trunk/content/docs/data.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/data.html (original)
+++ websites/staging/singa/trunk/content/docs/data.html Tue Jul 28 12:16:16 2015
@@ -403,9 +403,114 @@
<div class="section">
<h2><a name="Data_Preparation"></a>Data Preparation</h2>
-<p>To submit a training job, users need to convert raw data (e.g., images,
text documents) into records that can be recognized by SINGA. SINGA uses a
DataLayer to load these records into memory and uses ParserLayer to parse
features (e.g., image pixels and labels) from these records. The records could
be organized and stored using many different ways, e.g., using a light
database, or a file or HDFS, as long as there is a corresponding DataLayer that
can load the records.</p>
+<p>To submit a training job, users need to convert raw data (e.g., images,
text documents) into records that can be recognized by SINGA. SINGA uses a
DataLayer to load these records into memory and uses ParserLayer to parse
features (e.g., image pixels and labels) from these records. The records can
be organized and stored in many different ways, e.g., in a file, a light
database, or HDFS, as long as there is a corresponding DataLayer that can load
the records.</p>
<div class="section">
-<h3><a name="DataShard"></a>DataShard</h3></div>
+<h3><a name="DataShard"></a>DataShard</h3>
+<p>To create a shard for your own data, you may need to implement or modify
the following files:</p>
+
+<ul>
+
+<li>common.proto</li>
+
+<li>create_shard.cc</li>
+
+<li>Makefile</li>
+</ul>
+<p><b>1. Define record</b></p>
+<p>The Record class inherits from the Message class, whose format follows
Google protocol buffers. Please refer to the <a class="externalLink"
href="https://developers.google.com/protocol-buffers/docs/cpptutorial">Tutorial</a>.
</p>
+<p>Your record is defined in
SINGAfolder/src/proto/common.proto.</p>
+<p>(a) Define the record</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">message UserRecord {
+  repeated int32 user_var1 = 1;   // 1 is this field's unique ID
+  optional string user_var2 = 2;  // 2 is this field's unique ID
+ ...
+}
+</pre></div></div>
+<p>(b) Declare your own record inside Record</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">message Record {
+  optional UserRecord user_record = 1; // unique field ID
+ ...
+}
+</pre></div></div>
+<p>(c) Compile SINGA</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">cd SINGAfolder
+./configure
+make
+</pre></div></div>
+<p><b>2. Create shard</b></p>
+<p>(a) Create a folder for the dataset, e.g., call it
“USERDATAfolder”.</p>
+<p>(b) Put the source files for creating the shard in
SINGAfolder/USERDATAfolder/</p>
+
+<ul>
+
+<li>For the RNNLM example, create_shard.cc is in SINGAfolder/examples/rnnlm</li>
+</ul>
+<p>(c) Create shard</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">singa::DataShard myShard(
outputpath, mode);
+</pre></div></div>
+
+<ul>
+
+<li><tt>string outputpath</tt>, the path where the shard will be created.</li>
+
+<li><tt>int mode</tt>, one of <tt>kRead</tt>, <tt>kCreate</tt>, or
<tt>kAppend</tt>, defined in SINGAfolder/include/utils/data_shard.h</li>
+</ul>
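+<p>For instance, opening a new shard for writing may look like the call below.
This is a sketch: the output path is illustrative, and the
<tt>DataShard::kCreate</tt> scoping is an assumption based on
data_shard.h.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">// Illustrative path; kCreate opens the shard for writing.
+singa::DataShard myShard("USERDATAfolder/train_shard",
+                         singa::DataShard::kCreate);
+</pre></div></div>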
+<p><b>3. Store record into shard</b></p>
+<p>(a) Allocate your own record inside a Record</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">singa::Record record;
+singa::UserRecord *myRecord = record.mutable_user_record();
+</pre></div></div>
+<p>The <tt>mutable_user_record()</tt> method is automatically generated when
compiling SINGA in Step 1-(c).</p>
+<p>(b) Set/Add values into the record</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">myRecord->add_userVAR1(
int_val );
+myRecord->set_userVAR2( string_val );
+</pre></div></div>
+<p>(c) Store the record to shard</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">myShard.Insert( key, myRecord );
+</pre></div></div>
+
+<ul>
+
+<li><tt>std::string key</tt>, a unique id for the record</li>
+</ul>
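+<p>Putting steps 2 and 3 together, a minimal create_shard.cc may look like the
sketch below. It assumes the user_var1/user_var2 fields defined above; the
header paths, the key, and the <tt>Flush()</tt> call are illustrative
assumptions rather than verbatim SINGA code.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">#include "utils/data_shard.h"  // assumed header path
+#include "proto/common.pb.h"    // generated from common.proto in Step 1-(c)
+
+int main() {
+  // Open (create) a shard at an illustrative output path.
+  singa::DataShard myShard("USERDATAfolder/train_shard",
+                           singa::DataShard::kCreate);
+  singa::Record record;
+  singa::UserRecord *myRecord = record.mutable_user_record();
+  myRecord->add_user_var1(42);        // hypothetical integer feature
+  myRecord->set_user_var2("sample");  // hypothetical string feature
+  myShard.Insert("key-0001", record); // key must be unique per record
+  myShard.Flush();                    // assumed: persist buffered records
+  return 0;
+}
+</pre></div></div>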
+<p><b>Example of RNNLM</b></p>
+<p>You can refer to the RNNLM example at SINGAfolder/examples/rnnlm/</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">message SingleWordRecord {
+ optional string word = 1;
+ optional int32 word_index = 2;
+ optional int32 class_index =3;`
+}
+
+message Record {
+ optional SingleWordRecord word_record = 4;
+}
+
+make download
+to download raw data from https://www.rnnlm.org
+</pre></div></div>
+<p>In this example, rnnlm-0.4b is used.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">make create
+</pre></div></div>
+<p>to process the input text file, create records, and store them in
shards.</p>
+<p>We create 3 shards for the training data: class_shard, vocab_shard, and
word_shard.</p>
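+<p>For instance, writing one word into word_shard would follow the same
pattern as the sketch above (the shard path, key, and field values here are
illustrative assumptions):</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">singa::DataShard wordShard("examples/rnnlm/word_shard",
+                           singa::DataShard::kCreate);
+singa::Record record;
+singa::SingleWordRecord *wordRecord = record.mutable_word_record();
+wordRecord->set_word("hello");      // illustrative word
+wordRecord->set_word_index(42);     // index in the vocabulary
+wordRecord->set_class_index(3);     // class of the word
+wordShard.Insert("hello", record);  // the word itself as a unique key
+wordShard.Flush();                  // assumed, as in the sketch above
+</pre></div></div></div>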
<div class="section">
<h3><a name="LMDB"></a>LMDB</h3></div>
<div class="section">
Modified: websites/staging/singa/trunk/content/docs/neuralnet-partition.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/neuralnet-partition.html
(original)
+++ websites/staging/singa/trunk/content/docs/neuralnet-partition.html Tue Jul
28 12:16:16 2015
@@ -409,7 +409,7 @@
<p>The purpose of partitioning a neural network is to distribute the
partitions onto different working units (e.g., threads or nodes, called
workers in this article) and parallelize the processing. Another reason for
partitioning is to handle large neural networks that cannot be held in a
single node. For instance, training models against high-resolution images
requires large neural networks (in terms of training parameters).</p>
<p>Since <i>Layer</i> is the first-class citizen in SINGA, we partition
against layers. Specifically, we support partitioning at two levels. First,
users can configure the location (i.e., worker ID) of each layer, thereby
assigning one worker per layer. Second, for one layer, we can partition its
neurons or partition the instances (e.g., images); these are called layer
partition and data partition respectively. We illustrate the two types of
partitioning using a simple convolutional neural network.</p>
<p><img src="../images/conv-mnist.png" style="width: 220px" alt="" /></p>
-<p>The above figure shows a convolutional neural network without any
partition. It has 8 layers in total (one rectangular represents one layer). The
first layer is DataLayer (data) which reads data from local disk
files/databases (or HDFS). The second layer is a MnistLayer which parses the
records from MNIST data to get the pixels of a batch of 28 images (each image
is of size 28x28). The LabelLayer (label) parses the records to get the label
of each image in the batch. The ConvolutionalLayer (conv1) transforms the input
image to the shape of 8x27x27. The ReLULayer (relu1) conducts elementwise
transformations. The PoolingLayer (pool1) sub-samples the images. The fc1 layer
is fully connected with pool1 layer. It mulitplies each image with a weight
matrix to generate a 10 dimension hidden feature which is then normalized by a
SoftmaxLossLayer to get the prediction.</p>
+<p>The above figure shows a convolutional neural network without any
partition. It has 8 layers in total (one rectangle represents one layer). The
first layer is the DataLayer (data), which reads data from local disk
files/databases (or HDFS). The second layer is a MnistLayer, which parses the
records from MNIST data to get the pixels of a batch of 8 images (each image is
of size 28x28). The LabelLayer (label) parses the records to get the label of
each image in the batch. The ConvolutionalLayer (conv1) transforms the input
image to the shape of 8x27x27. The ReLULayer (relu1) conducts elementwise
transformations. The PoolingLayer (pool1) sub-samples the images. The fc1 layer
is fully connected with the pool1 layer. It multiplies each image with a weight
matrix to generate a 10-dimensional hidden feature, which is then normalized by
a SoftmaxLossLayer to get the prediction.</p>
<p><img src="../images/conv-mnist-datap.png" style="width: 1000px" alt=""
/></p>
<p>The above figure shows the convolutional neural network after partitioning
all layers, except the DataLayer and ParserLayers, into 3 partitions using
data partition. The red layers process 4 images of the batch; the black and
blue layers process 2 images each. Some helper layers, i.e., SliceLayer,
ConcateLayer, BridgeSrcLayer, BridgeDstLayer and SplitLayer, are added
automatically by our partition algorithm. Layers of the same color reside in
the same worker. Data is transferred across workers at the boundary layers
(i.e., BridgeSrcLayer and BridgeDstLayer), e.g., between s-slice-mnist-conv1
and d-slice-mnist-conv1.</p>
<p><img src="../images/conv-mnist-layerp.png" style="width: 1000px" alt=""
/></p>