Author: buildbot
Date: Tue Jul 28 12:16:16 2015
New Revision: 959888
Log:
Staging update by buildbot for singa
Modified:
websites/staging/singa/trunk/content/ (props changed)
websites/staging/singa/trunk/content/docs/data.html
websites/staging/singa/trunk/content/docs/neuralnet-partition.html
Propchange: websites/staging/singa/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue Jul 28 12:16:16 2015
@@ -1 +1 @@
-1693074
+1693077
Modified: websites/staging/singa/trunk/content/docs/data.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/data.html (original)
+++ websites/staging/singa/trunk/content/docs/data.html Tue Jul 28 12:16:16 2015
@@ -403,9 +403,114 @@
<div class="section">
<h2><a name="Data_Preparation"></a>Data Preparation</h2>
-<p>To submit a training job, users need to convert raw data (e.g., images,
text documents) into records that can be recognized by SINGA. SINGA uses a
DataLayer to load these records into memory and uses ParserLayer to parse
features (e.g., image pixels and labels) from these records. The records could
be organized and stored using many different ways, e.g., using a light
database, or a file or HDFS, as long as there is a corresponding DataLayer that
can load the records.</p>
+<p>To submit a training job, users need to convert raw data (e.g., images,
text documents) into records that can be recognized by SINGA. SINGA uses a
DataLayer to load these records into memory and uses ParserLayer to parse
features (e.g., image pixels and labels) from these records. The records can
be organized and stored in many different ways, e.g., in a file, a light
database, or HDFS, as long as there is a corresponding DataLayer that can load
the records.</p>
<div class="section">
-<h3><a name="DataShard"></a>DataShard</h3></div>
+<h3><a name="DataShard"></a>DataShard</h3>
+<p>To create a shard for your own data, you may need to implement or modify
the following files:</p>
+
+<ul>
+
+<li>common.proto</li>
+
+<li>create_shard.cc</li>
+
+<li>Makefile</li>
+</ul>
+<p><b>1. Define record</b></p>
+<p>The Record class inherits from the Message class, whose format follows
Google protocol buffers. Please refer to the <a class="externalLink"
href="https://developers.google.com/protocol-buffers/docs/cpptutorial">Tutorial</a>.
</p>
+<p>Your record is defined in
SINGAfolder/src/proto/common.proto.</p>
+<p>(a) Define the record</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">message UserRecord {
+  repeated int32 user_var1 = 1;   // 1 is this field's unique ID
+  optional string user_var2 = 2;  // 2 is this field's unique ID
+ ...
+}
+</pre></div></div>
+<p>(b) Declare your own record inside Record</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">message Record {
+  optional UserRecord user_record = 1; // unique field ID
+ ...
+}
+</pre></div></div>
+<p>(c) Compile SINGA</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">cd SINGAfolder
+./configure
+make
+</pre></div></div>
+<p><b>2. Create shard</b></p>
+<p>(a) Create a folder for the dataset, e.g., call it
“USERDATAfolder”.</p>
+<p>(b) Put the source files for creating the shard in
SINGAfolder/USERDATAfolder/</p>
+
+<ul>
+
+<li>For the RNNLM example, create_shard.cc is in SINGAfolder/examples/rnnlm</li>
+</ul>
+<p>(c) Create shard</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">singa::DataShard myShard(
outputpath, mode);
+</pre></div></div>
+
+<ul>
+
+<li><tt>string outputpath</tt>, the path where the shard will be created.</li>
+
+<li><tt>int mode</tt>, one of <tt>kRead</tt>, <tt>kCreate</tt>, or
<tt>kAppend</tt>, defined in SINGAfolder/include/utils/data_shard.h</li>
+</ul>
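+<p>For instance, opening a new shard for writing may look like the call below.
This is a sketch: the output path is illustrative, and the
<tt>DataShard::kCreate</tt> scoping is an assumption based on
data_shard.h.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">// Illustrative path; kCreate opens the shard for writing.
+singa::DataShard myShard("USERDATAfolder/train_shard",
+                         singa::DataShard::kCreate);
+</pre></div></div>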
+<p><b>3. Store record into shard</b></p>
+<p>(a) Allocate your own record inside a Record</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">singa::Record record;
+singa::UserRecord *myRecord = record.mutable_user_record();
+</pre></div></div>
+<p>The <tt>mutable_user_record()</tt> method is automatically generated when
compiling SINGA in Step 1-(c).</p>
+<p>(b) Set/Add values into the record</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">myRecord->add_userVAR1(
int_val );
+myRecord->set_userVAR2( string_val );
+</pre></div></div>
+<p>(c) Store the record to shard</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">myShard.Insert( key, myRecord );
+</pre></div></div>
+
+<ul>
+
+<li><tt>std::string key</tt>, a unique id for the record</li>
+</ul>
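+<p>Putting steps 2 and 3 together, a minimal create_shard.cc may look like the
sketch below. It assumes the user_var1/user_var2 fields defined above; the
header paths, the key, and the <tt>Flush()</tt> call are illustrative
assumptions rather than verbatim SINGA code.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">#include "utils/data_shard.h"  // assumed header path
+#include "proto/common.pb.h"    // generated from common.proto in Step 1-(c)
+
+int main() {
+  // Open (create) a shard at an illustrative output path.
+  singa::DataShard myShard("USERDATAfolder/train_shard",
+                           singa::DataShard::kCreate);
+  singa::Record record;
+  singa::UserRecord *myRecord = record.mutable_user_record();
+  myRecord->add_user_var1(42);        // hypothetical integer feature
+  myRecord->set_user_var2("sample");  // hypothetical string feature
+  myShard.Insert("key-0001", record); // key must be unique per record
+  myShard.Flush();                    // assumed: persist buffered records
+  return 0;
+}
+</pre></div></div>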
+<p><b>Example of RNNLM</b></p>
+<p>You can refer to the RNNLM example at SINGAfolder/examples/rnnlm/</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">message SingleWordRecord {
+ optional string word = 1;
+ optional int32 word_index = 2;
+ optional int32 class_index =3;`
+}
+
+message Record {
+ optional SingleWordRecord word_record = 4;
+}
+
+make download
+to download raw data from https://www.rnnlm.org
+</pre></div></div>
+<p>In this example, rnnlm-0.4b is used.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">make create
+</pre></div></div>
+<p>to process the input text file, create records, and store them in
shards.</p>
+<p>We create 3 shards for the training data: class_shard, vocab_shard, and
word_shard.</p>
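+<p>For instance, writing one word into word_shard would follow the same
pattern as the sketch above (the shard path, key, and field values here are
illustrative assumptions):</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">singa::DataShard wordShard("examples/rnnlm/word_shard",
+                           singa::DataShard::kCreate);
+singa::Record record;
+singa::SingleWordRecord *wordRecord = record.mutable_word_record();
+wordRecord->set_word("hello");      // illustrative word
+wordRecord->set_word_index(42);     // index in the vocabulary
+wordRecord->set_class_index(3);     // class of the word
+wordShard.Insert("hello", record);  // the word itself as a unique key
+wordShard.Flush();                  // assumed, as in the sketch above
+</pre></div></div></div>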
<div class="section">
<h3><a name="LMDB"></a>LMDB</h3></div>
<div class="section">
Modified: websites/staging/singa/trunk/content/docs/neuralnet-partition.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/neuralnet-partition.html
(original)
+++ websites/staging/singa/trunk/content/docs/neuralnet-partition.html Tue Jul
28 12:16:16 2015
@@ -409,7 +409,7 @@
<p>The purpose of partitioning a neural network is to distribute the
partitions onto different working units (e.g., threads or nodes, called
workers in this article) and parallelize the processing. Another reason for
partitioning is to handle large neural networks that cannot be held in a
single node. For instance, training models against high-resolution images
requires large neural networks (in terms of training parameters).</p>
<p>Since <i>Layer</i> is the first-class citizen in SINGA, we partition
against layers. Specifically, we support partitioning at two levels. First,
users can configure the location (i.e., worker ID) of each layer, thereby
assigning one worker per layer. Second, for one layer, we can partition its
neurons or partition the instances (e.g., images); these are called layer
partition and data partition respectively. We illustrate the two types of
partitioning using a simple convolutional neural network.</p>
<p><img src="../images/conv-mnist.png" style="width: 220px" alt="" /></p>
-<p>The above figure shows a convolutional neural network without any
partition. It has 8 layers in total (one rectangular represents one layer). The
first layer is DataLayer (data) which reads data from local disk
files/databases (or HDFS). The second layer is a MnistLayer which parses the
records from MNIST data to get the pixels of a batch of 28 images (each image
is of size 28x28). The LabelLayer (label) parses the records to get the label
of each image in the batch. The ConvolutionalLayer (conv1) transforms the input
image to the shape of 8x27x27. The ReLULayer (relu1) conducts elementwise
transformations. The PoolingLayer (pool1) sub-samples the images. The fc1 layer
is fully connected with pool1 layer. It mulitplies each image with a weight
matrix to generate a 10 dimension hidden feature which is then normalized by a
SoftmaxLossLayer to get the prediction.</p>
+<p>The above figure shows a convolutional neural network without any
partition. It has 8 layers in total (one rectangle represents one layer). The
first layer is the DataLayer (data), which reads data from local disk
files/databases (or HDFS). The second layer is a MnistLayer, which parses the
records from MNIST data to get the pixels of a batch of 8 images (each image is
of size 28x28). The LabelLayer (label) parses the records to get the label of
each image in the batch. The ConvolutionalLayer (conv1) transforms the input
image to the shape of 8x27x27. The ReLULayer (relu1) conducts elementwise
transformations. The PoolingLayer (pool1) sub-samples the images. The fc1 layer
is fully connected with the pool1 layer. It multiplies each image with a weight
matrix to generate a 10-dimensional hidden feature, which is then normalized by
a SoftmaxLossLayer to get the prediction.</p>
<p><img src="../images/conv-mnist-datap.png" style="width: 1000px" alt=""
/></p>
<p>The above figure shows the convolutional neural network after partitioning
all layers, except the DataLayer and ParserLayers, into 3 partitions using
data partition. The red layers process 4 images of the batch; the black and
blue layers process 2 images each. Some helper layers, i.e., SliceLayer,
ConcateLayer, BridgeSrcLayer, BridgeDstLayer and SplitLayer, are added
automatically by our partition algorithm. Layers of the same color reside in
the same worker. Data is transferred across workers at the boundary layers
(i.e., BridgeSrcLayer and BridgeDstLayer), e.g., between s-slice-mnist-conv1
and d-slice-mnist-conv1.</p>
<p><img src="../images/conv-mnist-layerp.png" style="width: 1000px" alt=""
/></p>