svn commit: r964001 - in /websites/staging/singa/trunk/content: ./ community/ develop/ docs/

buildbot Wed, 02 Sep 2015 01:15:51 -0700

Author: buildbot
Date: Wed Sep  2 08:15:15 2015
New Revision: 964001

Log:
Staging update by buildbot for singa


Added:
    websites/staging/singa/trunk/content/docs/data.html
Modified:
    websites/staging/singa/trunk/content/   (props changed)
    websites/staging/singa/trunk/content/community.html
    websites/staging/singa/trunk/content/community/issue-tracking.html
    websites/staging/singa/trunk/content/community/mail-lists.html
    websites/staging/singa/trunk/content/community/source-repository.html
    websites/staging/singa/trunk/content/community/team-list.html
    websites/staging/singa/trunk/content/develop/contribute-code.html
    websites/staging/singa/trunk/content/develop/contribute-docs.html
    websites/staging/singa/trunk/content/develop/how-contribute.html
    websites/staging/singa/trunk/content/develop/schedule.html
    websites/staging/singa/trunk/content/docs.html
    websites/staging/singa/trunk/content/docs/architecture.html
    websites/staging/singa/trunk/content/docs/checkpoint.html
    websites/staging/singa/trunk/content/docs/cnn.html
    websites/staging/singa/trunk/content/docs/code-structure.html
    websites/staging/singa/trunk/content/docs/communication.html

Propchange: websites/staging/singa/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Wed Sep  2 08:15:15 2015
@@ -1 +1 @@
-1696297
+1700726

Modified: websites/staging/singa/trunk/content/community.html
==============================================================================
--- websites/staging/singa/trunk/content/community.html (original)
+++ websites/staging/singa/trunk/content/community.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Community</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/community/issue-tracking.html
==============================================================================
--- websites/staging/singa/trunk/content/community/issue-tracking.html (original)
+++ websites/staging/singa/trunk/content/community/issue-tracking.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Issue Tracking</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/community/mail-lists.html
==============================================================================
--- websites/staging/singa/trunk/content/community/mail-lists.html (original)
+++ websites/staging/singa/trunk/content/community/mail-lists.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Project Mailing Lists</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/community/source-repository.html
==============================================================================
--- websites/staging/singa/trunk/content/community/source-repository.html (original)
+++ websites/staging/singa/trunk/content/community/source-repository.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Source Repository</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/community/team-list.html
==============================================================================
--- websites/staging/singa/trunk/content/community/team-list.html (original)
+++ websites/staging/singa/trunk/content/community/team-list.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; The SINGA Team</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/develop/contribute-code.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/contribute-code.html (original)
+++ websites/staging/singa/trunk/content/develop/contribute-code.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; How to Contribute Code</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/develop/contribute-docs.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/contribute-docs.html (original)
+++ websites/staging/singa/trunk/content/develop/contribute-docs.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; How to Contribute Documentation</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/develop/how-contribute.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/how-contribute.html (original)
+++ websites/staging/singa/trunk/content/develop/how-contribute.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; How to Contribute to SINGA</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/develop/schedule.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/schedule.html (original)
+++ websites/staging/singa/trunk/content/develop/schedule.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Development Schedule</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/docs.html
==============================================================================
--- websites/staging/singa/trunk/content/docs.html (original)
+++ websites/staging/singa/trunk/content/docs.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Documentation</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/docs/architecture.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/architecture.html (original)
+++ websites/staging/singa/trunk/content/docs/architecture.html Wed Sep  2 08:15:15 2015
@@ -1,15 +1,15 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache SINGA &#x2013; System Architecture</title>
+    <title>Apache SINGA &#x2013; </title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
     <link rel="stylesheet" href="../css/site.css" />
     <link rel="stylesheet" href="../css/print.css" media="print" />
@@ -189,7 +189,7 @@
         Apache SINGA</a>
                     <span class="divider">/</span>
       </li>
-        <li class="active ">System Architecture</li>
+        <li class="active "></li>
         
                 
                     
@@ -423,14 +423,15 @@
                         
         <div id="bodyColumn"  class="span10" >
                                   
-            <div class="section">
-<h2><a name="System_Architecture"></a>System Architecture</h2>
-<hr />
+            <p>&#x2014; layout: post title: Architecture category : docs</p>
+<div class="section">
+<h2><a name="tags_:_architecture"></a>tags : [architecture]</h2>
+<p>{% include JB/setup %}</p>
 <div class="section">
 <h3><a name="Logical_Architecture"></a>Logical Architecture</h3>
-<p><img src="../images/distributed/logical.png" style="width: 550px" alt="" /> 
+<p><img src="http://singa.incubator.apache.org/assets/image/logical.png" 
style="width: 550px" alt="" /> 
 <p><b> Fig.1 - Logical system architecture</b></p>
-<p>SINGA has flexible architecture to support different distributed <a 
href="frameworks.html">training frameworks</a> (both synchronous and 
asynchronous). The logical system architecture is shown in Fig.1. The 
architecture consists of multiple server groups and worker groups:</p>
+<p>SINGA has a flexible architecture to support different distributed <a class="externalLink" href="http://singa.incubator.apache.org/docs/frameworks.html">training frameworks</a> (both synchronous and asynchronous). The logical system architecture is shown in Fig.1. The architecture consists of multiple server groups and worker groups:</p>
 
 <ul>
   
@@ -438,7 +439,7 @@
   
 <li><b>Worker group</b>  Each worker group communicates with only one server 
group.  A worker group trains a complete model replica  against a partition of 
the training dataset,  and is responsible for computing parameter gradients.  
All worker groups run and communicate with the corresponding  server groups 
asynchronously.  However, inside each worker group,  the workers synchronously 
compute parameter updates for the model replica.</li>
 </ul>
-<p>There are different strategies to distribute the training workload among 
workers within a group: </p>
+<p>There are different strategies to distribute the training workload among 
workers within a group:</p>
 
 <ul>
   
@@ -450,7 +451,7 @@
 </ul></div>
 <div class="section">
 <h3><a name="Implementation"></a>Implementation</h3>
-<p>In SINGA, servers and workers are execution units running in separate 
threads. They communicate through <a href="communication.html">messages</a>. 
Every process runs the main thread as a stub that aggregates local messages and 
forwards them to corresponding (remote) receivers.</p>
+<p>In SINGA, servers and workers are execution units running in separate 
threads. They communicate through <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/communication.html">messages</a>. 
Every process runs the main thread as a stub that aggregates local messages and 
forwards them to corresponding (remote) receivers.</p>
 <p>Each server group and worker group has a <i>ParamShard</i> object representing a complete model replica. If workers and servers reside in the same process, their <i>ParamShard</i> (partitions) can be configured to share the same memory space. In this case, the messages transferred between different execution units contain only pointers to the data, which reduces the communication cost. In inter-process cases, by contrast, the messages have to include the parameter values.</p></div></div>
                   </div>
             </div>
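
Note on the architecture.html text above: the difference between intra-process messages (pointers only) and inter-process messages (parameter values included) can be illustrated with a small sketch. This is not SINGA code. The `Message` class, the `make_message` helper and the `same_process` flag are hypothetical names chosen for illustration, and a Python list reference stands in for a pointer into a shared ParamShard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical message exchanged between a worker and a server."""
    param_name: str
    values: List[float]
    shares_memory: bool


def make_message(param_name: str, values: List[float], same_process: bool) -> Message:
    """Build a message for a parameter.

    Same process: pass a reference to the existing buffer (no copy).
    Different process: copy the values, since a reference would be meaningless remotely.
    """
    if same_process:
        return Message(param_name, values, shares_memory=True)
    return Message(param_name, list(values), shares_memory=False)


shard = [0.1, 0.2, 0.3]  # stands in for parameter values held in a ParamShard
local = make_message("w", shard, same_process=True)
remote = make_message("w", shard, same_process=False)

shard[0] = 9.9  # an update made through the shared buffer
print(local.values[0])   # the local message sees the update
print(remote.values[0])  # the remote message still holds the copied value
```

The sketch only shows why the shared-memory case avoids copying; it says nothing about how SINGA actually serializes messages.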

Modified: websites/staging/singa/trunk/content/docs/checkpoint.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/checkpoint.html (original)
+++ websites/staging/singa/trunk/content/docs/checkpoint.html Wed Sep  2 08:15:15 2015
@@ -1,15 +1,15 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache SINGA &#x2013; Checkpoint and Resume</title>
+    <title>Apache SINGA &#x2013; </title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
     <link rel="stylesheet" href="../css/site.css" />
     <link rel="stylesheet" href="../css/print.css" media="print" />
@@ -189,7 +189,7 @@
         Apache SINGA</a>
                     <span class="divider">/</span>
       </li>
-        <li class="active ">Checkpoint and Resume</li>
+        <li class="active "></li>
         
                 
                     
@@ -423,80 +423,84 @@
                         
         <div id="bodyColumn"  class="span10" >
                                   
-            <div class="section">
-<h2><a name="Checkpoint_and_Resume"></a>Checkpoint and Resume</h2>
-<hr />
+            <p>&#x2014; layout: post title: Checkpoint and Resume category : 
docs</p>
 <div class="section">
-<h3><a name="Applications_of_checkpoint"></a>Applications of checkpoint</h3>
-<p>By taking checkpoints of model parameters, we can</p>
+<h2><a name="tags_:_checkpoint_restore"></a>tags : [checkpoint, restore]</h2>
+<p>{% include JB/setup %}</p>
+<p>SINGA periodically checkpoints model parameters to disk at a user-configured frequency. By checkpointing model parameters, we can</p>
 
 <ol style="list-style-type: decimal">
   
 <li>
-<p>Restore (resume) the training from the last checkpoint. For example, if the 
program crashes before finishing all training steps.</p></li>
+<p>resume the training from the last checkpoint. For example, if the program crashes before finishing all training steps, we can continue the training using checkpoint files.</p></li>
   
 <li>
-<p>Use them as pre-training results for a similar model. For example, the 
parameters from training a RBM model can be used to initialize a <a 
href="auto-encoder.html">deep auto-encoder</a> model.</p></li>
+<p>use them to initialize a similar model. For example, the parameters from training an RBM model can be used to initialize a <a class="externalLink" href="http://singa.incubator.apache.org/docs/rbm">deep auto-encoder</a> model.</p></li>
 </ol></div>
 <div class="section">
-<h3><a name="Instructions_for_checkpoint_and_resume"></a>Instructions for 
checkpoint and resume</h3>
-<p>Checkpoint is controlled by two model configuration fields: 
<tt>checkpoint_after</tt> (start checkpoint after this number of training 
steps) and <tt>checkpoint_frequency</tt>. The checkpoint files are located at 
<tt>WORKSPACE/checkpoint/stepSTEP-workerWORKERID.bin</tt>.</p>
-<p>The following configuration shows an example,</p>
+<h2><a name="Configuration"></a>Configuration</h2>
+<p>Checkpointing is controlled by two configuration fields:</p>
+
+<ul>
+  
+<li><tt>checkpoint_after</tt>, start checkpointing after this number of 
training steps,</li>
+  
+<li><tt>checkpoint_freq</tt>, frequency of doing checkpointing.</li>
+</ul>
+<p>For example,</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">model {
-  ...
-  checkpoint_after: 100
-  checkpoint_frequency: 300
-  ...
-}
+<div class="source"><pre class="prettyprint"># job.conf
+workspace: &quot;WORKSPACE&quot;
+checkpoint_after: 100
+checkpoint_freq: 300
+...
 </pre></div></div>
-<p>After training for 700 steps, under WORKSPACE/checkpoint folder, there 
would be two checkpoint files (training on single node):</p>
+<p>Checkpointing files are located at 
<i>WORKSPACE/checkpoint/stepSTEP-workerWORKERID.bin</i>. For the above 
configuration, after training for 700 steps, there would be two checkpointing 
files,</p>
 
 <div class="source">
 <div class="source"><pre class="prettyprint">step400-worker0.bin
 step700-worker0.bin
-</pre></div></div>
+</pre></div></div></div>
 <div class="section">
-<h4><a name="Application_1"></a>Application 1</h4>
-<p>We can resume the training from the last checkpoint (i.e., step 700) by:</p>
+<h2><a name="Application_-_resuming_training"></a>Application - resuming 
training</h2>
+<p>We can resume the training from the last checkpoint (i.e., step 700) by,</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-run.sh 
-workspace=WORKSPACE --resume
-</pre></div></div></div>
+<div class="source"><pre class="prettyprint">./bin/singa-run.sh -conf JOB_CONF 
-resume
+</pre></div></div>
+<p>There is no change to the job configuration.</p></div>
 <div class="section">
-<h4><a name="Application_2"></a>Application 2</h4>
-<p>We can also use the checkpoint file from step 400 as the pre-trained model 
for a new model by configuring the job.conf of the new model as:</p>
+<h2><a name="Application_-_model_initialization"></a>Application - model 
initialization</h2>
+<p>We can also use the checkpointing file from step 400 to initialize a new 
model by configuring the new job as,</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">model {
-  ...
-  checkpoint : &quot;WORKSPACE/checkpoint/step400-worker0.bin&quot;
-  ...
-}
+<div class="source"><pre class="prettyprint"># job.conf
+checkpoint : &quot;WORKSPACE/checkpoint/step400-worker0.bin&quot;
+...
 </pre></div></div>
-<p>If there are multiple checkpoint files for the same snapshot due to model 
partitioning, all the checkpoint files should be added:</p>
+<p>If there are multiple checkpointing files for the same snapshot due to 
model partitioning, all the checkpointing files should be added,</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">model {
-  ...
-  checkpoint : &quot;WORKSPACE/checkpoint/step400-worker0.bin&quot;
-  checkpoint : &quot;WORKSPACE/checkpoint/step400-worker1.bin&quot;
-  ...
-}
+<div class="source"><pre class="prettyprint"># job.conf
+checkpoint : &quot;WORKSPACE/checkpoint/step400-worker0.bin&quot;
+checkpoint : &quot;WORKSPACE/checkpoint/step400-worker1.bin&quot;
+...
 </pre></div></div>
-<p>The launching command is the same as starting a new job</p>
+<p>The training command is the same as starting a new job,</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-run.sh 
-workspace=WORKSPACE
-</pre></div></div></div></div>
+<div class="source"><pre class="prettyprint">./bin/singa-run.sh -conf JOB_CONF
+</pre></div></div>
+<p>{% comment %}</p></div>
 <div class="section">
-<h3><a name="Implementation_details"></a>Implementation details</h3>
-<p>The checkpoint is done in the Worker class and controlled by two model 
configuration fields: <tt>checkpoint_after</tt> and 
<tt>checkpoint_frequency</tt>. Only Params owning the param values from the 
first group are dumped onto into checkpoint files. For one Param object, its 
name, version and values are saved. It is possible that the snapshot is 
separated into multiple files because the neural net is partitioned into 
multiple workers.</p>
-<p>The Worker&#x2019;s InitLocalParam will initialize Params from checkpoint 
files if the <tt>checkpoint</tt> field is set. Otherwise it randomly initialize 
them using user configured initialization method. The Param objects are matched 
based on name. If the Param is not configured with a name, NeuralNet class will 
automatically create one for it based on the name of the layer to which the 
Param object belongs. The <tt>checkpoint</tt> can be set by users (Application 
1) or by the Resume function (Application 2) of the Trainer class, which finds 
the files for the latest snapshot and add them to the <tt>checkpoint</tt> 
filed. It also sets the <tt>step</tt> field of model configuration to the 
checkpoint step (extracted from file name).</p></div>
+<h2><a name="Advanced_user_guide"></a>Advanced user guide</h2>
+<p>Checkpointing is done in the <a class="externalLink" 
href="http://singa.incubator.apache.org/api/classsinga_1_1Worker.html">Worker 
class</a>. Only <tt>Param</tt>s from the first group are dumped into 
checkpointing files. For a <tt>Param</tt> object, its name, version and values 
are saved. It is possible that the snapshot is separated into multiple files 
because the neural net is partitioned into multiple workers.</p>
+<p>The Worker&#x2019;s <tt>InitLocalParam</tt> function will initialize parameters from checkpointing files if the <tt>checkpoint</tt> field is set. Otherwise it randomly initializes them using the user-configured initialization method. The <tt>Param</tt> objects are matched based on name. If a <tt>Param</tt> object is not configured with a name, the <tt>NeuralNet</tt> class will automatically create one for it based on the name of the layer. The <tt>checkpoint</tt> field can be set by users (model initialization) or by the <tt>Resume</tt> function (resuming training) of the Trainer class, which finds the files for the latest snapshot and adds them to the <tt>checkpoint</tt> field. It also sets the <tt>step</tt> field of model configuration to the checkpoint step (extracted from file name).</p>
 <div class="section">
 <h3><a name="Caution"></a>Caution</h3>
-<p>Both two applications must be taken carefully when Param objects are 
partitioned due to model partitioning. Because if the training is done using 2 
workers, while the new model (or continue training) is trained with 3 workers, 
then the same original Param object is partitioned in different ways and hence 
cannot be matched.</p></div></div>
+<p>Both applications require care when <tt>Param</tt> objects are partitioned due to model partitioning. If the training is done using 2 workers while the new model (or the resumed training) uses 3 workers, the same original <tt>Param</tt> object is partitioned in different ways and hence cannot be matched.</p>
+<p>{% endcomment %}</p></div></div>
                   </div>
             </div>
           </div>
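
Note on the checkpoint.html text above: the example there (start after 100 steps, a frequency of 300, two files after 700 steps) can be sketched as a small schedule calculation. This is a minimal illustration, not SINGA code. The function names are hypothetical, and the rule that a checkpoint is written whenever the number of steps past `checkpoint_after` is a positive multiple of the frequency is an assumption inferred from the two file names shown in the page (step400 and step700).

```python
from typing import List


def checkpoint_steps(total_steps: int, checkpoint_after: int, checkpoint_freq: int) -> List[int]:
    """Return the training steps at which a checkpoint would be written.

    Assumed rule: checkpoint at every step s > checkpoint_after for which
    (s - checkpoint_after) is a multiple of checkpoint_freq.
    """
    if checkpoint_freq <= 0:
        return []  # treat a non-positive frequency as "checkpointing disabled"
    return [
        s
        for s in range(1, total_steps + 1)
        if s > checkpoint_after and (s - checkpoint_after) % checkpoint_freq == 0
    ]


def checkpoint_files(workspace: str, steps: List[int], num_workers: int = 1) -> List[str]:
    """Build file names following WORKSPACE/checkpoint/stepSTEP-workerWORKERID.bin."""
    return [
        f"{workspace}/checkpoint/step{s}-worker{w}.bin"
        for s in steps
        for w in range(num_workers)
    ]


steps = checkpoint_steps(total_steps=700, checkpoint_after=100, checkpoint_freq=300)
for name in checkpoint_files("WORKSPACE", steps):
    print(name)
```

With `num_workers=2` the same step yields one file per worker, which corresponds to the case where several `checkpoint` entries have to be listed for one snapshot.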

Modified: websites/staging/singa/trunk/content/docs/cnn.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/cnn.html (original)
+++ websites/staging/singa/trunk/content/docs/cnn.html Wed Sep  2 08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; </title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
@@ -21,7 +21,7 @@
     <script type="text/javascript" 
src="../js/apache-maven-fluido-1.4.min.js"></script>
 
     
-    <meta name="Notice" content="Licensed to the Apache Software Foundation 
(ASF) under one            or more contributor license agreements.  See the 
NOTICE file            distributed with this work for additional information    
        regarding copyright ownership.  The ASF licenses this file            
to you under the Apache License, Version 2.0 (the            
&quot;License&quot;); you may not use this file except in compliance            
with the License.  You may obtain a copy of the License at            .         
     http://www.apache.org/licenses/LICENSE-2.0            .            Unless 
required by applicable law or agreed to in writing,            software 
distributed under the License is distributed on an            &quot;AS IS&quot; 
BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY            KIND, either express 
or implied.  See the License for the            specific language governing 
permissions and limitations            under the License." />              </head>
+                  </head>
         <body class="topBarEnabled">
           
     
@@ -425,62 +425,258 @@
                         
         <div id="bodyColumn"  class="span10" >
                                   
-            
-<p>This example will show you how to use SINGA to train a CNN model using 
cifar10 dataset.</p>
+            <p>&#x2014; layout: post title: Example &#x2014; Convolution 
Neural Network category : docs</p>
 <div class="section">
+<h2><a name="tags_:_cnn_example"></a>tags : [cnn, example]</h2>
+<p>{% include JB/setup %}</p>
+<p>A convolutional neural network (CNN) is a type of feed-forward artificial neural network widely used for image and video classification. In this example, we will use a deep CNN model to do image classification for the <a class="externalLink" href="http://www.cs.toronto.edu/~kriz/cifar.html">CIFAR10 dataset</a>.</p></div>
 <div class="section">
-<h3><a name="Prepare_for_the_data"></a>Prepare for the data</h3>
+<h2><a name="Running_instructions"></a>Running instructions</h2>
+<p>Please refer to the <a class="externalLink" href="http://singa.incubator.apache.org/docs/installation">installation</a> page for instructions on building SINGA, and the <a class="externalLink" href="http://singa.incubator.apache.org/docs/quick-start">quick start</a> for instructions on starting zookeeper.</p>
+<p>We have provided scripts for preparing the training and test dataset in 
<i>examples/cifar10/</i>.</p>
 
-<ul>
-  
-<li>First go to the <tt>example/cifar10/</tt> folder for preparing the 
dataset. There should be a makefile example called Makefile.example in the 
folder. Run the command <tt>cp Makefile.example Makefile</tt> to generate the 
makefile. Then run the command <tt>make download</tt> and <tt>make create</tt> 
in the current folder to download cifar10 dataset and prepare for the training 
and testing datashard.</li>
-</ul></div>
+<div class="source">
+<div class="source"><pre class="prettyprint"># in examples/cifar10
+$ cp Makefile.example Makefile
+$ make download
+$ make create
+</pre></div></div>
+<p>After the datasets are prepared, we start the training by</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">./bin/singa-run.sh -conf 
examples/cifar10/job.conf
+</pre></div></div>
+<p>After it is started, you should see output like</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">Record job information to /tmp/singa-log/job-info/job-2-20150817-055601
+Executing : ./singa -conf /xxx/incubator-singa/examples/cifar10/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 2
+E0817 06:56:18.868259 33849 cluster.cc:51] proc #0 -&gt; 192.168.5.128:49152 (pid = 33849)
+E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start
+E0817 06:56:18.928469 33872 worker.cc:134] Worker (group = 0, id = 0) start
+E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588, accuracy : 0.077900
+E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578, accuracy : 0.062500
+E0817 06:57:24.142645 33849 trainer.cc:373] Train step-30, loss : 2.302404, accuracy : 0.131250
+E0817 06:57:30.813354 33849 trainer.cc:373] Train step-60, loss : 2.302248, accuracy : 0.156250
+E0817 06:57:37.556655 33849 trainer.cc:373] Train step-90, loss : 2.301849, accuracy : 0.175000
+E0817 06:57:44.971276 33849 trainer.cc:373] Train step-120, loss : 2.301077, accuracy : 0.137500
+E0817 06:57:51.801949 33849 trainer.cc:373] Train step-150, loss : 2.300410, accuracy : 0.135417
+E0817 06:57:58.682281 33849 trainer.cc:373] Train step-180, loss : 2.300067, accuracy : 0.127083
+E0817 06:58:05.578366 33849 trainer.cc:373] Train step-210, loss : 2.300143, accuracy : 0.154167
+E0817 06:58:12.518497 33849 trainer.cc:373] Train step-240, loss : 2.295912, accuracy : 0.185417
+</pre></div></div>
+<p>After a configured number of training steps, or when the job finishes, 
SINGA will <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/checkpoint">checkpoint</a> the 
model parameters.</p></div>
+<div class="section">
+<h2><a name="Details"></a>Details</h2>
+<p>To train a model in SINGA, you need to prepare the datasets, and a job 
configuration which specifies the neural net structure, training algorithm (BP 
or CD), SGD update algorithm (e.g. Adagrad), number of training/test steps, 
etc.</p>
+<div class="section">
+<h3><a name="Data_preparation"></a>Data preparation</h3>
+<p>Before using SINGA, you need to write a program that converts your dataset 
into a format that SINGA can read. Please refer to the <a 
class="externalLink" 
href="http://singa.incubator.apache.org/docs/data#example---cifar-dataset">Data 
Preparation</a> page for details about preparing the CIFAR10 dataset.</p></div>
 <div class="section">
-<h3><a name="Set_job_configuration."></a>Set job configuration.</h3>
+<h3><a name="Neural_net"></a>Neural net</h3>
+<p>Figure 1 shows the net structure of the CNN model used in this example, 
which follows the configuration on <a class="externalLink" 
href="https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg">this
 page</a>. The dashed circle represents one feature transformation stage, which 
generally has four layers as shown in the figure. Sometimes the rectifier layer 
and normalization layer are omitted or swapped within a stage. This example 
has 3 such stages.</p>
+<p>Next, we follow the guides on the <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/neural-net">neural net page</a> 
and <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/layer">layer page</a> to write the 
neural net configuration.</p>
+
+<div style="text-align: center">
+<img src="http://singa.incubator.apache.org/assets/image/cnn-example.png" 
style="width: 200px" alt="" /> <br />
+<b>Figure 1 - Net structure of the CNN example.</b>
+</div>
 
 <ul>
   
-<li>If you just want to use the training model provided in this example, you 
can just use job.conf file in current directory. Fig. 1 gives an example of CNN 
struture. In this example, we define a CNN model that contains 3 
convolution+relu+maxpooling+normalization layers. If you want to learn more 
about how it is configured, you can go to <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/model-config.html";>Model 
Configuration</a> to get details.</li>
+<li>
+<p>We configure a <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/layer#data-layers">data layer</a> 
to read the training/testing <tt>Records</tt> from <tt>DataShard</tt>.</p>
+  
+<div class="source">
+<div class="source"><pre class="prettyprint">layer{
+    name: &quot;data&quot;
+    type: kShardData
+    sharddata_conf {
+      path: &quot;examples/cifar10/cifar10_train_shard&quot;
+      batchsize: 16
+      random_skip: 5000
+    }
+    exclude: kTest  # exclude this layer for the testing net
+  }
+layer{
+    name: &quot;data&quot;
+    type: kShardData
+    sharddata_conf {
+      path: &quot;examples/cifar10/cifar10_test_shard&quot;
+      batchsize: 100
+    }
+    exclude: kTrain # exclude this layer for the training net
+  }
+</pre></div></div></li>
+  
+<li>
+<p>We configure two <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/layer#parser-layers">parser 
layers</a> to extract the image feature and label from the <tt>Records</tt> loaded 
by the <i>data</i> layer.</p>
+  
+<div class="source">
+<div class="source"><pre class="prettyprint">layer{
+    name:&quot;rgb&quot;
+    type: kRGBImage
+    srclayers: &quot;data&quot;
+    rgbimage_conf {
+      meanfile: &quot;examples/cifar10/image_mean.bin&quot; # normalize image feature
+    }
+  }
+layer{
+    name: &quot;label&quot;
+    type: kLabel
+    srclayers: &quot;data&quot;
+  }
+</pre></div></div></li>
 </ul>
 
-<div style="text-align: center">
-<img src="../images/dcnn-cifar10.png" style="width: 280px" alt="" /> <br 
/>Fig. 1: CNN example </img>
-</div></div>
-<div class="section">
-<h3><a name="Run_SINGA"></a>Run SINGA</h3>
-
 <ul>
   
-<li>All script of SINGA should be run in the root folder of SINGA. First you 
need to start the zookeeper service if zookeeper is not started. The command is 
<tt>./bin/zk-service start</tt>. Then you can run the command 
<tt>./bin/singa-run.sh -conf examples/cifar10/job.conf</tt> to start a SINGA 
job using examples/cifar10/job.conf as the job configuration. After it is 
started, you should get a screenshots like the following:</li>
+<li>
+<p>We configure layers for the feature transformation as follows (all layers 
are built-in layers in SINGA; hyper-parameters of these layers are set 
according to <a class="externalLink" 
href="https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg">Alex&#x2019;s
 setting</a>).</p>
+  
+<div class="source">
+<div class="source"><pre class="prettyprint">layer {
+    name: &quot;conv1&quot;
+    type: kConvolution
+    srclayers: &quot;rgb&quot;
+    convolution_conf {
+      num_filters: 32
+      kernel: 5
+      stride: 1
+      pad:2
+    }
+    param {
+      name: &quot;w1&quot;
+      init {
+        type:kGaussian
+        std:0.0001
+      }
+    }
+    param {
+      name: &quot;b1&quot;
+      lr_scale:2.0
+      init {
+        type: kConstant
+        value:0
+      }
+    }
+  }
+
+  layer {
+    name: &quot;pool1&quot;
+    type: kPooling
+    srclayers: &quot;conv1&quot;
+    pooling_conf {
+      pool: MAX
+      kernel: 3
+      stride: 2
+    }
+  }
+  layer {
+    name: &quot;relu1&quot;
+    type: kReLU
+    srclayers:&quot;pool1&quot;
+  }
+  layer {
+    name: &quot;norm1&quot;
+    type: kLRN
+    lrn_conf {
+      local_size: 3
+      alpha: 5e-05
+      beta: 0.75
+    }
+    srclayers:&quot;relu1&quot;
+  }
+</pre></div></div></li>
 </ul>
+<p>The configurations for the other two stages are omitted here.</p>
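<p>As a rough check of the shapes implied by the <tt>conv1</tt> and <tt>pool1</tt> settings above, the sketch below applies the usual output-size formula for sliding windows. It assumes a 32x32 input (the CIFAR10 image size). The helper name <tt>out_size</tt> is hypothetical, and the rounding mode SINGA uses for pooling is not stated here, so both floor and ceiling results are shown.</p>

```python
def out_size(in_size, kernel, stride, pad=0, ceil_mode=False):
    """Spatial output size of a convolution or pooling window (hypothetical helper)."""
    span = in_size + 2 * pad - kernel
    if ceil_mode:
        steps = -(-span // stride)  # ceiling division
    else:
        steps = span // stride      # floor division
    return steps + 1


# conv1: kernel 5, stride 1, pad 2 (values taken from the configuration above)
conv1 = out_size(32, kernel=5, stride=1, pad=2)
# pool1: kernel 3, stride 2; the rounding mode is an assumption, so compute both
pool1_floor = out_size(conv1, kernel=3, stride=2)
pool1_ceil = out_size(conv1, kernel=3, stride=2, ceil_mode=True)
print(conv1, pool1_floor, pool1_ceil)
```

<p>With padding 2 and a 5x5 kernel at stride 1, the convolution preserves the 32x32 spatial size; the pooling layer then roughly halves it.</p>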
 
+<ul>
+  
+<li>
+<p>There is an <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/layer#innerproductlayer">inner 
product layer</a> after the 3 transformation stages, which is configured with 
10 output units, i.e., the total number of labels. The weight matrix param is 
configured with a large weight decay scale to reduce over-fitting.</p>
+  
 <div class="source">
-<div class="source"><pre class="prettyprint">    xxx@yyy:zzz/incubator-singa$ 
./bin/singa-run.sh -conf examples/cifar10/job.conf
-    Unique JOB_ID is 2
-    Record job information to /tmp/singa-log/job-info/job-2-20150817-055601
-    Executing : ./singa -conf /xxx/incubator-singa/examples/cifar10/job.conf 
-singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 2
-    E0817 06:56:18.868259 33849 cluster.cc:51] proc #0 -&gt; 
192.168.5.128:49152 (pid = 33849)
-    E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start
-    E0817 06:56:18.928469 33872 worker.cc:134] Worker (group = 0, id = 0) start
-    E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588, 
accuracy : 0.077900
-    E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578, 
accuracy : 0.062500
-    E0817 06:57:24.142645 33849 trainer.cc:373] Train step-30, loss : 
2.302404, accuracy : 0.131250
-    E0817 06:57:30.813354 33849 trainer.cc:373] Train step-60, loss : 
2.302248, accuracy : 0.156250
-    E0817 06:57:37.556655 33849 trainer.cc:373] Train step-90, loss : 
2.301849, accuracy : 0.175000
-    E0817 06:57:44.971276 33849 trainer.cc:373] Train step-120, loss : 
2.301077, accuracy : 0.137500
-    E0817 06:57:51.801949 33849 trainer.cc:373] Train step-150, loss : 
2.300410, accuracy : 0.135417
-    E0817 06:57:58.682281 33849 trainer.cc:373] Train step-180, loss : 
2.300067, accuracy : 0.127083
-    E0817 06:58:05.578366 33849 trainer.cc:373] Train step-210, loss : 
2.300143, accuracy : 0.154167
-    E0817 06:58:12.518497 33849 trainer.cc:373] Train step-240, loss : 
2.295912, accuracy : 0.185417
-</pre></div></div>
-<p>After the training of some steps (depends on the setting) or the job is 
finished, SINGA will checkpoint the current parameter. In the next time, you 
can train (or use for your application) by loading the checkpoint. Please refer 
to <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/checkpoint.html";>Checkpoint</a> 
for the use of checkpoint.</p></div>
+<div class="source"><pre class="prettyprint">layer {
+    name: &quot;ip1&quot;
+    type: kInnerProduct
+    srclayers:&quot;pool3&quot;
+    innerproduct_conf {
+      num_output: 10
+    }
+    param {
+      name: &quot;w4&quot;
+      wd_scale:250
+      init {
+        type:kGaussian
+        std:0.01
+      }
+    }
+    param {
+      name: &quot;b4&quot;
+      lr_scale:2.0
+      wd_scale:0
+      init {
+        type: kConstant
+        value:0
+      }
+    }
+  }
+</pre></div></div></li>
+  
+<li>
+<p>The last layer is a <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/layer#softmaxloss">Softmax loss 
layer</a>.</p>
+  
+<div class="source">
+<div class="source"><pre class="prettyprint">  layer{
+    name: &quot;loss&quot;
+    type: kSoftmaxLoss
+    softmaxloss_conf{
+      topk:1
+    }
+    srclayers:&quot;ip1&quot;
+    srclayers: &quot;label&quot;
+  }
+</pre></div></div></li>
+</ul></div>
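<p>To make the role of the final two layers more concrete, here is a minimal pure-Python sketch of a softmax cross-entropy loss and a top-k correctness check over 10 output scores. It is not SINGA's implementation; the function names are hypothetical, and reading <tt>topk</tt> as "the label is among the k highest scores" is an assumption.</p>

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(logits, label):
    """Cross-entropy loss for one sample: -log(probability of the true label)."""
    return -math.log(softmax(logits)[label])


def in_topk(logits, label, k=1):
    """True if the label is among the k highest-scoring classes (assumed meaning of topk)."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return label in ranked[:k]


# With 10 classes and identical scores, every class gets probability 0.1.
uniform = [0.0] * 10
loss0 = softmax_loss(uniform, label=3)
print(round(loss0, 6))
```

<p>For uniform predictions over 10 classes the loss equals ln(10), about 2.3026, which is close to the step-0 loss reported in the training log above.</p>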
 <div class="section">
-<h3><a name="Build_your_own_model"></a>Build your own model</h3>
+<h3><a name="Updater"></a>Updater</h3>
+<p>The <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/updater#updater">normal SGD 
updater</a> is selected. The learning rate decreases in steps (like a staircase), and is 
configured using the <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/updater#kfixedstep">kFixedStep</a> 
type.</p>
 
-<ul>
-  
-<li>If you want to specify you own model, then you need to decribe it in the 
job.conf file. It should contain the neurualnet structure, training 
algorithm(backforward or contrastive divergence etc.), SGD update 
algorithm(e.g. Adagrad), number of training/test steps and training/test 
frequency, and display features and etc. SINGA will read job.conf as a Google 
protobuf class <a href="../src/proto/job.proto">JobProto</a>. You can also 
refer to the <a class="externalLink" 
href="http://singa.incubator.apache.org/docs/programmer-guide.html";>Programmer 
Guide</a> to get details.</li>
-</ul></div></div>
+<div class="source">
+<div class="source"><pre class="prettyprint">updater{
+  type: kSGD
+  weight_decay:0.004
+  learning_rate {
+    type: kFixedStep
+    fixedstep_conf:{
+      step:0             # lr for step 0-60000 is 0.001
+      step:60000         # lr for step 60000-65000 is 0.0001
+      step:65000         # lr for step 65000- is 0.00001
+      step_lr:0.001
+      step_lr:0.0001
+      step_lr:0.00001
+    }
+  }
+}
+</pre></div></div></div>
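<p>The staircase schedule described by the comments in the updater configuration above can be sketched as a simple lookup. This is an illustration only, not SINGA code; the function name <tt>fixed_step_lr</tt> is hypothetical, and treating each boundary step as the start of the new rate is an assumption.</p>

```python
import bisect

# Values copied from the fixedstep_conf block above.
STEPS = [0, 60000, 65000]
STEP_LRS = [0.001, 0.0001, 0.00001]


def fixed_step_lr(step, steps=STEPS, lrs=STEP_LRS):
    """Return the learning rate of the last boundary that is <= step."""
    if step < steps[0]:
        raise ValueError("step is before the first configured boundary")
    idx = bisect.bisect_right(steps, step) - 1
    return lrs[idx]


for s in (0, 59999, 60000, 64999, 65000, 70000):
    print(s, fixed_step_lr(s))
```

<p>Pairing the i-th <tt>step</tt> with the i-th <tt>step_lr</tt> keeps the schedule easy to extend: adding one more boundary and rate appends another stair.</p>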
+<div class="section">
+<h3><a name="TrainOneBatch_algorithm"></a>TrainOneBatch algorithm</h3>
+<p>The CNN model is a feed-forward model, so it should be configured to use the 
<a class="externalLink" 
href="http://singa.incubator.apache.org/docs/train-one-batch#back-propagation">Back-propagation 
algorithm</a>.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">alg: kBP
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Cluster_setting"></a>Cluster setting</h3>
+<p>The following configuration sets a single worker and server for training. The <a 
class="externalLink" 
href="http://singa.incubator.apache.org/docs/frameworks">Training 
frameworks</a> page introduces configurations for several distributed 
training frameworks.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">cluster {
+  nworker_groups: 1
+  nserver_groups: 1
+}
+</pre></div></div></div></div>
                   </div>
             </div>
           </div>

Modified: websites/staging/singa/trunk/content/docs/code-structure.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/code-structure.html (original)
+++ websites/staging/singa/trunk/content/docs/code-structure.html Wed Sep  2 
08:15:15 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Apache SINGA &#x2013; Code Structure</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />

Modified: websites/staging/singa/trunk/content/docs/communication.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/communication.html (original)
+++ websites/staging/singa/trunk/content/docs/communication.html Wed Sep  2 
08:15:15 2015
@@ -1,15 +1,15 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at 2015-08-17 
+ | Generated by Apache Maven Doxia at 2015-09-02 
  | Rendered using Apache Maven Fluido Skin 1.4
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   <head>
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150817" />
+    <meta name="Date-Revision-yyyymmdd" content="20150902" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache SINGA &#x2013; Communication</title>
+    <title>Apache SINGA &#x2013; </title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
     <link rel="stylesheet" href="../css/site.css" />
     <link rel="stylesheet" href="../css/print.css" media="print" />
@@ -189,7 +189,7 @@
         Apache SINGA</a>
                     <span class="divider">/</span>
       </li>
-        <li class="active ">Communication</li>
+        <li class="active "></li>
         
                 
                     
@@ -423,14 +423,15 @@
                         
         <div id="bodyColumn"  class="span10" >
                                   
-            <div class="section">
-<h2><a name="Communication"></a>Communication</h2>
-<hr />
+            <p>&#x2014; layout: post title: Communication category : docs</p>
+<div class="section">
+<h2><a name="tags_:_rnn_example"></a>tags : [rnn, example]</h2>
+<p>{% include JB/setup %}</p>
 <p>Different messaging libraries have different benefits and drawbacks. For 
instance, MPI provides fast message passing between GPUs (using GPUDirect), but 
does not support fault-tolerance well. In contrast, systems using ZeroMQ 
can be fault-tolerant, but do not support GPUDirect. ZeroMQ also lacks MPI&#x2019;s 
AllReduce function, which is efficient for data aggregation in 
distributed training. In Singa, we provide general messaging APIs for 
communication between threads within a process and across processes, and let 
users choose the underlying implementation (MPI or ZeroMQ) that meets their 
requirements.</p>
 <p>Singa&#x2019;s messaging library consists of two components, namely the 
message, and the socket to send and receive messages. <b>Socket</b> refers to a 
Singa defined data structure instead of the Linux Socket. We will introduce the 
two components in detail with the following figure as an example 
architecture.</p>
 <p><img src="../images/arch/arch2.png" style="width: 550px" alt="" /> <img 
src="../images/arch/comm.png" style="width: 550px" alt="" /> 
 <p><b> Fig.1 - Example physical architecture and network connection</b></p>
-<p>Fig.1 shows an example physical architecture and its network connection. <a 
href="architecture.html}">Section-partition server side ParamShard</a> has a 
detailed description of the architecture. Each process consists of one main 
thread running the stub and multiple background threads running the worker and 
server tasks. The stub of the main thread forwards messages among threads . The 
worker and server tasks are performed by the background threads.</p>
+<p>Fig.1 shows an example physical architecture and its network connection. <a 
class="externalLink" 
href="http://singa.incubator.apache.org/docs/architecture.html}">Section-partition
 server side ParamShard</a> has a detailed description of the architecture. 
Each process consists of one main thread running the stub and multiple 
background threads running the worker and server tasks. The stub of the main 
thread forwards messages among threads. The worker and server tasks are 
performed by the background threads.</p>
 <div class="section">
 <h3><a name="Message"></a>Message</h3>
 <p><object type="image/svg+xml" style="width: 100px" data="../images/msg.svg"> 
Not supported </object> 
@@ -799,79 +800,50 @@ class SafeQueue{
 </pre></div></div>
 <p>For inter-process communication, we serialize the messages and call 
MPI&#x2019;s send/receive functions to transfer them. All inter-process 
connections are set up by MPI at the beginning. Consequently, the Connect and 
Bind functions do nothing for both inter-process and intra-process 
communication.</p>
 <p>MPI&#x2019;s AllReduce function is efficient for data aggregation in 
distributed training. For example, <a class="externalLink" 
href="http://arxiv.org/abs/1501.02876">DeepImage of Baidu</a> uses AllReduce to 
aggregate parameter updates from all workers. It has an architecture 
similar to <a href="architecture.html">Fig.2</a>, where every process has 
a server group and is connected with all other processes. Hence, we can 
implement DeepImage in Singa by simply using MPI&#x2019;s AllReduce function 
for inter-process communication.</p>
-<!-- #### Server socket
+<p>{% comment %}</p></div>
+<div class="section">
+<h4><a name="Server_socket"></a>Server socket</h4>
+<p>Each server has a DEALER socket to communicate with the stub in the main 
thread via an <i>in-proc</i> socket. It receives requests issued from workers 
and other servers, and forwarded by the ROUTER of the stub. Since the requests 
are forwarded by the stub, we can make the location of workers transparent to 
server threads. The stub records the locations of workers and servers.</p>
+<p>As explained previously in the [APIs](<a class="externalLink" 
href="http://singa.incubator.apache.org{%";>http://singa.incubator.apache.org{%</a>
 post_url /docs/2015-03-20-parameter-management %}) for parameter management, 
some requests may not be processed immediately but have to be re-queued. For 
instance, the Get request cannot be processed if the requested parameter is not 
available, i.e., the parameter has not been put into the server&#x2019;s 
ParamShard. The re-queueing operation is implemented by sending the messages to 
the ROUTER socket of the stub which treats the message as a newly arrived 
request and queues it for processing.</p></div>
+<div class="section">
+<h4><a name="Worker_socket"></a>Worker socket</h4>
+<p>Each worker thread has a DEALER socket to communicate with the stub in the 
main thread via an <i>in-proc</i> socket. It sends (Get/Update) requests to the 
ROUTER in the stub which forwards the request to (local or remote) processes. 
In case of the partition of ParamShard of worker side, it may also transfer 
data with other workers via the DEALER socket. Again, the location of the other 
side (a server or worker) of the communication is transparent to the worker. 
The stub handles the addressing.</p>
+<p>PMClient executes the training logic, during which it generates GET and 
UPDATE requests. A request received at the worker&#x2019;s main thread contains 
ID of the PMClient instance. The worker determines which server to send the 
request based on its content, then sends it via the corresponding socket. 
Response messages received from any of the server socket are forwarded to the 
in-proc ROUTER socket. Since each response header contains the PMClient ID, it 
is routed to the correct instance.</p></div>
+<div class="section">
+<h4><a name="Stub_sockets"></a>Stub sockets</h4>
+<div class="section">
+<h5><a name="ROUTER_socket"></a>ROUTER socket</h5>
+<p>The main thread has a ROUTER socket to communicate with background 
threads.</p>
+<p>It forwards the requests from workers to background servers. There can be 
multiple servers. If all servers maintain the same (sub) ParamShard, then the 
request can be forwarded to any of them. Load-balance (like round-robin) can be 
implemented in the stub to improve the performance. If each server maintains a 
sub-set of the local ParamShard, then the stub forwards each request to the 
corresponding server. It also forwards the synchronization requests from remote 
servers to local servers in the same way.</p>
+<p>In the case of neural network partition (i.e., model partition), neighbor 
layers would transfer data with each other. Hence, the ROUTER forwards 
data transfer requests from one worker to another. The stub looks up the 
location table to decide where to forward each request.</p></div>
+<div class="section">
+<h5><a name="DEALER_sockets"></a>DEALER sockets</h5>
+<p>The main thread has multiple DEALER sockets to communicate with other 
processes, one socket per process. Two processes are connected if one of the 
following cases exists:</p>
+
+<ul>
+  
+<li>one worker group spans across the two processes;</li>
+  
+<li>two connected server groups are separated in the two processes;</li>
+  
+<li>workers and the subscribed servers are separated in the two processes.</li>
+</ul>
+<p>All messages in SINGA are of multi-frame ZeroMQ format. The figure above 
demonstrates different types of messages exchanged in the system.</p>
 
-Each server has a DEALER socket to communicate with the stub in the main
-thread via an _in-proc_ socket. It receives requests issued from workers and
-other servers, and forwarded by the ROUTER of the stub. Since the requests are 
forwarded by the
-stub, we can make the location of workers transparent to server threads. The
-stub records the locations of workers and servers.
-
-As explained previously in the
-[APIs]({{ BASE_PATH }}{% post_url /docs/2015-03-20-parameter-management %})
-for parameter management, some requests may
-not be processed immediately but have to be re-queued. For instance, the Get
-request cannot be processed if the requested parameter is not available, i.e.,
-the parameter has not been put into the server's ParamShard. The re-queueing
-operation is implemented sendings the messages to the ROUTER
-socket of the stub which treats the message as a newly arrived request
-and queues it for processing.
-
-#### Worker socket
-
-Each worker thread has a DEALER socket to communicate with the stub in the main
-thread via an _in-proc_ socket. It sends (Get/Update) requests to the ROUTER in
-the stub which forwards the request to (local or remote) processes. In case of
-the partition of ParamShard of worker side, it may also transfer data with 
other
-workers via the DEALER socket. Again, the location of the other side (a server
-or worker) of the communication is transparent to the worker. The stub handles
-the addressing.
-
-PMClient executes the training logic, during which it generates GET and UPDATE
-requests. A request received at the worker's main thread contains ID of the
-PMClient instance. The worker determines which server to send the request based
-on its content, then sends it via the corresponding socket. Response messages
-received from any of the server socket are forwarded to the in-proc ROUTER
-socket. Since each response header contains the PMClient ID, it is routed to
-the correct instance.
-
-#### Stub sockets
-
-##### ROUTER socket
-The main thread has a ROUTER socket to communicate with background threads.
-
-It forwards the requests from workers to background servers. There can be
-multiple servers.If all servers maintain the same (sub) ParamShard, then the
-request can be forwarded to any of them. Load-balance (like round-robin) can be
-implemented in the stub to improve the performance. If each server maintains a
-sub-set of the local ParamShard, then the stub forwards each request to the
-corresponding server.  It also forwards the synchronization requests from
-remote servers to local servers in the same way.
-
-In the case of neural network partition (i.e., model partition), neighbor
-layers would transfer data with each other. Hence, the ROUTER would forwards
-data transfer requests from one worker to other worker. The stub looks up the
-location table to decide where to forward each request.
-
-##### DEALER sockets
-
-The main thread has multiple DEALER sockets to communicate with other
-processes, one socket per process. Two processes are connected if one of the
-following cases exists:
-
-  * one worker group spans across the two processes;
-  * two connected server groups are separated in the two processes;
-  * workers and the subscribed servers are separated in the two processes.
-
-
-All messages in SINGA are of multi-frame ZeroMQ format. The figure above 
demonstrates different types of messages exchanged in the system.
-
-  1. Requests generated by PMClient consist of the parameter content (which 
could be empty), followed by the parameter ID (key) and the request type 
(GET/PUT/REQUEST). Responses received by PMClient are also of this format.
-  2. Messages received by the worker's main thread from PMClient instances 
contain another frame identifying the PMClient connection (or PMClient ID).
-  3. Requests originating form a worker and arriving at the server contain 
another frame identifying the worker's connection (or Worker ID).
-  4. Requests originating from another server and arriving at the server have 
the same format as (3), but the first frame identifies the server connection 
(or Server ID).
-  5. After a PMServer processes a request, it generates a message with the 
format similar to (3) but with extra frame indicating if the message is to be 
routed back to a worker (a response message) or to route to another server (a 
SYNC request).
-  6. When a request is re-queued, the PMServer generates a message and sends 
it directly to the server's front-end socket. The re-queued request seen by the 
server's main thread consists of all the frames in (3), followed by a REQUEUED 
frame, and finally by another frame generated by the ROUTER socket identifying 
connection from the PMServer instance. The main thread then strips off these 
additional two frames before  forwarding it to another PMServer instance like 
another ordinary request. --></div></div></div>
+<ol style="list-style-type: decimal">
+  
+<li>Requests generated by PMClient consist of the parameter content (which 
could be empty), followed by the parameter ID (key) and the request type 
(GET/PUT/REQUEST). Responses received by PMClient are also of this format.</li>
+  
+<li>Messages received by the worker&#x2019;s main thread from PMClient 
instances contain another frame identifying the PMClient connection (or 
PMClient ID).</li>
+  
+<li>Requests originating from a worker and arriving at the server contain 
another frame identifying the worker&#x2019;s connection (or Worker ID).</li>
+  
+<li>Requests originating from another server and arriving at the server have 
the same format as (3), but the first frame identifies the server connection 
(or Server ID).</li>
+  
+<li>After a PMServer processes a request, it generates a message with a 
format similar to (3) but with an extra frame indicating whether the message is to be 
routed back to a worker (a response message) or routed to another server (a 
SYNC request).</li>
+  
+<li>When a request is re-queued, the PMServer generates a message and sends it 
directly to the server&#x2019;s front-end socket. The re-queued request seen by 
the server&#x2019;s main thread consists of all the frames in (3), followed by 
a REQUEUED frame, and finally by another frame generated by the ROUTER socket 
identifying connection from the PMServer instance. The main thread then strips 
off these additional two frames before forwarding it to another PMServer 
instance like another ordinary request. {% endcomment %}</li>
+</ol></div></div></div></div>
                   </div>
             </div>
           </div>

Added: websites/staging/singa/trunk/content/docs/data.html
==============================================================================
    (empty)
