Author: buildbot
Date: Wed Sep 2 08:15:15 2015
New Revision: 964001
Log:
Staging update by buildbot for singa
Added:
websites/staging/singa/trunk/content/docs/data.html
Modified:
websites/staging/singa/trunk/content/ (props changed)
websites/staging/singa/trunk/content/community.html
websites/staging/singa/trunk/content/community/issue-tracking.html
websites/staging/singa/trunk/content/community/mail-lists.html
websites/staging/singa/trunk/content/community/source-repository.html
websites/staging/singa/trunk/content/community/team-list.html
websites/staging/singa/trunk/content/develop/contribute-code.html
websites/staging/singa/trunk/content/develop/contribute-docs.html
websites/staging/singa/trunk/content/develop/how-contribute.html
websites/staging/singa/trunk/content/develop/schedule.html
websites/staging/singa/trunk/content/docs.html
websites/staging/singa/trunk/content/docs/architecture.html
websites/staging/singa/trunk/content/docs/checkpoint.html
websites/staging/singa/trunk/content/docs/cnn.html
websites/staging/singa/trunk/content/docs/code-structure.html
websites/staging/singa/trunk/content/docs/communication.html
Propchange: websites/staging/singa/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Wed Sep 2 08:15:15 2015
@@ -1 +1 @@
-1696297
+1700726
Modified: websites/staging/singa/trunk/content/community.html
==============================================================================
--- websites/staging/singa/trunk/content/community.html (original)
+++ websites/staging/singa/trunk/content/community.html Wed Sep 2 08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – Community</title>
<link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />
Modified: websites/staging/singa/trunk/content/community/issue-tracking.html
==============================================================================
--- websites/staging/singa/trunk/content/community/issue-tracking.html (original)
+++ websites/staging/singa/trunk/content/community/issue-tracking.html Wed Sep 2 08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – Issue Tracking</title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
Modified: websites/staging/singa/trunk/content/community/mail-lists.html
==============================================================================
--- websites/staging/singa/trunk/content/community/mail-lists.html (original)
+++ websites/staging/singa/trunk/content/community/mail-lists.html Wed Sep 2 08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – Project Mailing Lists</title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
Modified: websites/staging/singa/trunk/content/community/source-repository.html
==============================================================================
--- websites/staging/singa/trunk/content/community/source-repository.html (original)
+++ websites/staging/singa/trunk/content/community/source-repository.html Wed Sep 2 08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – Source Repository</title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
Modified: websites/staging/singa/trunk/content/community/team-list.html
==============================================================================
--- websites/staging/singa/trunk/content/community/team-list.html (original)
+++ websites/staging/singa/trunk/content/community/team-list.html Wed Sep 2 08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – The SINGA Team</title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
Modified: websites/staging/singa/trunk/content/develop/contribute-code.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/contribute-code.html (original)
+++ websites/staging/singa/trunk/content/develop/contribute-code.html Wed Sep 2 08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – How to Contribute Code</title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
Modified: websites/staging/singa/trunk/content/develop/contribute-docs.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/contribute-docs.html (original)
+++ websites/staging/singa/trunk/content/develop/contribute-docs.html Wed Sep 2 08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – How to Contribute Documentation</title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
Modified: websites/staging/singa/trunk/content/develop/how-contribute.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/how-contribute.html (original)
+++ websites/staging/singa/trunk/content/develop/how-contribute.html Wed Sep 2 08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – How to Contribute to SINGA</title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
Modified: websites/staging/singa/trunk/content/develop/schedule.html
==============================================================================
--- websites/staging/singa/trunk/content/develop/schedule.html (original)
+++ websites/staging/singa/trunk/content/develop/schedule.html Wed Sep 2 08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – Development Schedule</title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
Modified: websites/staging/singa/trunk/content/docs.html
==============================================================================
--- websites/staging/singa/trunk/content/docs.html (original)
+++ websites/staging/singa/trunk/content/docs.html Wed Sep 2 08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – Documentation</title>
<link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" />
Modified: websites/staging/singa/trunk/content/docs/architecture.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/architecture.html (original)
+++ websites/staging/singa/trunk/content/docs/architecture.html Wed Sep 2 08:15:15 2015
@@ -1,15 +1,15 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
- <title>Apache SINGA – System Architecture</title>
+ <title>Apache SINGA – </title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
<link rel="stylesheet" href="../css/site.css" />
<link rel="stylesheet" href="../css/print.css" media="print" />
@@ -189,7 +189,7 @@
Apache SINGA</a>
<span class="divider">/</span>
</li>
- <li class="active ">System Architecture</li>
+ <li class="active "></li>
@@ -423,14 +423,15 @@
<div id="bodyColumn" class="span10" >
- <div class="section">
-<h2><a name="System_Architecture"></a>System Architecture</h2>
-<hr />
+ <p>— layout: post title: Architecture category : docs</p>
+<div class="section">
+<h2><a name="tags_:_architecture"></a>tags : [architecture]</h2>
+<p>{% include JB/setup %}</p>
<div class="section">
<h3><a name="Logical_Architecture"></a>Logical Architecture</h3>
-<p><img src="../images/distributed/logical.png" style="width: 550px" alt="" />
+<p><img src="http://singa.incubator.apache.org/assets/image/logical.png"
style="width: 550px" alt="" />
<p><b> Fig.1 - Logical system architecture</b></p>
-<p>SINGA has flexible architecture to support different distributed <a
href="frameworks.html">training frameworks</a> (both synchronous and
asynchronous). The logical system architecture is shown in Fig.1. The
architecture consists of multiple server groups and worker groups:</p>
+<p>SINGA has flexible architecture to support different distributed <a
class="externalLink"
href="http://singa.incubator.apache.org/docs/frameworks.html">training
frameworks</a> (both synchronous and asynchronous). The logical system
architecture is shown in Fig.1. The architecture consists of multiple server
groups and worker groups:</p>
<ul>
@@ -438,7 +439,7 @@
<li><b>Worker group</b> Each worker group communicates with only one server
group. A worker group trains a complete model replica against a partition of
the training dataset, and is responsible for computing parameter gradients.
All worker groups run and communicate with the corresponding server groups
asynchronously. However, inside each worker group, the workers synchronously
compute parameter updates for the model replica.</li>
</ul>
-<p>There are different strategies to distribute the training workload among
workers within a group: </p>
+<p>There are different strategies to distribute the training workload among
workers within a group:</p>
<ul>
@@ -450,7 +451,7 @@
</ul></div>
<div class="section">
<h3><a name="Implementation"></a>Implementation</h3>
-<p>In SINGA, servers and workers are execution units running in separate
threads. They communicate through <a href="communication.html">messages</a>.
Every process runs the main thread as a stub that aggregates local messages and
forwards them to corresponding (remote) receivers.</p>
+<p>In SINGA, servers and workers are execution units running in separate
threads. They communicate through <a class="externalLink"
href="http://singa.incubator.apache.org/docs/communication.html">messages</a>.
Every process runs the main thread as a stub that aggregates local messages and
forwards them to corresponding (remote) receivers.</p>
<p>Each server group and worker group have a <i>ParamShard</i> object
representing a complete model replica. If workers and servers resident in the
same process, their <i>ParamShard</i> (partitions) can be configured to share
the same memory space. In this case, the messages transferred between different
execution units just contain pointers to the data, which reduces the
communication cost. Unlike in inter-process cases, the messages have to include
the parameter values.</p></div></div>
</div>
</div>
Modified: websites/staging/singa/trunk/content/docs/checkpoint.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/checkpoint.html (original)
+++ websites/staging/singa/trunk/content/docs/checkpoint.html Wed Sep 2 08:15:15 2015
@@ -1,15 +1,15 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
- <title>Apache SINGA – Checkpoint and Resume</title>
+ <title>Apache SINGA – </title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
<link rel="stylesheet" href="../css/site.css" />
<link rel="stylesheet" href="../css/print.css" media="print" />
@@ -189,7 +189,7 @@
Apache SINGA</a>
<span class="divider">/</span>
</li>
- <li class="active ">Checkpoint and Resume</li>
+ <li class="active "></li>
@@ -423,80 +423,84 @@
<div id="bodyColumn" class="span10" >
- <div class="section">
-<h2><a name="Checkpoint_and_Resume"></a>Checkpoint and Resume</h2>
-<hr />
+ <p>— layout: post title: Checkpoint and Resume category :
docs</p>
<div class="section">
-<h3><a name="Applications_of_checkpoint"></a>Applications of checkpoint</h3>
-<p>By taking checkpoints of model parameters, we can</p>
+<h2><a name="tags_:_checkpoint_restore"></a>tags : [checkpoint, restore]</h2>
+<p>{% include JB/setup %}</p>
+<p>SINGA periodically checkpoints model parameters to disk at a
user-configured frequency. By checkpointing model parameters, we can</p>
<ol style="list-style-type: decimal">
<li>
-<p>Restore (resume) the training from the last checkpoint. For example, if the
program crashes before finishing all training steps.</p></li>
+<p>resume the training from the last checkpoint. For example, if the
program crashes before finishing all training steps, we can continue the
training using checkpoint files.</p></li>
<li>
-<p>Use them as pre-training results for a similar model. For example, the
parameters from training a RBM model can be used to initialize a <a
href="auto-encoder.html">deep auto-encoder</a> model.</p></li>
+<p>use them to initialize a similar model. For example, the parameters from
training a RBM model can be used to initialize a <a class="externalLink"
href="http://singa.incubator.apache.org/docs/rbm">deep auto-encoder</a>
model.</p></li>
</ol></div>
<div class="section">
-<h3><a name="Instructions_for_checkpoint_and_resume"></a>Instructions for
checkpoint and resume</h3>
-<p>Checkpoint is controlled by two model configuration fields:
<tt>checkpoint_after</tt> (start checkpoint after this number of training
steps) and <tt>checkpoint_frequency</tt>. The checkpoint files are located at
<tt>WORKSPACE/checkpoint/stepSTEP-workerWORKERID.bin</tt>.</p>
-<p>The following configuration shows an example,</p>
+<h2><a name="Configuration"></a>Configuration</h2>
+<p>Checkpointing is controlled by two configuration fields:</p>
+
+<ul>
+
+<li><tt>checkpoint_after</tt>, start checkpointing after this number of
training steps,</li>
+
+<li><tt>checkpoint_freq</tt>, frequency of doing checkpointing.</li>
+</ul>
+<p>For example,</p>
<div class="source">
-<div class="source"><pre class="prettyprint">model {
- ...
- checkpoint_after: 100
- checkpoint_frequency: 300
- ...
-}
+<div class="source"><pre class="prettyprint"># job.conf
+workspace: "WORKSPACE"
+checkpoint_after: 100
+checkpoint_frequency: 300
+...
</pre></div></div>
-<p>After training for 700 steps, under WORKSPACE/checkpoint folder, there
would be two checkpoint files (training on single node):</p>
+<p>Checkpointing files are located at
<i>WORKSPACE/checkpoint/stepSTEP-workerWORKERID.bin</i>. For the above
configuration, after training for 700 steps, there would be two checkpointing
files,</p>
<div class="source">
<div class="source"><pre class="prettyprint">step400-worker0.bin
step700-worker0.bin
-</pre></div></div>
+</pre></div></div></div>
<div class="section">
-<h4><a name="Application_1"></a>Application 1</h4>
-<p>We can resume the training from the last checkpoint (i.e., step 700) by:</p>
+<h2><a name="Application_-_resuming_training"></a>Application - resuming
training</h2>
+<p>We can resume the training from the last checkpoint (i.e., step 700) by,</p>
<div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-run.sh
-workspace=WORKSPACE --resume
-</pre></div></div></div>
+<div class="source"><pre class="prettyprint">./bin/singa-run.sh -conf JOB_CONF
-resume
+</pre></div></div>
+<p>There is no change to the job configuration.</p></div>
<div class="section">
-<h4><a name="Application_2"></a>Application 2</h4>
-<p>We can also use the checkpoint file from step 400 as the pre-trained model
for a new model by configuring the job.conf of the new model as:</p>
+<h2><a name="Application_-_model_initialization"></a>Application - model
initialization</h2>
+<p>We can also use the checkpointing file from step 400 to initialize a new
model by configuring the new job as,</p>
<div class="source">
-<div class="source"><pre class="prettyprint">model {
- ...
- checkpoint : "WORKSPACE/checkpoint/step400-worker0.bin"
- ...
-}
+<div class="source"><pre class="prettyprint"># job.conf
+checkpoint : "WORKSPACE/checkpoint/step400-worker0.bin"
+...
</pre></div></div>
-<p>If there are multiple checkpoint files for the same snapshot due to model
partitioning, all the checkpoint files should be added:</p>
+<p>If there are multiple checkpointing files for the same snapshot due to
model partitioning, all the checkpointing files should be added,</p>
<div class="source">
-<div class="source"><pre class="prettyprint">model {
- ...
- checkpoint : "WORKSPACE/checkpoint/step400-worker0.bin"
- checkpoint : "WORKSPACE/checkpoint/step400-worker1.bin"
- ...
-}
+<div class="source"><pre class="prettyprint"># job.conf
+checkpoint : "WORKSPACE/checkpoint/step400-worker0.bin"
+checkpoint : "WORKSPACE/checkpoint/step400-worker1.bin"
+...
</pre></div></div>
-<p>The launching command is the same as starting a new job</p>
+<p>The training command is the same as starting a new job,</p>
<div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-run.sh
-workspace=WORKSPACE
-</pre></div></div></div></div>
+<div class="source"><pre class="prettyprint">./bin/singa-run.sh -conf JOB_CONF
+</pre></div></div>
+<p>{% comment %}</p></div>
<div class="section">
-<h3><a name="Implementation_details"></a>Implementation details</h3>
-<p>The checkpoint is done in the Worker class and controlled by two model
configuration fields: <tt>checkpoint_after</tt> and
<tt>checkpoint_frequency</tt>. Only Params owning the param values from the
first group are dumped onto into checkpoint files. For one Param object, its
name, version and values are saved. It is possible that the snapshot is
separated into multiple files because the neural net is partitioned into
multiple workers.</p>
-<p>The Worker’s InitLocalParam will initialize Params from checkpoint
files if the <tt>checkpoint</tt> field is set. Otherwise it randomly initialize
them using user configured initialization method. The Param objects are matched
based on name. If the Param is not configured with a name, NeuralNet class will
automatically create one for it based on the name of the layer to which the
Param object belongs. The <tt>checkpoint</tt> can be set by users (Application
1) or by the Resume function (Application 2) of the Trainer class, which finds
the files for the latest snapshot and add them to the <tt>checkpoint</tt>
filed. It also sets the <tt>step</tt> field of model configuration to the
checkpoint step (extracted from file name).</p></div>
+<h2><a name="Advanced_user_guide"></a>Advanced user guide</h2>
+<p>Checkpointing is done in the <a class="externalLink"
href="http://singa.incubator.apache.org/api/classsinga_1_1Worker.html">Worker
class</a>. Only <tt>Param</tt>s from the first group are dumped into
checkpointing files. For a <tt>Param</tt> object, its name, version and values
are saved. It is possible that the snapshot is separated into multiple files
because the neural net is partitioned into multiple workers.</p>
+<p>The Worker’s <tt>InitLocalParam</tt> function initializes
parameters from checkpointing files if the <tt>checkpoint</tt> field is set.
Otherwise it randomly initializes them using the user-configured initialization
method. The <tt>Param</tt> objects are matched based on name. If a
<tt>Param</tt> object is not configured with a name, the <tt>NeuralNet</tt> class
automatically creates one for it based on the name of the layer. The
<tt>checkpoint</tt> field can be set by users (model initialization) or by the
<tt>Resume</tt> function (resuming training) of the Trainer class, which finds the
files for the latest snapshot and adds them to the <tt>checkpoint</tt> field. It
also sets the <tt>step</tt> field of the model configuration to the checkpoint step
(extracted from the file name).</p>
<div class="section">
<h3><a name="Caution"></a>Caution</h3>
-<p>Both two applications must be taken carefully when Param objects are
partitioned due to model partitioning. Because if the training is done using 2
workers, while the new model (or continue training) is trained with 3 workers,
then the same original Param object is partitioned in different ways and hence
cannot be matched.</p></div></div>
+<p>Both applications require care when <tt>Param</tt> objects
are partitioned due to model partitioning. If the training is done
using 2 workers while the new model (or the resumed training) uses 3
workers, then the same original <tt>Param</tt> object is partitioned in
different ways and hence cannot be matched.</p>
+<p>{% endcomment %}</p></div></div>
</div>
</div>
</div>
Modified: websites/staging/singa/trunk/content/docs/cnn.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/cnn.html (original)
+++ websites/staging/singa/trunk/content/docs/cnn.html Wed Sep 2 08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – </title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
@@ -21,7 +21,7 @@
<script type="text/javascript"
src="../js/apache-maven-fluido-1.4.min.js"></script>
- <meta name="Notice" content="Licensed to the Apache Software Foundation
(ASF) under one or more contributor license agreements. See the
NOTICE file distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at .
http://www.apache.org/licenses/LICENSE-2.0 . Unless
required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS"
BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
or implied. See the License for the specific language governing
permissions and limitations under the License." /> </head>
+ </head>
<body class="topBarEnabled">
@@ -425,62 +425,258 @@
<div id="bodyColumn" class="span10" >
-
-<p>This example will show you how to use SINGA to train a CNN model using
cifar10 dataset.</p>
+ <p>— layout: post title: Example — Convolution
Neural Network category : docs</p>
<div class="section">
+<h2><a name="tags_:_cnn_example"></a>tags : [cnn, example]</h2>
+<p>{% include JB/setup %}</p>
+<p>A convolutional neural network (CNN) is a type of feed-forward artificial
neural network widely used for image and video classification. In this example,
we will use a deep CNN model to do image classification for the <a
class="externalLink" href="http://www.cs.toronto.edu/~kriz/cifar.html">CIFAR10
dataset</a>.</p></div>
<div class="section">
-<h3><a name="Prepare_for_the_data"></a>Prepare for the data</h3>
+<h2><a name="Running_instructions"></a>Running instructions</h2>
+<p>Please refer to the <a class="externalLink"
href="http://singa.incubator.apache.org/docs/installation">installation</a>
page for instructions on building SINGA, and the <a class="externalLink"
href="http://singa.incubator.apache.org/docs/quick-start">quick start</a> for
instructions on starting zookeeper.</p>
+<p>We have provided scripts for preparing the training and test dataset in
<i>examples/cifar10/</i>.</p>
-<ul>
-
-<li>First go to the <tt>example/cifar10/</tt> folder for preparing the
dataset. There should be a makefile example called Makefile.example in the
folder. Run the command <tt>cp Makefile.example Makefile</tt> to generate the
makefile. Then run the command <tt>make download</tt> and <tt>make create</tt>
in the current folder to download cifar10 dataset and prepare for the training
and testing datashard.</li>
-</ul></div>
+<div class="source">
+<div class="source"><pre class="prettyprint"># in examples/cifar10
+$ cp Makefile.example Makefile
+$ make download
+$ make create
+</pre></div></div>
+<p>After the datasets are prepared, we start the training by</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">./bin/singa-run.sh -conf
examples/cifar10/job.conf
+</pre></div></div>
+<p>After it is started, you should see output like</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">Record job information to
/tmp/singa-log/job-info/job-2-20150817-055601
+Executing : ./singa -conf /xxx/incubator-singa/examples/cifar10/job.conf
-singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 2
+E0817 06:56:18.868259 33849 cluster.cc:51] proc #0 -> 192.168.5.128:49152
(pid = 33849)
+E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start
+E0817 06:56:18.928469 33872 worker.cc:134] Worker (group = 0, id = 0) start
+E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588,
accuracy : 0.077900
+E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578,
accuracy : 0.062500
+E0817 06:57:24.142645 33849 trainer.cc:373] Train step-30, loss : 2.302404,
accuracy : 0.131250
+E0817 06:57:30.813354 33849 trainer.cc:373] Train step-60, loss : 2.302248,
accuracy : 0.156250
+E0817 06:57:37.556655 33849 trainer.cc:373] Train step-90, loss : 2.301849,
accuracy : 0.175000
+E0817 06:57:44.971276 33849 trainer.cc:373] Train step-120, loss : 2.301077,
accuracy : 0.137500
+E0817 06:57:51.801949 33849 trainer.cc:373] Train step-150, loss : 2.300410,
accuracy : 0.135417
+E0817 06:57:58.682281 33849 trainer.cc:373] Train step-180, loss : 2.300067,
accuracy : 0.127083
+E0817 06:58:05.578366 33849 trainer.cc:373] Train step-210, loss : 2.300143,
accuracy : 0.154167
+E0817 06:58:12.518497 33849 trainer.cc:373] Train step-240, loss : 2.295912,
accuracy : 0.185417
+</pre></div></div>
+<p>After a number of training steps (depending on the configuration), or when the job
finishes, SINGA will <a class="externalLink"
href="http://singa.incubator.apache.org/docs/checkpoint">checkpoint</a> the
model parameters.</p></div>
+<div class="section">
+<h2><a name="Details"></a>Details</h2>
+<p>To train a model in SINGA, you need to prepare the datasets, and a job
configuration which specifies the neural net structure, training algorithm (BP
or CD), SGD update algorithm (e.g. Adagrad), number of training/test steps,
etc.</p>
+<div class="section">
+<h3><a name="Data_preparation"></a>Data preparation</h3>
+<p>Before using SINGA, you need to write a program to pre-process the dataset
you use to a format that SINGA can read. Please refer to the <a
class="externalLink"
href="http://singa.incubator.apache.org/docs/data#example---cifar-dataset">Data
Preparation</a> to get details about preparing this CIFAR10 dataset.</p></div>
<div class="section">
-<h3><a name="Set_job_configuration."></a>Set job configuration.</h3>
+<h3><a name="Neural_net"></a>Neural net</h3>
+<p>Figure 1 shows the net structure of the CNN model used in this example,
which is set following <a class="externalLink"
href="https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg.">this
page</a>. The dashed circle represents one feature transformation stage, which
generally has four layers as shown in the figure. Sometimes the rectifier layer
and normalization layer are omitted or swapped within a stage. This example
has 3 such stages.</p>
+<p>Next we follow the guide in <a class="externalLink"
href="http://singa.incubator.apache.org/docs/neural-net">neural net page</a>
and <a class="externalLink"
href="http://singa.incubator.apache.org/docs/layer">layer page</a> to write the
neural net configuration.</p>
+
+<div style="text-align: center">
+<img src="http://singa.incubator.apache.org/assets/image/cnn-example.png"
style="width: 200px" alt="" /> <br />
+<b>Figure 1 - Net structure of the CNN example.</b></img>
+</div>
<ul>
-<li>If you just want to use the training model provided in this example, you
can just use job.conf file in current directory. Fig. 1 gives an example of CNN
struture. In this example, we define a CNN model that contains 3
convolution+relu+maxpooling+normalization layers. If you want to learn more
about how it is configured, you can go to <a class="externalLink"
href="http://singa.incubator.apache.org/docs/model-config.html">Model
Configuration</a> to get details.</li>
+<li>
+<p>We configure a <a class="externalLink"
href="http://singa.incubator.apache.org/docs/layer#data-layers">data layer</a>
to read the training/testing <tt>Records</tt> from <tt>DataShard</tt>.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">layer{
+ name: "data"
+ type: kShardData
+ sharddata_conf {
+ path: "examples/cifar10/cifar10_train_shard"
+ batchsize: 16
+ random_skip: 5000
+ }
+ exclude: kTest # exclude this layer for the testing net
+ }
+layer{
+ name: "data"
+ type: kShardData
+ sharddata_conf {
+ path: "examples/cifar10/cifar10_test_shard"
+ batchsize: 100
+ }
+ exclude: kTrain # exclude this layer for the training net
+ }
+</pre></div></div></li>
+
+<li>
+<p>We configure two <a class="externalLink"
href="http://singa.incubator.apache.org/docs/layer#parser-layers">parser
layers</a> to extract the image feature and label from the <tt>Records</tt> loaded
by the <i>data</i> layer.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">layer{
+ name:"rgb"
+ type: kRGBImage
+ srclayers: "data"
+ rgbimage_conf {
+ meanfile: "examples/cifar10/image_mean.bin" # normalize image feature
+ }
+ }
+layer{
+ name: "label"
+ type: kLabel
+ srclayers: "data"
+ }
+</pre></div></div></li>
</ul>
-<div style="text-align: center">
-<img src="../images/dcnn-cifar10.png" style="width: 280px" alt="" /> <br
/>Fig. 1: CNN example </img>
-</div></div>
-<div class="section">
-<h3><a name="Run_SINGA"></a>Run SINGA</h3>
-
<ul>
-<li>All script of SINGA should be run in the root folder of SINGA. First you
need to start the zookeeper service if zookeeper is not started. The command is
<tt>./bin/zk-service start</tt>. Then you can run the command
<tt>./bin/singa-run.sh -conf examples/cifar10/job.conf</tt> to start a SINGA
job using examples/cifar10/job.conf as the job configuration. After it is
started, you should get a screenshots like the following:</li>
+<li>
+<p>We configure layers for the feature transformation as follows (all layers
are built-in layers in SINGA; hyper-parameters of these layers are set
according to <a class="externalLink"
href="https://code.google.com/p/cuda-convnet/source/browse/trunk/example-layers/layers-18pct.cfg">Alex’s
setting</a>).</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">layer {
+ name: "conv1"
+ type: kConvolution
+ srclayers: "rgb"
+ convolution_conf {
+ num_filters: 32
+ kernel: 5
+ stride: 1
+ pad:2
+ }
+ param {
+ name: "w1"
+ init {
+ type:kGaussian
+ std:0.0001
+ }
+ }
+ param {
+ name: "b1"
+ lr_scale:2.0
+ init {
+ type: kConstant
+ value:0
+ }
+ }
+ }
+
+ layer {
+ name: "pool1"
+ type: kPooling
+ srclayers: "conv1"
+ pooling_conf {
+ pool: MAX
+ kernel: 3
+ stride: 2
+ }
+ }
+ layer {
+ name: "relu1"
+ type: kReLU
+ srclayers:"pool1"
+ }
+ layer {
+ name: "norm1"
+ type: kLRN
+ lrn_conf {
+ local_size: 3
+ alpha: 5e-05
+ beta: 0.75
+ }
+ srclayers:"relu1"
+ }
+</pre></div></div></li>
</ul>
+<p>The configurations for the other 2 stages are omitted here.</p>
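<p>To see how the <tt>kernel</tt>, <tt>stride</tt> and <tt>pad</tt> settings of the first stage affect the feature map size, the usual output-size arithmetic can be sketched as below. This is only an illustration: it assumes the 32x32 CIFAR10 input, the function names are hypothetical, and since the text above does not say whether SINGA rounds the pooling size up or down, both variants are shown.</p>

```python
import math


def conv_output_size(in_size, kernel, stride, pad):
    """Spatial size after a convolution with zero padding on both sides."""
    return (in_size + 2 * pad - kernel) // stride + 1


def pool_output_size(in_size, kernel, stride, round_up):
    """Spatial size after pooling; round_up selects ceil vs. floor rounding.

    Which rounding SINGA applies is an assumption left open here.
    """
    span = in_size - kernel
    steps = math.ceil(span / stride) if round_up else span // stride
    return steps + 1


# conv1: kernel 5, stride 1, pad 2 on a 32x32 CIFAR10 image
conv1 = conv_output_size(32, kernel=5, stride=1, pad=2)
# pool1: kernel 3, stride 2 applied to the conv1 output
pool1_floor = pool_output_size(conv1, kernel=3, stride=2, round_up=False)
pool1_ceil = pool_output_size(conv1, kernel=3, stride=2, round_up=True)
print(conv1, pool1_floor, pool1_ceil)
```

<p>With <tt>pad: 2</tt> and <tt>kernel: 5</tt> the convolution preserves the input size, so only the pooling layer shrinks the feature map in this stage.</p>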
+<ul>
+
+<li>
+<p>There is an <a class="externalLink"
href="http://singa.incubator.apache.org/docs/layer#innerproductlayer">inner
product layer</a> after the 3 transformation stages, which is configured with
10 output units, i.e., the total number of labels. The weight matrix param is
configured with a large weight decay scale to reduce over-fitting.</p>
+
<div class="source">
-<div class="source"><pre class="prettyprint"> xxx@yyy:zzz/incubator-singa$
./bin/singa-run.sh -conf examples/cifar10/job.conf
- Unique JOB_ID is 2
- Record job information to /tmp/singa-log/job-info/job-2-20150817-055601
- Executing : ./singa -conf /xxx/incubator-singa/examples/cifar10/job.conf
-singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 2
- E0817 06:56:18.868259 33849 cluster.cc:51] proc #0 ->
192.168.5.128:49152 (pid = 33849)
- E0817 06:56:18.928452 33871 server.cc:36] Server (group = 0, id = 0) start
- E0817 06:56:18.928469 33872 worker.cc:134] Worker (group = 0, id = 0) start
- E0817 06:57:13.657302 33849 trainer.cc:373] Test step-0, loss : 2.302588,
accuracy : 0.077900
- E0817 06:57:17.626708 33849 trainer.cc:373] Train step-0, loss : 2.302578,
accuracy : 0.062500
- E0817 06:57:24.142645 33849 trainer.cc:373] Train step-30, loss :
2.302404, accuracy : 0.131250
- E0817 06:57:30.813354 33849 trainer.cc:373] Train step-60, loss :
2.302248, accuracy : 0.156250
- E0817 06:57:37.556655 33849 trainer.cc:373] Train step-90, loss :
2.301849, accuracy : 0.175000
- E0817 06:57:44.971276 33849 trainer.cc:373] Train step-120, loss :
2.301077, accuracy : 0.137500
- E0817 06:57:51.801949 33849 trainer.cc:373] Train step-150, loss :
2.300410, accuracy : 0.135417
- E0817 06:57:58.682281 33849 trainer.cc:373] Train step-180, loss :
2.300067, accuracy : 0.127083
- E0817 06:58:05.578366 33849 trainer.cc:373] Train step-210, loss :
2.300143, accuracy : 0.154167
- E0817 06:58:12.518497 33849 trainer.cc:373] Train step-240, loss :
2.295912, accuracy : 0.185417
-</pre></div></div>
-<p>After the training of some steps (depends on the setting) or the job is
finished, SINGA will checkpoint the current parameter. In the next time, you
can train (or use for your application) by loading the checkpoint. Please refer
to <a class="externalLink"
href="http://singa.incubator.apache.org/docs/checkpoint.html">Checkpoint</a>
for the use of checkpoint.</p></div>
+<div class="source"><pre class="prettyprint">layer {
+ name: "ip1"
+ type: kInnerProduct
+ srclayers:"pool3"
+ innerproduct_conf {
+ num_output: 10
+ }
+ param {
+ name: "w4"
+ wd_scale:250
+ init {
+ type:kGaussian
+ std:0.01
+ }
+ }
+ param {
+ name: "b4"
+ lr_scale:2.0
+ wd_scale:0
+ init {
+ type: kConstant
+ value:0
+ }
+ }
+ }
+</pre></div></div></li>
+
+<li>
+<p>The last layer is a <a class="externalLink"
href="http://singa.incubator.apache.org/docs/layer#softmaxloss">Softmax loss
layer</a>.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint"> layer{
+ name: "loss"
+ type: kSoftmaxLoss
+ softmaxloss_conf{
+ topk:1
+ }
+ srclayers:"ip1"
+ srclayers: "label"
+ }
+</pre></div></div></li>
+</ul></div>
<div class="section">
-<h3><a name="Build_your_own_model"></a>Build your own model</h3>
+<h3><a name="Updater"></a>Updater</h3>
+<p>The <a class="externalLink"
href="http://singa.incubator.apache.org/docs/updater#updater">normal SGD
updater</a> is selected. The learning rate decreases in discrete steps (like a
staircase), and is configured using the <a class="externalLink"
href="http://singa.incubator.apache.org/docs/updater#kfixedstep">kFixedStep</a>
type.</p>
-<ul>
-
-<li>If you want to specify you own model, then you need to decribe it in the
job.conf file. It should contain the neurualnet structure, training
algorithm(backforward or contrastive divergence etc.), SGD update
algorithm(e.g. Adagrad), number of training/test steps and training/test
frequency, and display features and etc. SINGA will read job.conf as a Google
protobuf class <a href="../src/proto/job.proto">JobProto</a>. You can also
refer to the <a class="externalLink"
href="http://singa.incubator.apache.org/docs/programmer-guide.html">Programmer
Guide</a> to get details.</li>
-</ul></div></div>
+<div class="source">
+<div class="source"><pre class="prettyprint">updater{
+ type: kSGD
+ weight_decay:0.004
+ learning_rate {
+ type: kFixedStep
+ fixedstep_conf:{
+ step:0 # lr for step 0-60000 is 0.001
+ step:60000 # lr for step 60000-65000 is 0.0001
+ step:65000 # lr for step 65000- is 0.00001
+ step_lr:0.001
+ step_lr:0.0001
+ step_lr:0.00001
+ }
+ }
+}
+</pre></div></div></div>
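<p>The staircase schedule configured above can be sketched in a few lines of Python. This is an illustration of the idea, not SINGA's implementation: the function name is hypothetical, and it assumes that each <tt>step</tt> entry is the first step at which the matching <tt>step_lr</tt> takes effect.</p>

```python
from bisect import bisect_right


def fixed_step_lr(step, steps, step_lrs):
    """Return the learning rate in effect at training step `step`.

    `steps` must be sorted in ascending order and pair up one-to-one
    with `step_lrs`, mirroring the step / step_lr fields of the config.
    """
    if not steps or len(steps) != len(step_lrs):
        raise ValueError("steps and step_lrs must be non-empty and equal in length")
    # Index of the last boundary that is <= step.
    idx = bisect_right(steps, step) - 1
    if idx < 0:
        raise ValueError("step precedes the first configured boundary")
    return step_lrs[idx]


# Values taken from the updater configuration above.
STEPS = [0, 60000, 65000]
STEP_LRS = [0.001, 0.0001, 0.00001]

for s in (0, 59999, 60000, 64999, 65000, 70000):
    print(s, fixed_step_lr(s, STEPS, STEP_LRS))
```

<p>Under this assumption the rate stays at 0.001 up to step 59999, drops to 0.0001 at step 60000, and to 0.00001 from step 65000 onwards.</p>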
+<div class="section">
+<h3><a name="TrainOneBatch_algorithm"></a>TrainOneBatch algorithm</h3>
+<p>The CNN model is a feed-forward model, and thus should be configured to use the
<a class="externalLink"
href="http://singa.incubator.apache.org/docs/train-one-batch#back-propagation">Back-propagation
algorithm</a>.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">alg: kBP
+</pre></div></div></div>
+<div class="section">
+<h3><a name="Cluster_setting"></a>Cluster setting</h3>
+<p>The following configuration sets a single worker and server for training. The <a
class="externalLink"
href="http://singa.incubator.apache.org/docs/frameworks">Training
frameworks</a> page introduces configurations for several distributed
training frameworks.</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">cluster {
+ nworker_groups: 1
+ nserver_groups: 1
+}
+</pre></div></div></div></div>
</div>
</div>
</div>
Modified: websites/staging/singa/trunk/content/docs/code-structure.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/code-structure.html (original)
+++ websites/staging/singa/trunk/content/docs/code-structure.html Wed Sep 2
08:15:15 2015
@@ -1,13 +1,13 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache SINGA – Code Structure</title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
Modified: websites/staging/singa/trunk/content/docs/communication.html
==============================================================================
--- websites/staging/singa/trunk/content/docs/communication.html (original)
+++ websites/staging/singa/trunk/content/docs/communication.html Wed Sep 2
08:15:15 2015
@@ -1,15 +1,15 @@
<!DOCTYPE html>
<!--
- | Generated by Apache Maven Doxia at 2015-08-17
+ | Generated by Apache Maven Doxia at 2015-09-02
| Rendered using Apache Maven Fluido Skin 1.4
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20150817" />
+ <meta name="Date-Revision-yyyymmdd" content="20150902" />
<meta http-equiv="Content-Language" content="en" />
- <title>Apache SINGA – Communication</title>
+ <title>Apache SINGA – </title>
<link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" />
<link rel="stylesheet" href="../css/site.css" />
<link rel="stylesheet" href="../css/print.css" media="print" />
@@ -189,7 +189,7 @@
Apache SINGA</a>
<span class="divider">/</span>
</li>
- <li class="active ">Communication</li>
+ <li class="active "></li>
@@ -423,14 +423,15 @@
<div id="bodyColumn" class="span10" >
- <div class="section">
-<h2><a name="Communication"></a>Communication</h2>
-<hr />
+ <p>— layout: post title: Communication category : docs</p>
+<div class="section">
+<h2><a name="tags_:_rnn_example"></a>tags : [rnn, example]</h2>
+<p>{% include JB/setup %}</p>
<p>Different messaging libraries have different benefits and drawbacks. For
instance, MPI provides fast message passing between GPUs (using GPUDirect), but
does not support fault-tolerance well. In contrast, systems using ZeroMQ
can be fault-tolerant, but do not support GPUDirect. ZeroMQ also lacks MPI's
AllReduce function, which is efficient for data aggregation in
distributed training. In Singa, we provide general messaging APIs for
communication between threads within a process and across processes, and let
users choose the underlying implementation (MPI or ZeroMQ) that meets their
requirements.</p>
<p>Singa’s messaging library consists of two components, namely the
message, and the socket to send and receive messages. <b>Socket</b> refers to a
Singa-defined data structure, not the Linux socket. We will introduce the
two components in detail with the following figure as an example
architecture.</p>
<p><img src="../images/arch/arch2.png" style="width: 550px" alt="" /> <img
src="../images/arch/comm.png" style="width: 550px" alt="" />
<p><b> Fig.1 - Example physical architecture and network connection</b></p>
-<p>Fig.1 shows an example physical architecture and its network connection. <a
href="architecture.html}">Section-partition server side ParamShard</a> has a
detailed description of the architecture. Each process consists of one main
thread running the stub and multiple background threads running the worker and
server tasks. The stub of the main thread forwards messages among threads . The
worker and server tasks are performed by the background threads.</p>
+<p>Fig.1 shows an example physical architecture and its network connection. <a
class="externalLink"
href="http://singa.incubator.apache.org/docs/architecture.html">Section-partition
server side ParamShard</a> has a detailed description of the architecture.
Each process consists of one main thread running the stub and multiple
background threads running the worker and server tasks. The stub of the main
thread forwards messages among threads. The worker and server tasks are
performed by the background threads.</p>
<div class="section">
<h3><a name="Message"></a>Message</h3>
<p><object type="image/svg+xml" style="width: 100px" data="../images/msg.svg">
Not supported </object>
@@ -799,79 +800,50 @@ class SafeQueue{
</pre></div></div>
<p>For inter-process communication, we serialize the message and call
MPI’s send/receive functions to transfer it. All inter-process
connections are set up by MPI at the beginning. Consequently, the Connect and
Bind functions do nothing for both inter-process and intra-process
communication.</p>
<p>MPI’s AllReduce function is efficient for data aggregation in
distributed training. For example, <a class="externalLink"
href="http://arxiv.org/abs/1501.02876">DeepImage of Baidu</a> uses AllReduce to
aggregate parameter updates from all workers. Its architecture is similar
to that of <a href="architecture.html">Fig.2</a>, where every process has
a server group and is connected with all other processes. Hence, we can
implement DeepImage in Singa by simply using MPI’s AllReduce function
for inter-process communication.</p>
-<!-- #### Server socket
+<p>{% comment %}</p></div>
+<div class="section">
+<h4><a name="Server_socket"></a>Server socket</h4>
+<p>Each server has a DEALER socket to communicate with the stub in the main
thread via an <i>in-proc</i> socket. It receives requests issued from workers
and other servers, and forwarded by the ROUTER of the stub. Since the requests
are forwarded by the stub, we can make the location of workers transparent to
server threads. The stub records the locations of workers and servers.</p>
+<p>As explained previously in the APIs for parameter management
(<tt>/docs/2015-03-20-parameter-management</tt>), some requests may not be
processed immediately but have to be re-queued. For instance, a Get request
cannot be processed if the requested parameter is not available, i.e., the
parameter has not yet been put into the server’s ParamShard. Re-queueing is
implemented by sending the message to the ROUTER socket of the stub, which
treats it as a newly arrived request and queues it for
processing.</p></div>
+<div class="section">
+<h4><a name="Worker_socket"></a>Worker socket</h4>
+<p>Each worker thread has a DEALER socket to communicate with the stub in the
main thread via an <i>in-proc</i> socket. It sends (Get/Update) requests to the
ROUTER in the stub, which forwards the request to (local or remote) processes.
If the ParamShard is partitioned on the worker side, the worker may also
exchange data with other workers via the DEALER socket. Again, the location of
the other side (a server or worker) of the communication is transparent to the
worker. The stub handles the addressing.</p>
+<p>PMClient executes the training logic, during which it generates GET and
UPDATE requests. A request received at the worker’s main thread contains
the ID of the PMClient instance. The worker determines which server to send the
request to based on its content, then sends it via the corresponding socket.
Response messages received from any of the server sockets are forwarded to the
in-proc ROUTER socket. Since each response header contains the PMClient ID, it
is routed to the correct instance.</p></div>
+<div class="section">
+<h4><a name="Stub_sockets"></a>Stub sockets</h4>
+<div class="section">
+<h5><a name="ROUTER_socket"></a>ROUTER socket</h5>
+<p>The main thread has a ROUTER socket to communicate with background
threads.</p>
+<p>It forwards the requests from workers to background servers. There can be
multiple servers. If all servers maintain the same (sub) ParamShard, then the
request can be forwarded to any of them. Load balancing (e.g., round-robin) can be
implemented in the stub to improve the performance. If each server maintains a
subset of the local ParamShard, then the stub forwards each request to the
corresponding server. It also forwards the synchronization requests from remote
servers to local servers in the same way.</p>
+<p>In the case of neural network partition (i.e., model partition), neighboring
layers transfer data to each other. Hence, the ROUTER forwards
data transfer requests from one worker to another. The stub looks up the
location table to decide where to forward each request.</p></div>
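<p>The two forwarding cases described for the ROUTER socket (replicated servers versus servers that each hold a subset of the ParamShard) can be sketched as follows. This is a hypothetical illustration, not SINGA code: the class name, the integer parameter IDs and the modulo-based partitioning are all assumptions made for the example.</p>

```python
from itertools import cycle


class StubRouter:
    """Toy model of how a stub might pick a local server for a request."""

    def __init__(self, server_ids, replicated):
        if not server_ids:
            raise ValueError("at least one server is required")
        self.server_ids = list(server_ids)
        self.replicated = replicated
        # Endless iterator used for round-robin load balancing.
        self._round_robin = cycle(self.server_ids)

    def route(self, param_id):
        """Return the ID of the server that should handle `param_id`."""
        if self.replicated:
            # Every server holds the same (sub) ParamShard: any server
            # will do, so spread the load evenly across them.
            return next(self._round_robin)
        # Each server holds a subset: always send a given parameter to
        # the same server (modulo partitioning is an assumed scheme).
        return self.server_ids[param_id % len(self.server_ids)]


replicated = StubRouter([10, 11, 12], replicated=True)
print([replicated.route(0) for _ in range(4)])

partitioned = StubRouter([10, 11, 12], replicated=False)
print(partitioned.route(4), partitioned.route(4))
```

<p>In the replicated case consecutive requests cycle through the servers, while in the partitioned case a given parameter ID always maps to the same server.</p>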
+<div class="section">
+<h5><a name="DEALER_sockets"></a>DEALER sockets</h5>
+<p>The main thread has multiple DEALER sockets to communicate with other
processes, one socket per process. Two processes are connected if one of the
following cases exists:</p>
+
+<ul>
+
+<li>one worker group spans across the two processes;</li>
+
+<li>two connected server groups are separated in the two processes;</li>
+
+<li>workers and the subscribed servers are separated in the two processes.</li>
+</ul>
+<p>All messages in SINGA are of multi-frame ZeroMQ format. The figure above
demonstrates different types of messages exchanged in the system.</p>
-Each server has a DEALER socket to communicate with the stub in the main
-thread via an _in-proc_ socket. It receives requests issued from workers and
-other servers, and forwarded by the ROUTER of the stub. Since the requests are
forwarded by the
-stub, we can make the location of workers transparent to server threads. The
-stub records the locations of workers and servers.
-
-As explained previously in the
-[APIs]({{ BASE_PATH }}{% post_url /docs/2015-03-20-parameter-management %})
-for parameter management, some requests may
-not be processed immediately but have to be re-queued. For instance, the Get
-request cannot be processed if the requested parameter is not available, i.e.,
-the parameter has not been put into the server's ParamShard. The re-queueing
-operation is implemented sendings the messages to the ROUTER
-socket of the stub which treats the message as a newly arrived request
-and queues it for processing.
-
-#### Worker socket
-
-Each worker thread has a DEALER socket to communicate with the stub in the main
-thread via an _in-proc_ socket. It sends (Get/Update) requests to the ROUTER in
-the stub which forwards the request to (local or remote) processes. In case of
-the partition of ParamShard of worker side, it may also transfer data with
other
-workers via the DEALER socket. Again, the location of the other side (a server
-or worker) of the communication is transparent to the worker. The stub handles
-the addressing.
-
-PMClient executes the training logic, during which it generates GET and UPDATE
-requests. A request received at the worker's main thread contains ID of the
-PMClient instance. The worker determines which server to send the request based
-on its content, then sends it via the corresponding socket. Response messages
-received from any of the server socket are forwarded to the in-proc ROUTER
-socket. Since each response header contains the PMClient ID, it is routed to
-the correct instance.
-
-#### Stub sockets
-
-##### ROUTER socket
-The main thread has a ROUTER socket to communicate with background threads.
-
-It forwards the requests from workers to background servers. There can be
-multiple servers.If all servers maintain the same (sub) ParamShard, then the
-request can be forwarded to any of them. Load-balance (like round-robin) can be
-implemented in the stub to improve the performance. If each server maintains a
-sub-set of the local ParamShard, then the stub forwards each request to the
-corresponding server. It also forwards the synchronization requests from
-remote servers to local servers in the same way.
-
-In the case of neural network partition (i.e., model partition), neighbor
-layers would transfer data with each other. Hence, the ROUTER would forwards
-data transfer requests from one worker to other worker. The stub looks up the
-location table to decide where to forward each request.
-
-##### DEALER sockets
-
-The main thread has multiple DEALER sockets to communicate with other
-processes, one socket per process. Two processes are connected if one of the
-following cases exists:
-
- * one worker group spans across the two processes;
- * two connected server groups are separated in the two processes;
- * workers and the subscribed servers are separated in the two processes.
-
-
-All messages in SINGA are of multi-frame ZeroMQ format. The figure above
demonstrates different types of messages exchanged in the system.
-
- 1. Requests generated by PMClient consist of the parameter content (which
could be empty), followed by the parameter ID (key) and the request type
(GET/PUT/REQUEST). Responses received by PMClient are also of this format.
- 2. Messages received by the worker's main thread from PMClient instances
contain another frame identifying the PMClient connection (or PMClient ID).
- 3. Requests originating form a worker and arriving at the server contain
another frame identifying the worker's connection (or Worker ID).
- 4. Requests originating from another server and arriving at the server have
the same format as (3), but the first frame identifies the server connection
(or Server ID).
- 5. After a PMServer processes a request, it generates a message with the
format similar to (3) but with extra frame indicating if the message is to be
routed back to a worker (a response message) or to route to another server (a
SYNC request).
- 6. When a request is re-queued, the PMServer generates a message and sends
it directly to the server's front-end socket. The re-queued request seen by the
server's main thread consists of all the frames in (3), followed by a REQUEUED
frame, and finally by another frame generated by the ROUTER socket identifying
connection from the PMServer instance. The main thread then strips off these
additional two frames before forwarding it to another PMServer instance like
another ordinary request. --></div></div></div>
+<ol style="list-style-type: decimal">
+
+<li>Requests generated by PMClient consist of the parameter content (which
could be empty), followed by the parameter ID (key) and the request type
(GET/PUT/REQUEST). Responses received by PMClient are also of this format.</li>
+
+<li>Messages received by the worker’s main thread from PMClient
instances contain another frame identifying the PMClient connection (or
PMClient ID).</li>
+
+<li>Requests originating from a worker and arriving at the server contain
another frame identifying the worker’s connection (or Worker ID).</li>
+
+<li>Requests originating from another server and arriving at the server have
the same format as (3), but the first frame identifies the server connection
(or Server ID).</li>
+
+<li>After a PMServer processes a request, it generates a message with a
format similar to (3) but with an extra frame indicating whether the message is
to be routed back to a worker (a response message) or to another server (a
SYNC request).</li>
+
+<li>When a request is re-queued, the PMServer generates a message and sends it
directly to the server’s front-end socket. The re-queued request seen by
the server’s main thread consists of all the frames in (3), followed by
a REQUEUED frame, and finally by another frame generated by the ROUTER socket
identifying the connection from the PMServer instance. The main thread then
strips off these two additional frames before forwarding it to another PMServer
instance like any ordinary request. {% endcomment %}</li>
+</ol></div></div></div></div>
</div>
</div>
</div>
Added: websites/staging/singa/trunk/content/docs/data.html
==============================================================================
(empty)