Modified: websites/staging/singa/trunk/content/introduction.html ============================================================================== --- websites/staging/singa/trunk/content/introduction.html (original) +++ websites/staging/singa/trunk/content/introduction.html Wed Jul 22 15:43:23 2015 @@ -1,13 +1,13 @@ <!DOCTYPE html> <!-- - | Generated by Apache Maven Doxia at 2015-07-20 + | Generated by Apache Maven Doxia at 2015-07-22 | Rendered using Apache Maven Fluido Skin 1.4 --> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> - <meta name="Date-Revision-yyyymmdd" content="20150720" /> + <meta name="Date-Revision-yyyymmdd" content="20150722" /> <meta http-equiv="Content-Language" content="en" /> <title>Apache SINGA – Introduction</title> <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" /> @@ -71,7 +71,7 @@ </li> <li class="dropdown-submenu"> - <a href="docs/program-model.html" title="Programming Model">Programming Model</a> + <a href="docs/user-guide.html" title="User Guide">User Guide</a> <ul class="dropdown-menu"> <li> <a href="docs/model-config.html" title="Model Configuration">Model Configuration</a> </li> @@ -241,9 +241,9 @@ <li> - <a href="docs/program-model.html" title="Programming Model"> + <a href="docs/user-guide.html" title="User Guide"> <span class="icon-chevron-down"></span> - Programming Model</a> + User Guide</a> <ul class="nav nav-list"> <li> @@ -452,7 +452,7 @@ <h3><a name="Overview"></a>Overview</h3> <p>SINGA is designed to be general enough to implement the distributed training algorithms of existing systems. Distributed deep learning training is an ongoing, challenging research problem in terms of scalability. There is no established scalable distributed training algorithm. Different algorithms are used by existing systems, e.g., Hogwild used by Caffe, AllReduce used by Baidu’s DeepImage, and the Downpour algorithm proposed by Google Brain and used by Microsoft Adam. SINGA gives users the flexibility to select the one that is most scalable for their model and data.</p> <p>To provide good usability, SINGA provides a simple programming model based on the layer structure that is common in deep learning models. Users override the base layer class to implement their own layer logic for feature transformation. A model is constructed by configuring each layer and its connections, as in Caffe. SINGA takes care of the data and model partitioning, and makes the underlying distributed communication (almost) transparent to users. A set of built-in layers and example models are provided.</p> -<p>SINGA is an <a class="externalLink" href="http://singa.incubator.apache.org/">Apache incubator project</a>, released under Apache License 2. It is mainly developed by the DBSystem group of National University of Singapore. A diverse community is being constructed to welcome open-source contribution. </p></div> +<p>SINGA is an <a class="externalLink" href="http://singa.incubator.apache.org/">Apache incubator project</a>, released under Apache License 2.0. It is mainly developed by the DBSystem group of the National University of Singapore.
A diverse community is being built to welcome open-source contributions.</p></div> <div class="section"> <h3><a name="Goals_and_Principles"></a>Goals and Principles</h3> <div class="section"> @@ -488,11 +488,39 @@ </ul> <p>Considering extensibility, we make our core data structures (e.g., Layer) and operations general enough for programmers to override.</p></div></div> <div class="section"> -<h3><a name="System_Architecture"></a>System Architecture</h3> -<p><img src="images/arch.png" alt="SINGA Logical Architecture" style="width: 500px" /> -<p><b>SINGA Logical Architecture</b></p> -<p>The logical system architecture is shown in the above figure. There are two types of execution units, namely workers and servers. They are grouped according to the cluster configuration. Each worker group runs against a partition of the training dataset to compute the updates (e.g., the gradients) of parameters on one model replica, denoted as ParamShard. Worker groups run asynchronously, while workers within one group run synchronously with each worker computing (partial) updates for a subset of model parameters. Each server group also maintains one replica of the model parameters (i.e., ParamShard). It receives and handles requests (e.g., Get/Put/Update) from workers. Every server group synchronizes with neighboring server groups periodically or ac- cording to some specified rules.</p> -<p>SINGA starts by parsing the cluster and model configurations. The first worker group initializes model parameters and sends Put requests to put them into the ParamShards of servers. Then every worker group runs the training algorithm by iterating over its training data in mini-batch. Each worker collects the fresh parameters from servers before computing the updates (e.g., gradients) for them. Once it finishes the computation, it issues update requests to the servers.</p></div></div> +<h3><a name="Where_to_go_from_here"></a>Where to go from here</h3> + +<ul> + +<li> +<p>SINGA <a href="user-guide.html">User Guide</a> describes how to submit a training job for your own deep learning model.</p></li> + +<li> +<p>SINGA <a href="architecture.html">architecture</a> illustrates how different training frameworks are supported using a general system architecture.</p></li> + +<li> +<p><a href="examples.html">Training examples</a> are provided to help users get started with SINGA.</p></li> +</ul> +<!-- - +### System Architecture + +<img src="images/arch.png" alt="SINGA Logical Architecture" style="width: 500px"/> +<p><strong>SINGA Logical Architecture</strong></p> + +The logical system architecture is shown in the above figure. There are two types of execution units, +namely workers and servers. They are grouped according to the cluster configuration. Each worker +group runs against a partition of the training dataset to compute the updates (e.g., the gradients) +of parameters on one model replica, denoted as ParamShard. Worker groups run asynchronously, while +workers within one group run synchronously with each worker computing (partial) updates for a subset +of model parameters. Each server group also maintains one replica of the model parameters +(i.e., ParamShard). It receives and handles requests (e.g., Get/Put/Update) from workers. Every server +group synchronizes with neighboring server groups periodically or according to some specified rules. + +SINGA starts by parsing the cluster and model configurations.
The first worker group initializes model +parameters and sends Put requests to put them into the ParamShards of servers. Then every worker group +runs the training algorithm by iterating over its training data in mini-batches. Each worker collects the +fresh parameters from servers before computing the updates (e.g., gradients) for them. Once it finishes +the computation, it issues update requests to the servers. --></div></div> </div> </div> </div>
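The Overview above describes SINGA's layer-based programming model: users override a base layer class and build a model by configuring layers and their connections. Below is a rough, self-contained C++ sketch of that idea. The class and method names (Layer, Setup, ComputeFeature, ComputeGradient) are illustrative assumptions and not SINGA's actual API; see the User Guide for the real base class.

    #include <cstdio>
    #include <utility>
    #include <vector>

    // Toy layer hierarchy mirroring the programming model described in the
    // Overview. Names and signatures are assumptions, not SINGA's API.
    struct Blob { std::vector<float> data; };  // features flowing between layers

    class Layer {
     public:
      virtual ~Layer() {}
      // Connect this layer to its source layer(s), as configured in the model.
      virtual void Setup(const std::vector<Layer*>& src) { src_ = src; }
      // Forward pass: transform features from the source layers into data_.
      virtual void ComputeFeature() = 0;
      // Backward pass: compute gradients for parameters and source layers.
      virtual void ComputeGradient() = 0;
      Blob& data() { return data_; }
     protected:
      std::vector<Layer*> src_;
      Blob data_;
    };

    // A user-defined layer: scales the features of its (single) source layer.
    class ScaleLayer : public Layer {
     public:
      explicit ScaleLayer(float factor) : factor_(factor) {}
      void ComputeFeature() override {
        data_.data.clear();
        for (float v : src_[0]->data().data) data_.data.push_back(v * factor_);
      }
      void ComputeGradient() override {
        // d(factor * x)/dx = factor; gradient propagation omitted for brevity
      }
     private:
      float factor_;
    };

    // A trivial input layer that just holds raw features.
    class InputLayer : public Layer {
     public:
      explicit InputLayer(std::vector<float> v) { data_.data = std::move(v); }
      void ComputeFeature() override {}   // data already loaded
      void ComputeGradient() override {}  // no source layers
    };

    int main() {
      // "Configure each layer and its connections", then run a forward pass.
      InputLayer in({1.0f, 2.0f, 3.0f});
      ScaleLayer scale(0.5f);
      scale.Setup({&in});
      in.ComputeFeature();
      scale.ComputeFeature();
      for (float v : scale.data().data) std::printf("%.2f ", v);  // 0.50 1.00 1.50
      std::printf("\n");
      return 0;
    }

Real SINGA layers additionally handle data/model partitioning and distributed communication, which this toy omits.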
Modified: websites/staging/singa/trunk/content/quick-start.html ============================================================================== --- websites/staging/singa/trunk/content/quick-start.html (original) +++ websites/staging/singa/trunk/content/quick-start.html Wed Jul 22 15:43:23 2015 @@ -1,13 +1,13 @@ <!DOCTYPE html> <!-- - | Generated by Apache Maven Doxia at 2015-07-20 + | Generated by Apache Maven Doxia at 2015-07-22 | Rendered using Apache Maven Fluido Skin 1.4 --> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> - <meta name="Date-Revision-yyyymmdd" content="20150720" /> + <meta name="Date-Revision-yyyymmdd" content="20150722" /> <meta http-equiv="Content-Language" content="en" /> <title>Apache SINGA – Quick Start</title> <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" /> @@ -71,7 +71,7 @@ </li> <li class="dropdown-submenu"> - <a href="docs/program-model.html" title="Programming Model">Programming Model</a> + <a href="docs/user-guide.html" title="User Guide">User Guide</a> <ul class="dropdown-menu"> <li> <a href="docs/model-config.html" title="Model Configuration">Model Configuration</a> </li> @@ -241,9 +241,9 @@ <li> - <a href="docs/program-model.html" title="Programming Model"> + <a href="docs/user-guide.html" title="User Guide"> <span class="icon-chevron-down"></span> - Programming Model</a> + User Guide</a> <ul class="nav nav-list"> <li> @@ -471,15 +471,20 @@ git clone https://github.com/apache/incu <div class="source"><pre class="prettyprint">./configure make </pre></div></div> -<p>If there are dependent libraries missing, please refer to <a href="docs/installation.html">installation</a> page for guidance on installing them.</p></div> +<p>If there are dependent libraries missing, please refer to <a href="docs/installation.html">installation</a> page for guidance on installing them.</p> +<!-- - +### Run in standalone mode + +Running SINGA in standalone mode is on the contrary of running it on Mesos or +YARN. For standalone mode, users have to manage the resources manually. For +instance, they have to prepare a host file containing all running nodes. +There is no management on CPU and memory resources, hence SINGA consumes as much +CPU and memory resources as it needs. --></div> <div class="section"> -<h3><a name="Run_in_standalone_mode"></a>Run in standalone mode</h3> -<p>Running SINGA in standalone mode is on the contrary of running it on Mesos or YARN. For standalone mode, users have to manage the resources manually. For instance, they have to prepare a host file containing all running nodes. There is no management on CPU and memory resources, hence SINGA consumes as much CPU and memory resources as it needs.</p> -<div class="section"> -<h4><a name="Training_on_a_single_node"></a>Training on a single node</h4> +<h3><a name="Training_on_a_single_node"></a>Training on a single node</h3> <p>For single node training, one process will be launched to run the SINGA code on the node where SINGA is started. We train the <a class="externalLink" href="http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks">CNN model</a> over the <a class="externalLink" href="http://www.cs.toronto.edu/~kriz/cifar.html">CIFAR-10</a> dataset as an example. 
The hyper-parameters are set following <a class="externalLink" href="https://code.google.com/p/cuda-convnet/">cuda-convnet</a>.</p> <div class="section"> -<h5><a name="Data_and_model_preparation"></a>Data and model preparation</h5> +<h4><a name="Data_and_model_preparation"></a>Data and model preparation</h4> <p>Download the dataset and create the data shards for training and testing.</p> <div class="source"> @@ -491,82 +496,201 @@ make create <p>A training dataset and a test dataset are created under <i>train-shard</i> and <i>test-shard</i> folder respectively. A image_mean.bin file is also generated, which contains the feature mean of all images. <!-- After creating the data shards, you to update the paths in the model configuration file (*model.conf*) for the training data shard, test data shard and the mean file. --></p> -<p>Since all modules used for training this CNN model are provided by SINGA as built-in modules, there is no need to write any code. Instead, you just executable the running script (<i>../../bin/singa-run.sh</i>) by providing the model configuration file (<i>model.conf</i>). If you want to implement your own modules, e.g., layer, then you have to register your modules in the driver code. After compiling the driver code, link it with the SINGA library to generate the executable. More details are described in <a href="">Code your own models</a>.</p></div> -<div class="section"> -<h5><a name="Training_without_partitioning"></a>Training without partitioning</h5> -<p>To train the model without any partitioning, you just set the numbers in the cluster configuration file (<i>cluster.conf</i>) as :</p> - -<div class="source"> -<div class="source"><pre class="prettyprint">nworker_groups: 1 -nworkers_per_group: 1 -nserver_groups: 1 -nservers_per_group: 1 -</pre></div></div> -<p>One worker group trains against one partition of the training dataset. If <i>nworker_groups</i> is set to 1, then there is no data partitioning. One worker runs over a partition of the model. If <i>nworkers_per_group</i> is set to 1, then there is no model partitioning. More details on the cluster configuration are described in the <a href="docs/architecture.html">System Architecture</a> page.</p> +<p>Since all modules used for training this CNN model are provided by SINGA as built-in modules, there is no need to write any code. You just execute the script (<i>../../bin/singa-run.sh</i>) by providing the workspace which includes the job configuration file (<i>job.conf</i>). If you want to implement your own modules, e.g., layer, then you have to register your modules in the <a href="user-guide.html">driver program</a>.</p> <p>Start the training by running:</p> <div class="source"> <div class="source"><pre class="prettyprint">#goto top level folder cd ../.. -./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf -</pre></div></div></div> -<div class="section"> -<h5><a name="Training_with_data_Partitioning"></a>Training with data Partitioning</h5> +./bin/singa-run.sh -workspace=examples/cifar10 +</pre></div></div> +<p>Note: we have changed the command line arguments from <tt>-cluster... -model=...</tt> to <tt>-workspace</tt>. 
The <tt>workspace</tt> folder must have a job.conf file which specifies the cluster (number of workers, number of servers, etc) and model configuration.</p> +<p>Some training information will be shown on the screen like:</p> <div class="source"> -<div class="source"><pre class="prettyprint">nworker_groups: 2 -nserver_groups: 1 -nservers_per_group: 1 -nworkers_per_group: 1 -nworkers_per_procs: 2 -workspace: "examples/cifar10/" +<div class="source"><pre class="prettyprint">Starting zookeeper ... already running as process 21660. +Generate host list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.hosts +Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 1] +Executing : ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=1 +proc #0 -> 10.10.10.14:49152 (pid = 26724) +Server (group = 0, id = 0) start +Worker (group = 0, id = 0) start +Generate pid list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.pids +Test step-0, loss : 2.302607, accuracy : 0.090100 +Train step-0, loss : 2.302614, accuracy : 0.062500 +Train step-30, loss : 2.302403, accuracy : 0.141129 +Train step-60, loss : 2.301960, accuracy : 0.155738 +Train step-90, loss : 2.301470, accuracy : 0.159341 +Train step-120, loss : 2.301048, accuracy : 0.160640 +Train step-150, loss : 2.300414, accuracy : 0.161424 +Train step-180, loss : 2.299842, accuracy : 0.160912 +Train step-210, loss : 2.298510, accuracy : 0.163211 +Train step-240, loss : 2.297058, accuracy : 0.163641 +Train step-270, loss : 2.295308, accuracy : 0.163745 +Test step-300, loss : 2.256824, accuracy : 0.193500 +Train step-300, loss : 2.292490, accuracy : 0.165282 </pre></div></div> -<p>The above cluster configuration file specifies two worker groups and one server group. Worker groups run asynchronously but share the memory space for parameter values. In other words, it runs as the Hogwild algorithm. Since it is running in a single node, we can avoid partitioning the dataset explicitly. In specific, a random start offset is assigned to each worker group such that they would not work on the same mini-batch for every iteration. Consequently, they run like on different data partitions. The running command is the same:</p> +<p>You can find more logs under the <tt>/tmp</tt> folder. Once the training is finished the learned model parameters will be dumped into $workspace/checkpoint folder. The dumped file can be used for continuing the training or as initialization for other similar models. <a href="checkpoint.html">Checkpoint and Resume</a> discusses more details.</p> +<!-- - +To train the model without any partitioning, you just set the numbers +in the cluster configuration file (*cluster.conf*) as : + + nworker_groups: 1 + nworkers_per_group: 1 + nserver_groups: 1 + nservers_per_group: 1 + +One worker group trains against one partition of the training dataset. If +*nworker_groups* is set to 1, then there is no data partitioning. One worker +runs over a partition of the model. If *nworkers_per_group* is set to 1, then +there is no model partitioning. More details on the cluster configuration are +described in the [System Architecture](docs/architecture.html) page. 
--></div> +<div class="section"> +<h4><a name="Distributed_Training"></a>Distributed Training</h4> +<p>To train the model in a distributed environment, we first change the job configuration to use 2 worker groups (one worker per group) and 2 servers (from the same server group).</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">// job.conf +cluster { + nworker_groups: 2 + nserver_groups: 1 + nservers_per_group: 2 +} +</pre></div></div> +<p>This configuration runs SINGA using the Downpour training framework. Specifically, the 2 worker groups run asynchronously to compute the parameter gradients. Each server maintains a subset of the parameters and updates them based on the gradients passed by workers.</p> +<p>To run SINGA in a cluster,</p> +<ol style="list-style-type: decimal"> + +<li> +<p>A hostfile should be prepared under the conf/ folder, e.g.,</p> + <div class="source"> +<div class="source"><pre class="prettyprint">// hostfile +logbase-a04 +logbase-a05 +logbase-a06 +... +</pre></div></div></li> + +<li> +<p>The zookeeper location must be configured in conf/singa.conf, e.g.,</p> +<p>zookeeper_host: “logbase-a04:2181”</p></li> + +<li> +<p>Make your ssh command password-free.</p></li> +</ol> +<p>Currently, we assume the data files are on NFS, i.e., visible to all nodes. To start the training, run</p> <div class="source"> -<div class="source"><pre class="prettyprint">nworker_groups: 1 -nserver_groups: 1 -nservers_per_group: 1 -nworkers_per_group: 2 -nworkers_per_procs: 2 -workspace: "examples/cifar10/" +<div class="source"><pre class="prettyprint">./bin/singa-run.sh -workspace=examples/cifar10 </pre></div></div> -<p>The above cluster configuration specifies one worker group with two workers. The workers run synchronously, i.e., they are synchronized after one iteration. The model is partitioned among the two workers. In specific, each layer is sliced such that every worker is assigned one sliced layer. The sliced layer is the same as the original layer except that it only has B/g feature instances, where B is the size of instances in a mini-batch, g is the number of workers in a group. </p> -<p>All other settings are the same as running without partitioning</p> +<p>The <tt>singa-run.sh</tt> script will calculate the number of nodes (i.e., processes) to launch and will generate a job.hosts file under the workspace by looping over all nodes in conf/hostfile.
Hence, if the hostfile contains fewer nodes than the number of processes to launch, multiple processes will be started on the same node.</p> +<p>You can get job information such as the job ID and running processes using the singa-console.sh script:</p> <div class="source"> -<div class="source"><pre class="prettyprint">nworker_groups: 2 -nserver_groups: 2 +<div class="source"><pre class="prettyprint">./bin/singa-console.sh list +JOB ID |NUM PROCS +----------|----------- +job-4 |2 </pre></div></div> -<p>and start one process as,</p> +<p>Sample training output is:</p> <div class="source"> -<div class="source"><pre class="prettyprint">./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf +<div class="source"><pre class="prettyprint">Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 4] +Executing @ logbase-a04 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4 +Executing @ logbase-a05 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4 +proc #0 -> 10.10.10.15:49152 (pid = 3504) +proc #1 -> 10.10.10.14:49152 (pid = 27119) +Server (group = 0, id = 1) start +Worker (group = 1, id = 0) start +Server (group = 0, id = 0) start +Worker (group = 0, id = 0) start +Generate pid list to +/home/singa/wangwei/incubator-singa/examples/cifar10/job.pids +Test step-0, loss : 2.297355, accuracy : 0.101700 +Train step-0, loss : 2.274724, accuracy : 0.062500 +Train step-30, loss : 2.263850, accuracy : 0.131048 +Train step-60, loss : 2.249972, accuracy : 0.133197 +Train step-90, loss : 2.235008, accuracy : 0.151786 +Train step-120, loss : 2.228674, accuracy : 0.154959 +Train step-150, loss : 2.215979, accuracy : 0.165149 +Train step-180, loss : 2.198111, accuracy : 0.180249 +Train step-210, loss : 2.175717, accuracy : 0.188389 +Train step-240, loss : 2.160980, accuracy : 0.197095 +Train step-270, loss : 2.145763, accuracy : 0.202030 +Test step-300, loss : 1.921962, accuracy : 0.299100 +Train step-300, loss : 2.129271, accuracy : 0.208056 </pre></div></div> -<p>and then start another process as,</p> +<p>We can see that the accuracy (resp. loss) of distributed training increases (resp. decreases) faster than that of single-node training.</p> +<p>You can stop the training with singa-stop.sh:</p> <div class="source"> -<div class="source"><pre class="prettyprint">./singa -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf +<div class="source"><pre class="prettyprint">./bin/singa-stop.sh +Kill singa @ logbase-a04 ... +Kill singa @ logbase-a05 ... +bash: line 1: 27119 Killed ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4 +Kill singa @ logbase-a06 ... +bash: line 1: 3504 Killed ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4 +Cleanning metadata in zookeeper ... </pre></div></div> -<p>Note that the two commands are different! The first one will start the zookeeper. Currently we assume that the example/cifar10 folder is in NFS.
</p></div></div> -<div class="section"> -<h3><a name="Run_with_Mesos"></a>Run with Mesos</h3> -<p><i>in working</i>…</p></div> -<div class="section"> -<h3><a name="Run_with_YARN"></a>Run with YARN</h3></div></div> +<!-- - +In other words, +it runs as the Hogwild algorithm. Since it is running in a single node, we can avoid partitioning the +dataset explicitly. In specific, a random start offset is assigned to each worker group such that they +would not work on the same mini-batch for every iteration. Consequently, they run like on different data +partitions. +The running command is the same: + + ./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf + + +##### Training with model Partitioning + + nworker_groups: 1 + nserver_groups: 1 + nservers_per_group: 1 + nworkers_per_group: 2 + nworkers_per_procs: 2 + workspace: "examples/cifar10/" + +The above cluster configuration specifies one worker group with two workers. +The workers run synchronously, i.e., they are synchronized after one iteration. +The model is partitioned among the two workers. In specific, each layer is +sliced such that every worker is assigned one sliced layer. The sliced layer is +the same as the original layer except that it only has B/g feature instances, +where B is the size of instances in a mini-batch, g is the number of workers in +a group. + +All other settings are the same as running without partitioning + + ./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf + + + +#### Training in a cluster + +To run the distributed Hogwild framework, configure the cluster.conf as: + + nworker_groups: 2 + nserver_groups: 2 + +and start one process as, + + ./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf + +and then start another process as, + + ./singa -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf + +Note that the two commands are different! The first one will start the zookeeper. Currently we assume +that the example/cifar10 folder is in NFS. + +### Run with Mesos + +*in working*... + +### Run with YARN --></div></div></div> </div> </div> </div>
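The Distributed Training section above configures two worker groups that compute gradients asynchronously while servers maintain and update the parameters (the Downpour framework). The following self-contained C++ toy sketches that worker/server pattern; all names are invented for illustration and none of it is SINGA code. The two threads play the role of the two worker groups, and ServerShard plays the role of a server holding a subset of the parameters.

    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    // A toy "server" owning a shard (subset) of the model parameters.
    class ServerShard {
     public:
      explicit ServerShard(size_t n) : params_(n, 0.0f) {}
      // Workers collect fresh parameters before computing updates.
      std::vector<float> Get() {
        std::lock_guard<std::mutex> lk(mu_);
        return params_;
      }
      // Workers issue update requests; the server applies them with a fixed rate.
      void Update(const std::vector<float>& grad, float lr) {
        std::lock_guard<std::mutex> lk(mu_);
        for (size_t i = 0; i < params_.size(); ++i) params_[i] -= lr * grad[i];
      }
     private:
      std::mutex mu_;
      std::vector<float> params_;
    };

    // One worker group: pull parameters, compute a (fake) gradient on its own
    // data partition, push the update back. There is no synchronization with
    // other groups, i.e., asynchronous (Downpour-style) training.
    void WorkerGroup(int gid, ServerShard* shard, int steps) {
      for (int step = 0; step < steps; ++step) {
        std::vector<float> w = shard->Get();
        std::vector<float> grad(w.size());
        for (size_t i = 0; i < w.size(); ++i)
          grad[i] = w[i] - 1.0f;            // gradient of 0.5*(w-1)^2, target = 1
        shard->Update(grad, 0.1f);
      }
      std::printf("worker group %d finished\n", gid);
    }

    int main() {
      ServerShard shard(4);                         // one server, 4 parameters
      std::thread g0(WorkerGroup, 0, &shard, 100);  // two asynchronous groups
      std::thread g1(WorkerGroup, 1, &shard, 100);
      g0.join();
      g1.join();
      for (float p : shard.Get()) std::printf("%.3f ", p);  // each close to 1.0
      std::printf("\n");
      return 0;
    }

Because the groups never wait for each other, their updates interleave in arbitrary order; that asynchrony is exactly what the cluster configuration above enables.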
