Author: buildbot
Date: Thu Jul 23 05:53:30 2015
New Revision: 959251
Log:
Staging update by buildbot for singa
Modified:
websites/staging/singa/trunk/content/ (props changed)
websites/staging/singa/trunk/content/quick-start.html
Propchange: websites/staging/singa/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Jul 23 05:53:30 2015
@@ -1 +1 @@
-1692347
+1692349
Modified: websites/staging/singa/trunk/content/quick-start.html
==============================================================================
--- websites/staging/singa/trunk/content/quick-start.html (original)
+++ websites/staging/singa/trunk/content/quick-start.html Thu Jul 23 05:53:30 2015
@@ -497,18 +497,18 @@ training data shard, test data shard and
cd ../..
./bin/singa-run.sh -workspace=examples/cifar10
</pre></div></div>
-<p>Note: we have changed the command line arguments from <tt>-cluster... -model=...</tt> to <tt>-workspace</tt>. The <tt>workspace</tt> folder must have a job.conf file which specifies the cluster (number of workers, number of servers, etc) and model configuration.</p>
+<p>Note: we have changed the command line arguments from <tt>-cluster.. -model..</tt> to <tt>-workspace</tt>. The <tt>workspace</tt> folder must have a job.conf file which specifies the cluster (number of workers, number of servers, etc) and model configuration.</p>
<p>Some training information will be shown on the screen like:</p>
<div class="source">
<div class="source"><pre class="prettyprint">Starting zookeeper ... already running as process 21660.
-Generate host list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.hosts
-Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 1]
-Executing : ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=1
+Generate host list to SINGA_ROOT/examples/cifar10/job.hosts
+Generate job id to SINGA_ROOT/examples/cifar10/job.id [job_id = 1]
+Executing : ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=1
proc #0 -> 10.10.10.14:49152 (pid = 26724)
Server (group = 0, id = 0) start
Worker (group = 0, id = 0) start
-Generate pid list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.pids
+Generate pid list to SINGA_ROOT/examples/cifar10/job.pids
Test step-0, loss : 2.302607, accuracy : 0.090100
Train step-0, loss : 2.302614, accuracy : 0.062500
Train step-30, loss : 2.302403, accuracy : 0.141129
@@ -524,6 +524,12 @@ Test step-300, loss : 2.256824, accuracy
Train step-300, loss : 2.292490, accuracy : 0.165282
</pre></div></div>
<p>You can find more logs under the <tt>/tmp</tt> folder. Once the training is finished the learned model parameters will be dumped into $workspace/checkpoint folder. The dumped file can be used for continuing the training or as initialization for other similar models. <a href="checkpoint.html">Checkpoint and Resume</a> discusses more details.</p>
+<p>The job can be stopped by</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">./bin/singa-stop.sh
+</pre></div></div>
+<p>It will kill all singa processes.</p>
<!-- -
To train the model without any partitioning, you just set the numbers
in the cluster configuration file (*cluster.conf*) as :
@@ -561,15 +567,15 @@ cluster {
<div class="source">
<div class="source"><pre class="prettyprint">// hostfile
-logbase-a04
-logbase-a05
-logbase-a06
+singa-node1
+singa-node2
+singa-node3
...
</pre></div></div></li>
<li>
<p>The zookeeper location must be configured in conf/singa.conf, e.g.,</p>
-<p>zookeeper_host: “logbase-a04:2181”</p></li>
+<p>zookeeper_host: “singa-node1:2181”</p></li>
<li>
<p>Make your ssh command password-free</p></li>
@@ -580,28 +586,19 @@ logbase-a06
<div class="source"><pre class="prettyprint">./bin/singa-run.sh -workspace=examples/cifar10
</pre></div></div>
<p>The <tt>singa-run.sh</tt> will calculate the number of nodes (i.e., processes) to launch and will generate a job.hosts file under workspace by looping through all nodes in conf/hostfile. Hence if there are few nodes in the hostfile, then multiple processes would be launched in one node.</p>
-<p>You can get some job information like job ID and running processes using the singa-console.sh script:</p>
-
-<div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-console.sh list
-JOB ID |NUM PROCS
-----------|-----------
-job-4 |2
-</pre></div></div>
<p>Sample training output is</p>
<div class="source">
-<div class="source"><pre class="prettyprint">Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 4]
-Executing @ logbase-a04 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
-Executing @ logbase-a05 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
+<div class="source"><pre class="prettyprint">Generate job id to SINGA_ROOT/examples/cifar10/job.id [job_id = 4]
+Executing @ singa-node1: cd SINGA_ROOT; ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=4
+Executing @ singa-node2: cd SINGA_ROOT; ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=4
proc #0 -> 10.10.10.15:49152 (pid = 3504)
proc #1 -> 10.10.10.14:49152 (pid = 27119)
Server (group = 0, id = 1) start
Worker (group = 1, id = 0) start
Server (group = 0, id = 0) start
Worker (group = 0, id = 0) start
-Generate pid list to
-/home/singa/wangwei/incubator-singa/examples/cifar10/job.pids
+Generate pid list to SINGA_ROOT/examples/cifar10/job.pids
Test step-0, loss : 2.297355, accuracy : 0.101700
Train step-0, loss : 2.274724, accuracy : 0.062500
Train step-30, loss : 2.263850, accuracy : 0.131048
@@ -617,16 +614,18 @@ Test step-300, loss : 1.921962, accuracy
Train step-300, loss : 2.129271, accuracy : 0.208056
</pre></div></div>
<p>We can see that the accuracy (resp. loss) from distributed training increases (resp. decreases) faster than that for the single node training.</p>
-<p>You can stop the training by singa-stop.sh</p>
+<p>You can get some job information like job ID and running processes using the singa-console.sh script:</p>
<div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-stop.sh
-Kill singa @ logbase-a04 ...
-Kill singa @ logbase-a05 ...
-bash: line 1: 27119 Killed ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
-Kill singa @ logbase-a06 ...
-bash: line 1: 3504 Killed ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
-Cleanning metadata in zookeeper ...
+<div class="source"><pre class="prettyprint">./bin/singa-console.sh list
+JOB ID |NUM PROCS
+----------|-----------
+job-4 |2
+</pre></div></div>
+<p>To kill the job, just run</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">./bin/singa-console.sh kill job-4
</pre></div></div>
<!-- -
In other words,