quick-start.html

buildbot Wed, 22 Jul 2015 22:55:49 -0700

Author: buildbot
Date: Thu Jul 23 05:53:30 2015
New Revision: 959251

Log:
Staging update by buildbot for singa


Modified:
    websites/staging/singa/trunk/content/   (props changed)
    websites/staging/singa/trunk/content/quick-start.html

Propchange: websites/staging/singa/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Jul 23 05:53:30 2015
@@ -1 +1 @@
-1692347
+1692349

Modified: websites/staging/singa/trunk/content/quick-start.html
==============================================================================
--- websites/staging/singa/trunk/content/quick-start.html (original)
+++ websites/staging/singa/trunk/content/quick-start.html Thu Jul 23 05:53:30 2015
@@ -497,18 +497,18 @@ training data shard, test data shard and
 cd ../..
 ./bin/singa-run.sh -workspace=examples/cifar10
 </pre></div></div>
-<p>Note: we have changed the command line arguments from <tt>-cluster... -model=...</tt> to <tt>-workspace</tt>. The <tt>workspace</tt> folder must have a job.conf file which specifies the cluster (number of workers, number of servers, etc) and model configuration.</p>
+<p>Note: we have changed the command line arguments from <tt>-cluster.. -model..</tt> to <tt>-workspace</tt>. The <tt>workspace</tt> folder must have a job.conf file which specifies the cluster (number of workers, number of servers, etc) and model configuration.</p>
 <p>Some training information will be shown on the screen like:</p>
 
 <div class="source">
 <div class="source"><pre class="prettyprint">Starting zookeeper ... already running as process 21660.
-Generate host list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.hosts
-Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 1]
-Executing : ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=1
+Generate host list to SINGA_ROOT/examples/cifar10/job.hosts
+Generate job id to SINGA_ROOT/examples/cifar10/job.id [job_id = 1]
+Executing : ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=1
 proc #0 -&gt; 10.10.10.14:49152 (pid = 26724)
 Server (group = 0, id = 0) start
 Worker (group = 0, id = 0) start
-Generate pid list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.pids
+Generate pid list to SINGA_ROOT/examples/cifar10/job.pids
 Test step-0, loss : 2.302607, accuracy : 0.090100
 Train step-0, loss : 2.302614, accuracy : 0.062500
 Train step-30, loss : 2.302403, accuracy : 0.141129
@@ -524,6 +524,12 @@ Test step-300, loss : 2.256824, accuracy
 Train step-300, loss : 2.292490, accuracy : 0.165282
 </pre></div></div>
 <p>You can find more logs under the <tt>/tmp</tt> folder. Once the training is finished the learned model parameters will be dumped into $workspace/checkpoint folder. The dumped file can be used for continuing the training or as initialization for other similar models. <a href="checkpoint.html">Checkpoint and Resume</a> discusses more details.</p>
+<p>The job can be stopped by</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">./bin/singa-stop.sh
+</pre></div></div>
+<p>It will kill all singa processes.</p>
 <!-- -
 To train the model without any partitioning, you just set the numbers
 in the cluster configuration file (*cluster.conf*) as :
@@ -561,15 +567,15 @@ cluster {
   
 <div class="source">
 <div class="source"><pre class="prettyprint">// hostfile
-logbase-a04
-logbase-a05
-logbase-a06
+singa-node1
+singa-node2
+singa-node3
 ...
 </pre></div></div></li>
   
 <li>
 <p>The zookeeper location must be configured in conf/singa.conf, e.g.,</p>
-<p>zookeeper_host: &#x201c;logbase-a04:2181&#x201d;</p></li>
+<p>zookeeper_host: &#x201c;singa-node1:2181&#x201d;</p></li>
   
 <li>
 <p>Make your ssh command password-free</p></li>
@@ -580,28 +586,19 @@ logbase-a06
 <div class="source"><pre class="prettyprint">./bin/singa-run.sh -workspace=examples/cifar10
 </pre></div></div>
 <p>The <tt>singa-run.sh</tt> will calculate the number of nodes (i.e., processes) to launch and will generate a job.hosts file under workspace by looping through all nodes in conf/hostfile. Hence if there are few nodes in the hostfile, then multiple processes would be launched in one node.</p>
-<p>You can get some job information like job ID and running processes using the singa-console.sh script:</p>
-
-<div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-console.sh list
-JOB ID    |NUM PROCS
-----------|-----------
-job-4     |2
-</pre></div></div>
 <p>Sample training output is</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 4]
-Executing @ logbase-a04 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
-Executing @ logbase-a05 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
+<div class="source"><pre class="prettyprint">Generate job id to SINGA_ROOT/examples/cifar10/job.id [job_id = 4]
+Executing @ singa-node1: cd SINGA_ROOT; ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=4
+Executing @ singa-node2: cd SINGA_ROOT; ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=4
 proc #0 -&gt; 10.10.10.15:49152 (pid = 3504)
 proc #1 -&gt; 10.10.10.14:49152 (pid = 27119)
 Server (group = 0, id = 1) start
 Worker (group = 1, id = 0) start
 Server (group = 0, id = 0) start
 Worker (group = 0, id = 0) start
-Generate pid list to
-/home/singa/wangwei/incubator-singa/examples/cifar10/job.pids
+Generate pid list to SINGA_ROOT/examples/cifar10/job.pids
 Test step-0, loss : 2.297355, accuracy : 0.101700
 Train step-0, loss : 2.274724, accuracy : 0.062500
 Train step-30, loss : 2.263850, accuracy : 0.131048
@@ -617,16 +614,18 @@ Test step-300, loss : 1.921962, accuracy
 Train step-300, loss : 2.129271, accuracy : 0.208056
 </pre></div></div>
 <p>We can see that the accuracy (resp. loss) from distributed training increases (resp. decreases) faster than that for the single node training.</p>
-<p>You can stop the training by singa-stop.sh</p>
+<p>You can get some job information like job ID and running processes using the singa-console.sh script:</p>
 
 <div class="source">
-<div class="source"><pre class="prettyprint">./bin/singa-stop.sh
-Kill singa @ logbase-a04 ...
-Kill singa @ logbase-a05 ...
-bash: line 1: 27119 Killed                  ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
-Kill singa @ logbase-a06 ...
-bash: line 1:  3504 Killed                  ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
-Cleanning metadata in zookeeper ...
+<div class="source"><pre class="prettyprint">./bin/singa-console.sh list
+JOB ID    |NUM PROCS
+----------|-----------
+job-4     |2
+</pre></div></div>
+<p>To kill the job, just run</p>
+
+<div class="source">
+<div class="source"><pre class="prettyprint">./bin/singa-console.sh kill job-4
 </pre></div></div>
 <!-- -
 In other words,

svn commit: r959251 - in /websites/staging/singa/trunk/content: ./ quick-start.html
