Author: wangwei
Date: Thu Jul 23 05:53:12 2015
New Revision: 1692349
URL: http://svn.apache.org/r1692349
Log:
update quick start minor change
Modified:
incubator/singa/site/trunk/content/markdown/quick-start.md
Modified: incubator/singa/site/trunk/content/markdown/quick-start.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/quick-start.md?rev=1692349&r1=1692348&r2=1692349&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/quick-start.md (original)
+++ incubator/singa/site/trunk/content/markdown/quick-start.md Thu Jul 23 05:53:12 2015
@@ -71,7 +71,7 @@ Start the training by running:
cd ../..
./bin/singa-run.sh -workspace=examples/cifar10
-Note: we have changed the command line arguments from `-cluster... -model=...`
+Note: we have changed the command line arguments from `-cluster.. -model..`
to `-workspace`. The `workspace` folder must have a job.conf file which
specifies the cluster (number of workers, number of servers, etc) and model
configuration.
@@ -79,13 +79,13 @@ configuration.
Some training information will be shown on the screen like:
Starting zookeeper ... already running as process 21660.
- Generate host list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.hosts
- Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 1]
- Executing : ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=1
+ Generate host list to SINGA_ROOT/examples/cifar10/job.hosts
+ Generate job id to SINGA_ROOT/examples/cifar10/job.id [job_id = 1]
+ Executing : ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=1
proc #0 -> 10.10.10.14:49152 (pid = 26724)
Server (group = 0, id = 0) start
Worker (group = 0, id = 0) start
- Generate pid list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.pids
+ Generate pid list to SINGA_ROOT/examples/cifar10/job.pids
Test step-0, loss : 2.302607, accuracy : 0.090100
Train step-0, loss : 2.302614, accuracy : 0.062500
Train step-30, loss : 2.302403, accuracy : 0.141129
@@ -107,6 +107,12 @@ The dumped file can be used for continui
for other similar models. [Checkpoint and Resume](checkpoint.html) discusses
more details.
+The job can be stopped by
+
+ ./bin/singa-stop.sh
+
+It will kill all singa processes.
+
<!---
To train the model without any partitioning, you just set the numbers
in the cluster configuration file (*cluster.conf*) as :
@@ -148,14 +154,14 @@ To run SINGA in a cluster,
1. A hostfile should be prepared under conf/ folder, e.g.,
// hostfile
- logbase-a04
- logbase-a05
- logbase-a06
+ singa-node1
+ singa-node2
+ singa-node3
...
2. The zookeeper location must be configured in conf/singa.conf, e.g.,
- zookeeper_host: "logbase-a04:2181"
+ zookeeper_host: "singa-node1:2181"
3. Make your ssh command password-free
@@ -169,27 +175,18 @@ launch and will generate a job.hosts fil
all nodes in conf/hostfile. Hence if there are few nodes in the hostfile, then
multiple processes would be launched in one node.
-You can get some job information like job ID and running processes using the
-singa-console.sh script:
-
- ./bin/singa-console.sh list
- JOB ID |NUM PROCS
- ----------|-----------
- job-4 |2
-
Sample training output is
- Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 4]
- Executing @ logbase-a04 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
- Executing @ logbase-a05 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
+ Generate job id to SINGA_ROOT/examples/cifar10/job.id [job_id = 4]
+ Executing @ singa-node1: cd SINGA_ROOT; ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=4
+ Executing @ singa-node2: cd SINGA_ROOT; ./singa -workspace=SINGA_ROOT/examples/cifar10 -job=4
proc #0 -> 10.10.10.15:49152 (pid = 3504)
proc #1 -> 10.10.10.14:49152 (pid = 27119)
Server (group = 0, id = 1) start
Worker (group = 1, id = 0) start
Server (group = 0, id = 0) start
Worker (group = 0, id = 0) start
- Generate pid list to
- /home/singa/wangwei/incubator-singa/examples/cifar10/job.pids
+ Generate pid list to SINGA_ROOT/examples/cifar10/job.pids
Test step-0, loss : 2.297355, accuracy : 0.101700
Train step-0, loss : 2.274724, accuracy : 0.062500
Train step-30, loss : 2.263850, accuracy : 0.131048
@@ -208,15 +205,19 @@ Sample training output is
We can see that the accuracy (resp. loss) from distributed training increases (resp.
decreases) faster than that for the single node training.
-You can stop the training by singa-stop.sh
- ./bin/singa-stop.sh
- Kill singa @ logbase-a04 ...
- Kill singa @ logbase-a05 ...
bash: line 1: 27119 Killed ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
- Kill singa @ logbase-a06 ...
bash: line 1: 3504 Killed ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4
- Cleanning metadata in zookeeper ...
+You can get some job information like job ID and running processes using the
+singa-console.sh script:
+
+ ./bin/singa-console.sh list
+ JOB ID |NUM PROCS
+ ----------|-----------
+ job-4 |2
+
+To kill the job, just run
+
+ ./bin/singa-console.sh kill job-4
+
<!---