Author: wangwei
Date: Mon Jun 15 08:26:43 2015
New Revision: 1685508
URL: http://svn.apache.org/r1685508
Log:
add instructions for running the shared-memory Hogwild training framework, i.e.,
training with multiple workers in a single node.
Modified:
incubator/singa/site/trunk/content/markdown/quick-start.md
Modified: incubator/singa/site/trunk/content/markdown/quick-start.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/quick-start.md?rev=1685508&r1=1685507&r2=1685508&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/quick-start.md (original)
+++ incubator/singa/site/trunk/content/markdown/quick-start.md Mon Jun 15 08:26:43 2015
@@ -92,18 +92,24 @@ Start the training by running:
##### Training with data Partitioning
-There are two cases for data partition:
+ nworker_groups: 2
+ nserver_groups: 1
+ nservers_per_group: 1
+ nworkers_per_group: 1
+ nworkers_per_procs: 2
+ workspace: "examples/cifar10/"
-* partition the dataset among worker groups such that one worker group is
- assigned one partition. Groups run asynchronously.
+The above cluster configuration file specifies two worker groups and one server group.
+Worker groups run asynchronously but share the memory space for parameter values; in other
+words, they run the Hogwild algorithm. Since training runs in a single node, we can avoid
+partitioning the dataset explicitly. Specifically, a random start offset is assigned to each
+worker group so that the groups do not work on the same mini-batch in every iteration.
+Consequently, they effectively run on different data partitions. The running command is the same:
-* partition the neural network among workers within one group. Each layer is
-sliced such that every worker is assigned one sliced layer. The sliced layer is
-the same as the original layer except that it only has B/g feature instances,
-where B is the size of instances in a mini-batch, g is the number of workers in
-a group. All workers run synchronously.
+    ./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf
-To run the second case with 2 workers, just change the cluster.conf as:
+
+##### Training with model Partitioning
nworker_groups: 1
nserver_groups: 1
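The Hogwild setup described above can be illustrated with a small sketch (this is not SINGA code; the linear model, learning rate, and helper names are made up for illustration). Worker groups become threads that update one shared parameter without locks, each starting from a random data offset:

```python
# Illustrative sketch (not SINGA code): Hogwild-style training in one
# process. Worker groups are threads sharing one parameter list and
# updating it without locks; each thread starts at a random data offset
# so the groups rarely process the same mini-batch in the same iteration.
import random
import threading

def run_hogwild(data, params, n_groups=2, iters=100, lr=0.002):
    def worker(offset):
        idx = offset
        for _ in range(iters):
            x, y = data[idx % len(data)]
            # gradient of the squared error for a 1-D linear model y ~ w*x
            grad = 2 * (params[0] * x - y) * x
            params[0] -= lr * grad      # lock-free update of shared state
            idx += 1
    threads = [threading.Thread(target=worker,
                                args=(random.randrange(len(data)),))
               for _ in range(n_groups)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params

# fit y = 3x from noise-free samples; w converges toward 3.0
data = [(x, 3.0 * x) for x in range(1, 11)]
print(run_hogwild(data, [0.0]))
```

Despite the unsynchronized updates, the sparse-conflict argument behind Hogwild lets such training converge in practice.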
@@ -112,11 +118,18 @@ To run the second case with 2 workers, j
nworkers_per_procs: 2
workspace: "examples/cifar10/"
+The above cluster configuration specifies one worker group with two workers.
+The workers run synchronously, i.e., they are synchronized after every iteration.
+The model is partitioned among the two workers. Specifically, each layer is
+sliced such that every worker is assigned one sliced layer. The sliced layer is
+the same as the original layer except that it only has B/g feature instances,
+where B is the number of instances in a mini-batch and g is the number of
+workers in a group.
+
All other settings are the same as running without partitioning
     ./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf
-##### Training with model Partitioning
#### Training in a cluster
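The B/g slicing used in model partitioning can be sketched as follows (an illustrative helper, not SINGA code; the function name is made up):

```python
# Illustrative sketch (not SINGA code): slicing a mini-batch of B
# instances among the g workers of one group, so that each worker's
# slice of a layer holds B/g instances.
def slice_batch(batch, g):
    """Split `batch` (length B, assumed divisible by g) into g slices."""
    B = len(batch)
    assert B % g == 0, "B must be divisible by the number of workers g"
    size = B // g
    return [batch[i * size:(i + 1) * size] for i in range(g)]

batch = list(range(8))          # B = 8 instances
print(slice_batch(batch, 2))    # g = 2 workers -> two slices of 4
```

With `nworkers_per_group: 2` and a mini-batch of 8 instances, each worker would process a slice of 4 instances per iteration, and the workers synchronize before the next iteration.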