Modified: websites/staging/singa/trunk/content/introduction.html ============================================================================== --- websites/staging/singa/trunk/content/introduction.html (original) +++ websites/staging/singa/trunk/content/introduction.html Wed Jul 22 15:43:23 2015 @@ -1,13 +1,13 @@ <!DOCTYPE html> <!-- - | Generated by Apache Maven Doxia at 2015-07-20 + | Generated by Apache Maven Doxia at 2015-07-22 | Rendered using Apache Maven Fluido Skin 1.4 --> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> - <meta name="Date-Revision-yyyymmdd" content="20150720" /> + <meta name="Date-Revision-yyyymmdd" content="20150722" /> <meta http-equiv="Content-Language" content="en" /> <title>Apache SINGA – Introduction</title> <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" /> @@ -71,7 +71,7 @@ </li> <li class="dropdown-submenu"> - <a href="docs/program-model.html" title="Programming Model">Programming Model</a> + <a href="docs/user-guide.html" title="User Guide">User Guide</a> <ul class="dropdown-menu"> <li> <a href="docs/model-config.html" title="Model Configuration">Model Configuration</a> </li> @@ -241,9 +241,9 @@ <li> - <a href="docs/program-model.html" title="Programming Model"> + <a href="docs/user-guide.html" title="User Guide"> <span class="icon-chevron-down"></span> - Programming Model</a> + User Guide</a> <ul class="nav nav-list"> <li> @@ -452,7 +452,7 @@ <h3><a name="Overview"></a>Overview</h3> <p>SINGA is designed to be general enough to implement the distributed training algorithms of existing systems. Distributed deep learning training is an ongoing, challenging research problem in terms of scalability. There is no established scalable distributed training algorithm. Different algorithms are used by existing systems, e.g., Hogwild used by Caffe, AllReduce used by Baidu’s DeepImage, and the Downpour algorithm proposed by Google Brain and used by Microsoft Adam. SINGA gives users the flexibility to select the one that is most scalable for their model and data.</p> <p>To provide good usability, SINGA provides a simple programming model based on the layer structure that is common in deep learning models. Users override the base layer class to implement their own layer logic for feature transformation. A model is constructed by configuring each layer and its connections, as in Caffe. SINGA takes care of the data and model partitioning, and makes the underlying distributed communication (almost) transparent to users. A set of built-in layers and example models are provided.</p> -<p>SINGA is an <a class="externalLink" href="http://singa.incubator.apache.org/">Apache incubator project</a>, released under Apache License 2. It is mainly developed by the DBSystem group of National University of Singapore. A diverse community is being constructed to welcome open-source contribution. </p></div> +<p>SINGA is an <a class="externalLink" href="http://singa.incubator.apache.org/">Apache incubator project</a>, released under Apache License 2.0. It is mainly developed by the DBSystem group of the National University of Singapore.
A diverse community is being built to welcome open-source contributions.</p></div> <div class="section"> <h3><a name="Goals_and_Principles"></a>Goals and Principles</h3> <div class="section"> @@ -488,11 +488,39 @@ </ul> <p>Considering extensibility, we make our core data structures (e.g., Layer) and operations general enough for programmers to override.</p></div></div> <div class="section"> -<h3><a name="System_Architecture"></a>System Architecture</h3> -<p><img src="images/arch.png" alt="SINGA Logical Architecture" style="width: 500px" /> -<p><b>SINGA Logical Architecture</b></p> -<p>The logical system architecture is shown in the above figure. There are two types of execution units, namely workers and servers. They are grouped according to the cluster configuration. Each worker group runs against a partition of the training dataset to compute the updates (e.g., the gradients) of parameters on one model replica, denoted as ParamShard. Worker groups run asynchronously, while workers within one group run synchronously with each worker computing (partial) updates for a subset of model parameters. Each server group also maintains one replica of the model parameters (i.e., ParamShard). It receives and handles requests (e.g., Get/Put/Update) from workers. Every server group synchronizes with neighboring server groups periodically or ac- cording to some specified rules.</p> -<p>SINGA starts by parsing the cluster and model configurations. The first worker group initializes model parameters and sends Put requests to put them into the ParamShards of servers. Then every worker group runs the training algorithm by iterating over its training data in mini-batch. Each worker collects the fresh parameters from servers before computing the updates (e.g., gradients) for them. Once it finishes the computation, it issues update requests to the servers.</p></div></div> +<h3><a name="Where_to_go_from_here"></a>Where to go from here</h3> + +<ul> + +<li> +<p>SINGA <a href="user-guide.html">User Guide</a> describes how to submit a training job for your own deep learning model.</p></li> + +<li> +<p>SINGA <a href="architecture.html">architecture</a> illustrates how different training frameworks are supported using a general system architecture.</p></li> + +<li> +<p><a href="examples.html">Training examples</a> are provided to help users get started with SINGA.</p></li> +</ul> +<!-- - +### System Architecture + +<img src="images/arch.png" alt="SINGA Logical Architecture" style="width: 500px"/> +<p><strong>SINGA Logical Architecture</strong></p> + +The logical system architecture is shown in the above figure. There are two types of execution units, +namely workers and servers. They are grouped according to the cluster configuration. Each worker +group runs against a partition of the training dataset to compute the updates (e.g., the gradients) +of parameters on one model replica, denoted as ParamShard. Worker groups run asynchronously, while +workers within one group run synchronously with each worker computing (partial) updates for a subset +of model parameters. Each server group also maintains one replica of the model parameters +(i.e., ParamShard). It receives and handles requests (e.g., Get/Put/Update) from workers. Every server +group synchronizes with neighboring server groups periodically or according to some specified rules. + +SINGA starts by parsing the cluster and model configurations.
The first worker group initializes model +parameters and sends Put requests to put them into the ParamShards of servers. Then every worker group +runs the training algorithm by iterating over its training data in mini-batches. Each worker collects the +fresh parameters from servers before computing the updates (e.g., gradients) for them. Once it finishes +the computation, it issues update requests to the servers. --></div></div> </div> </div> </div>
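The Overview above describes SINGA's layer-based programming model: users override a base layer class and build a model by configuring layers and their connections. Below is a rough, self-contained C++ sketch of that idea. The class and method names (Layer, Setup, ComputeFeature, ComputeGradient) are illustrative assumptions and not SINGA's actual API; see the User Guide for the real base class.

    #include <cstdio>
    #include <utility>
    #include <vector>

    // Toy layer hierarchy mirroring the programming model described in the
    // Overview. Names and signatures are assumptions, not SINGA's API.
    struct Blob { std::vector<float> data; };  // features flowing between layers

    class Layer {
     public:
      virtual ~Layer() {}
      // Connect this layer to its source layer(s), as configured in the model.
      virtual void Setup(const std::vector<Layer*>& src) { src_ = src; }
      // Forward pass: transform features from the source layers into data_.
      virtual void ComputeFeature() = 0;
      // Backward pass: compute gradients for parameters and source layers.
      virtual void ComputeGradient() = 0;
      Blob& data() { return data_; }
     protected:
      std::vector<Layer*> src_;
      Blob data_;
    };

    // A user-defined layer: scales the features of its (single) source layer.
    class ScaleLayer : public Layer {
     public:
      explicit ScaleLayer(float factor) : factor_(factor) {}
      void ComputeFeature() override {
        data_.data.clear();
        for (float v : src_[0]->data().data) data_.data.push_back(v * factor_);
      }
      void ComputeGradient() override {
        // d(factor * x)/dx = factor; gradient propagation omitted for brevity
      }
     private:
      float factor_;
    };

    // A trivial input layer that just holds raw features.
    class InputLayer : public Layer {
     public:
      explicit InputLayer(std::vector<float> v) { data_.data = std::move(v); }
      void ComputeFeature() override {}   // data already loaded
      void ComputeGradient() override {}  // no source layers
    };

    int main() {
      // "Configure each layer and its connections", then run a forward pass.
      InputLayer in({1.0f, 2.0f, 3.0f});
      ScaleLayer scale(0.5f);
      scale.Setup({&in});
      in.ComputeFeature();
      scale.ComputeFeature();
      for (float v : scale.data().data) std::printf("%.2f ", v);  // 0.50 1.00 1.50
      std::printf("\n");
      return 0;
    }

Real SINGA layers additionally handle data/model partitioning and distributed communication, which this toy omits.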
Modified: websites/staging/singa/trunk/content/quick-start.html ============================================================================== --- websites/staging/singa/trunk/content/quick-start.html (original) +++ websites/staging/singa/trunk/content/quick-start.html Wed Jul 22 15:43:23 2015 @@ -1,13 +1,13 @@ <!DOCTYPE html> <!-- - | Generated by Apache Maven Doxia at 2015-07-20 + | Generated by Apache Maven Doxia at 2015-07-22 | Rendered using Apache Maven Fluido Skin 1.4 --> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> - <meta name="Date-Revision-yyyymmdd" content="20150720" /> + <meta name="Date-Revision-yyyymmdd" content="20150722" /> <meta http-equiv="Content-Language" content="en" /> <title>Apache SINGA – Quick Start</title> <link rel="stylesheet" href="./css/apache-maven-fluido-1.4.min.css" /> @@ -71,7 +71,7 @@ </li> <li class="dropdown-submenu"> - <a href="docs/program-model.html" title="Programming Model">Programming Model</a> + <a href="docs/user-guide.html" title="User Guide">User Guide</a> <ul class="dropdown-menu"> <li> <a href="docs/model-config.html" title="Model Configuration">Model Configuration</a> </li> @@ -241,9 +241,9 @@ <li> - <a href="docs/program-model.html" title="Programming Model"> + <a href="docs/user-guide.html" title="User Guide"> <span class="icon-chevron-down"></span> - Programming Model</a> + User Guide</a> <ul class="nav nav-list"> <li> @@ -471,15 +471,20 @@ git clone https://github.com/apache/incu <div class="source"><pre class="prettyprint">./configure make </pre></div></div> -<p>If there are dependent libraries missing, please refer to <a href="docs/installation.html">installation</a> page for guidance on installing them.</p></div> +<p>If there are dependent libraries missing, please refer to <a href="docs/installation.html">installation</a> page for guidance on installing them.</p> +<!-- - +### Run in standalone mode + +Running SINGA in standalone mode is on the contrary of running it on Mesos or +YARN. For standalone mode, users have to manage the resources manually. For +instance, they have to prepare a host file containing all running nodes. +There is no management on CPU and memory resources, hence SINGA consumes as much +CPU and memory resources as it needs. --></div> <div class="section"> -<h3><a name="Run_in_standalone_mode"></a>Run in standalone mode</h3> -<p>Running SINGA in standalone mode is on the contrary of running it on Mesos or YARN. For standalone mode, users have to manage the resources manually. For instance, they have to prepare a host file containing all running nodes. There is no management on CPU and memory resources, hence SINGA consumes as much CPU and memory resources as it needs.</p> -<div class="section"> -<h4><a name="Training_on_a_single_node"></a>Training on a single node</h4> +<h3><a name="Training_on_a_single_node"></a>Training on a single node</h3> <p>For single node training, one process will be launched to run the SINGA code on the node where SINGA is started. We train the <a class="externalLink" href="http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks">CNN model</a> over the <a class="externalLink" href="http://www.cs.toronto.edu/~kriz/cifar.html">CIFAR-10</a> dataset as an example. 
The hyper-parameters are set following <a class="externalLink" href="https://code.google.com/p/cuda-convnet/">cuda-convnet</a>.</p> <div class="section"> -<h5><a name="Data_and_model_preparation"></a>Data and model preparation</h5> +<h4><a name="Data_and_model_preparation"></a>Data and model preparation</h4> <p>Download the dataset and create the data shards for training and testing.</p> <div class="source"> @@ -491,82 +496,201 @@ make create <p>A training dataset and a test dataset are created under <i>train-shard</i> and <i>test-shard</i> folder respectively. A image_mean.bin file is also generated, which contains the feature mean of all images. <!-- After creating the data shards, you to update the paths in the model configuration file (*model.conf*) for the training data shard, test data shard and the mean file. --></p> -<p>Since all modules used for training this CNN model are provided by SINGA as built-in modules, there is no need to write any code. Instead, you just executable the running script (<i>../../bin/singa-run.sh</i>) by providing the model configuration file (<i>model.conf</i>). If you want to implement your own modules, e.g., layer, then you have to register your modules in the driver code. After compiling the driver code, link it with the SINGA library to generate the executable. More details are described in <a href="">Code your own models</a>.</p></div> -<div class="section"> -<h5><a name="Training_without_partitioning"></a>Training without partitioning</h5> -<p>To train the model without any partitioning, you just set the numbers in the cluster configuration file (<i>cluster.conf</i>) as :</p> - -<div class="source"> -<div class="source"><pre class="prettyprint">nworker_groups: 1 -nworkers_per_group: 1 -nserver_groups: 1 -nservers_per_group: 1 -</pre></div></div> -<p>One worker group trains against one partition of the training dataset. If <i>nworker_groups</i> is set to 1, then there is no data partitioning. One worker runs over a partition of the model. If <i>nworkers_per_group</i> is set to 1, then there is no model partitioning. More details on the cluster configuration are described in the <a href="docs/architecture.html">System Architecture</a> page.</p> +<p>Since all modules used for training this CNN model are provided by SINGA as built-in modules, there is no need to write any code. You just execute the script (<i>../../bin/singa-run.sh</i>) by providing the workspace which includes the job configuration file (<i>job.conf</i>). If you want to implement your own modules, e.g., layer, then you have to register your modules in the <a href="user-guide.html">driver program</a>.</p> <p>Start the training by running:</p> <div class="source"> <div class="source"><pre class="prettyprint">#goto top level folder cd ../.. -./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf -</pre></div></div></div> -<div class="section"> -<h5><a name="Training_with_data_Partitioning"></a>Training with data Partitioning</h5> +./bin/singa-run.sh -workspace=examples/cifar10 +</pre></div></div> +<p>Note: we have changed the command line arguments from <tt>-cluster... -model=...</tt> to <tt>-workspace</tt>. 
The <tt>workspace</tt> folder must have a job.conf file which specifies the cluster (number of workers, number of servers, etc) and model configuration.</p> +<p>Some training information will be shown on the screen like:</p> <div class="source"> -<div class="source"><pre class="prettyprint">nworker_groups: 2 -nserver_groups: 1 -nservers_per_group: 1 -nworkers_per_group: 1 -nworkers_per_procs: 2 -workspace: "examples/cifar10/" +<div class="source"><pre class="prettyprint">Starting zookeeper ... already running as process 21660. +Generate host list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.hosts +Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 1] +Executing : ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=1 +proc #0 -> 10.10.10.14:49152 (pid = 26724) +Server (group = 0, id = 0) start +Worker (group = 0, id = 0) start +Generate pid list to /home/singa/wangwei/incubator-singa/examples/cifar10/job.pids +Test step-0, loss : 2.302607, accuracy : 0.090100 +Train step-0, loss : 2.302614, accuracy : 0.062500 +Train step-30, loss : 2.302403, accuracy : 0.141129 +Train step-60, loss : 2.301960, accuracy : 0.155738 +Train step-90, loss : 2.301470, accuracy : 0.159341 +Train step-120, loss : 2.301048, accuracy : 0.160640 +Train step-150, loss : 2.300414, accuracy : 0.161424 +Train step-180, loss : 2.299842, accuracy : 0.160912 +Train step-210, loss : 2.298510, accuracy : 0.163211 +Train step-240, loss : 2.297058, accuracy : 0.163641 +Train step-270, loss : 2.295308, accuracy : 0.163745 +Test step-300, loss : 2.256824, accuracy : 0.193500 +Train step-300, loss : 2.292490, accuracy : 0.165282 </pre></div></div> -<p>The above cluster configuration file specifies two worker groups and one server group. Worker groups run asynchronously but share the memory space for parameter values. In other words, it runs as the Hogwild algorithm. Since it is running in a single node, we can avoid partitioning the dataset explicitly. In specific, a random start offset is assigned to each worker group such that they would not work on the same mini-batch for every iteration. Consequently, they run like on different data partitions. The running command is the same:</p> +<p>You can find more logs under the <tt>/tmp</tt> folder. Once the training is finished the learned model parameters will be dumped into $workspace/checkpoint folder. The dumped file can be used for continuing the training or as initialization for other similar models. <a href="checkpoint.html">Checkpoint and Resume</a> discusses more details.</p> +<!-- - +To train the model without any partitioning, you just set the numbers +in the cluster configuration file (*cluster.conf*) as : + + nworker_groups: 1 + nworkers_per_group: 1 + nserver_groups: 1 + nservers_per_group: 1 + +One worker group trains against one partition of the training dataset. If +*nworker_groups* is set to 1, then there is no data partitioning. One worker +runs over a partition of the model. If *nworkers_per_group* is set to 1, then +there is no model partitioning. More details on the cluster configuration are +described in the [System Architecture](docs/architecture.html) page. 
--></div> +<div class="section"> +<h4><a name="Distributed_Training"></a>Distributed Training</h4> +<p>To train the model in a distributed environment, we first change the job configuration to use 2 worker groups (one worker per group) and 2 servers (from the same server group).</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">// job.conf +cluster { + nworker_groups: 2 + nserver_groups: 1 + nservers_per_group: 2 +} +</pre></div></div> +<p>This configuration runs SINGA using the Downpour training framework. Specifically, the 2 worker groups run asynchronously to compute the parameter gradients. Each server maintains a subset of the parameters and updates them based on the gradients passed by workers.</p> +<p>To run SINGA in a cluster,</p> +<ol style="list-style-type: decimal"> + +<li> +<p>A hostfile should be prepared under the conf/ folder, e.g.,</p> + <div class="source"> +<div class="source"><pre class="prettyprint">// hostfile +logbase-a04 +logbase-a05 +logbase-a06 +... +</pre></div></div></li> + +<li> +<p>The zookeeper location must be configured in conf/singa.conf, e.g.,</p> +<p>zookeeper_host: “logbase-a04:2181”</p></li> + +<li> +<p>Make your ssh command password-free.</p></li> +</ol> +<p>Currently, we assume the data files are on NFS, i.e., visible to all nodes. To start the training, run</p> <div class="source"> -<div class="source"><pre class="prettyprint">nworker_groups: 1 -nserver_groups: 1 -nservers_per_group: 1 -nworkers_per_group: 2 -nworkers_per_procs: 2 -workspace: "examples/cifar10/" +<div class="source"><pre class="prettyprint">./bin/singa-run.sh -workspace=examples/cifar10 </pre></div></div> -<p>The above cluster configuration specifies one worker group with two workers. The workers run synchronously, i.e., they are synchronized after one iteration. The model is partitioned among the two workers. In specific, each layer is sliced such that every worker is assigned one sliced layer. The sliced layer is the same as the original layer except that it only has B/g feature instances, where B is the size of instances in a mini-batch, g is the number of workers in a group. </p> -<p>All other settings are the same as running without partitioning</p> +<p>The <tt>singa-run.sh</tt> script will calculate the number of nodes (i.e., processes) to launch and will generate a job.hosts file under the workspace by looping over all nodes in conf/hostfile.
Hence, if the hostfile contains fewer nodes than the number of processes to launch, multiple processes will be started on the same node.</p> +<p>You can get job information such as the job ID and running processes using the singa-console.sh script:</p> <div class="source"> -<div class="source"><pre class="prettyprint">nworker_groups: 2 -nserver_groups: 2 +<div class="source"><pre class="prettyprint">./bin/singa-console.sh list +JOB ID |NUM PROCS +----------|----------- +job-4 |2 </pre></div></div> -<p>and start one process as,</p> +<p>Sample training output is:</p> <div class="source"> -<div class="source"><pre class="prettyprint">./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf +<div class="source"><pre class="prettyprint">Generate job id to /home/singa/wangwei/incubator-singa/examples/cifar10/job.id [job_id = 4] +Executing @ logbase-a04 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4 +Executing @ logbase-a05 : cd /home/singa/wangwei/incubator-singa; ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4 +proc #0 -> 10.10.10.15:49152 (pid = 3504) +proc #1 -> 10.10.10.14:49152 (pid = 27119) +Server (group = 0, id = 1) start +Worker (group = 1, id = 0) start +Server (group = 0, id = 0) start +Worker (group = 0, id = 0) start +Generate pid list to +/home/singa/wangwei/incubator-singa/examples/cifar10/job.pids +Test step-0, loss : 2.297355, accuracy : 0.101700 +Train step-0, loss : 2.274724, accuracy : 0.062500 +Train step-30, loss : 2.263850, accuracy : 0.131048 +Train step-60, loss : 2.249972, accuracy : 0.133197 +Train step-90, loss : 2.235008, accuracy : 0.151786 +Train step-120, loss : 2.228674, accuracy : 0.154959 +Train step-150, loss : 2.215979, accuracy : 0.165149 +Train step-180, loss : 2.198111, accuracy : 0.180249 +Train step-210, loss : 2.175717, accuracy : 0.188389 +Train step-240, loss : 2.160980, accuracy : 0.197095 +Train step-270, loss : 2.145763, accuracy : 0.202030 +Test step-300, loss : 1.921962, accuracy : 0.299100 +Train step-300, loss : 2.129271, accuracy : 0.208056 </pre></div></div> -<p>and then start another process as,</p> +<p>We can see that the accuracy (resp. loss) of distributed training increases (resp. decreases) faster than that of single-node training.</p> +<p>You can stop the training with singa-stop.sh:</p> <div class="source"> -<div class="source"><pre class="prettyprint">./singa -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf +<div class="source"><pre class="prettyprint">./bin/singa-stop.sh +Kill singa @ logbase-a04 ... +Kill singa @ logbase-a05 ... +bash: line 1: 27119 Killed ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4 +Kill singa @ logbase-a06 ... +bash: line 1: 3504 Killed ./singa -workspace=/home/singa/wangwei/incubator-singa/examples/cifar10 -job=4 +Cleanning metadata in zookeeper ... </pre></div></div> -<p>Note that the two commands are different! The first one will start the zookeeper. Currently we assume that the example/cifar10 folder is in NFS.
</p></div></div> -<div class="section"> -<h3><a name="Run_with_Mesos"></a>Run with Mesos</h3> -<p><i>in working</i>…</p></div> -<div class="section"> -<h3><a name="Run_with_YARN"></a>Run with YARN</h3></div></div> +<!-- - +In other words, +it runs as the Hogwild algorithm. Since it is running in a single node, we can avoid partitioning the +dataset explicitly. In specific, a random start offset is assigned to each worker group such that they +would not work on the same mini-batch for every iteration. Consequently, they run like on different data +partitions. +The running command is the same: + + ./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf + + +##### Training with model Partitioning + + nworker_groups: 1 + nserver_groups: 1 + nservers_per_group: 1 + nworkers_per_group: 2 + nworkers_per_procs: 2 + workspace: "examples/cifar10/" + +The above cluster configuration specifies one worker group with two workers. +The workers run synchronously, i.e., they are synchronized after one iteration. +The model is partitioned among the two workers. In specific, each layer is +sliced such that every worker is assigned one sliced layer. The sliced layer is +the same as the original layer except that it only has B/g feature instances, +where B is the size of instances in a mini-batch, g is the number of workers in +a group. + +All other settings are the same as running without partitioning + + ./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf + + + +#### Training in a cluster + +To run the distributed Hogwild framework, configure the cluster.conf as: + + nworker_groups: 2 + nserver_groups: 2 + +and start one process as, + + ./bin/singa-run.sh -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf + +and then start another process as, + + ./singa -model=examples/cifar10/model.conf -cluster=examples/cifar10/cluster.conf + +Note that the two commands are different! The first one will start the zookeeper. Currently we assume +that the example/cifar10 folder is in NFS. + +### Run with Mesos + +*in working*... + +### Run with YARN --></div></div></div> </div> </div> </div>
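The Distributed Training section above configures two worker groups that compute gradients asynchronously while servers maintain and update the parameters (the Downpour framework). The following self-contained C++ toy sketches that worker/server pattern; all names are invented for illustration and none of it is SINGA code. The two threads play the role of the two worker groups, and ServerShard plays the role of a server holding a subset of the parameters.

    #include <cstdio>
    #include <mutex>
    #include <thread>
    #include <vector>

    // A toy "server" owning a shard (subset) of the model parameters.
    class ServerShard {
     public:
      explicit ServerShard(size_t n) : params_(n, 0.0f) {}
      // Workers collect fresh parameters before computing updates.
      std::vector<float> Get() {
        std::lock_guard<std::mutex> lk(mu_);
        return params_;
      }
      // Workers issue update requests; the server applies them with a fixed rate.
      void Update(const std::vector<float>& grad, float lr) {
        std::lock_guard<std::mutex> lk(mu_);
        for (size_t i = 0; i < params_.size(); ++i) params_[i] -= lr * grad[i];
      }
     private:
      std::mutex mu_;
      std::vector<float> params_;
    };

    // One worker group: pull parameters, compute a (fake) gradient on its own
    // data partition, push the update back. There is no synchronization with
    // other groups, i.e., asynchronous (Downpour-style) training.
    void WorkerGroup(int gid, ServerShard* shard, int steps) {
      for (int step = 0; step < steps; ++step) {
        std::vector<float> w = shard->Get();
        std::vector<float> grad(w.size());
        for (size_t i = 0; i < w.size(); ++i)
          grad[i] = w[i] - 1.0f;            // gradient of 0.5*(w-1)^2, target = 1
        shard->Update(grad, 0.1f);
      }
      std::printf("worker group %d finished\n", gid);
    }

    int main() {
      ServerShard shard(4);                         // one server, 4 parameters
      std::thread g0(WorkerGroup, 0, &shard, 100);  // two asynchronous groups
      std::thread g1(WorkerGroup, 1, &shard, 100);
      g0.join();
      g1.join();
      for (float p : shard.Get()) std::printf("%.3f ", p);  // each close to 1.0
      std::printf("\n");
      return 0;
    }

Because the groups never wait for each other, their updates interleave in arbitrary order; that asynchrony is exactly what the cluster configuration above enables.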
