Author: wangsh
Date: Sat Jan  9 10:21:49 2016
New Revision: 1723842

URL: http://svn.apache.org/viewvc?rev=1723842&view=rev
Log:
add docs for hybrid partition

Added:
    incubator/singa/site/trunk/content/markdown/docs/hybrid.md
Modified:
    incubator/singa/site/trunk/content/markdown/docs/architecture.md
    incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
    incubator/singa/site/trunk/content/site.xml

Modified: incubator/singa/site/trunk/content/markdown/docs/architecture.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/architecture.md?rev=1723842&r1=1723841&r2=1723842&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/architecture.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/architecture.md Sat Jan  9 10:21:49 2016
@@ -35,7 +35,7 @@ within a group:
   against all data partitioned to the group.
   * **Data parallelism**. Each worker computes all parameters
   against a subset of data.
-  * [**Hybrid parallelism**](). SINGA also supports hybrid parallelism.
+  * [**Hybrid parallelism**](hybrid.html). SINGA also supports hybrid parallelism.
 
 
 ## Implementation

Modified: incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/distributed-training.md?rev=1723842&r1=1723841&r2=1723842&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/distributed-training.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/distributed-training.md Sat Jan  9 10:21:49 2016
@@ -2,14 +2,7 @@
 
 ---
 
-SINGA is designed for distributed training of large deep learning models with huge amount of training data. It is intergrated with Mesos, so that distributed training can be started as a Mesos framework. Currently, the Mesos cluster can be set up from SINGA containers, i.e. we provide Docker images that bundles Mesos and SINGA together. Refer to the guide below for instructions as how to start and use the cluster.
-
-* [Distributed training on Mesos](mesos.html)
-
-SINGA can run on top of distributed storage system to achieve scalability. The current version of SINGA supports HDFS.
-
-* [Running SINGA on HDFS](hdfs.html)
-
+SINGA is designed for distributed training of large deep learning models with huge amounts of training data.
+We also provide high-level descriptions of the design behind SINGA's distributed architecture.
 
 * [System Architecture](architecture.html)
@@ -17,3 +10,16 @@ We also provide high-level descriptions
 * [Training Frameworks](frameworks.html)
 
 * [System Communication](communication.html)
+
+SINGA supports different options for training a model in parallel, including data parallelism, model parallelism and hybrid parallelism.
+
+* [Hybrid Parallelism](hybrid.html)
+
+SINGA is integrated with Mesos, so that distributed training can be started as a Mesos framework. Currently, the Mesos cluster can be set up from SINGA containers, i.e., we provide Docker images that bundle Mesos and SINGA together. Refer to the guide below for instructions on how to start and use the cluster.
+
+* [Distributed training on Mesos](mesos.html)
+
+SINGA can run on top of a distributed storage system to achieve scalability. The current version of SINGA supports HDFS.
+
+* [Running SINGA on HDFS](hdfs.html)
+

Added: incubator/singa/site/trunk/content/markdown/docs/hybrid.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/hybrid.md?rev=1723842&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/hybrid.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/hybrid.md Sat Jan  9 10:21:49 2016
@@ -0,0 +1,83 @@
+# Hybrid Parallelism
+
+---
+
+## User Guide
+
+SINGA supports different parallelism options for distributed training.
+Users just need to set the desired option in the job configuration.
+
+Both `NetProto` and `LayerProto` have a `partition_dim` field that controls the parallelism option:
+
+  * `partition_dim=0`: the neuralnet/layer is partitioned on the data dimension, i.e., each worker processes a subset of the data records.
+  * `partition_dim=1`: the neuralnet/layer is partitioned on the feature dimension, i.e., each worker maintains a subset of the feature parameters.
+
+The `partition_dim` field in `NetProto` applies to all layers, unless a layer has its own `partition_dim` field set.
+
+To use data parallelism for the whole model, leave `partition_dim` at its default value (0), or configure the job.conf like:
+
+```
+neuralnet {
+  partition_dim: 0
+  layer {
+    name: ... 
+    type: ...
+  }
+  ...
+}
+```
+
+With hybrid parallelism, we can have some layers partitioned on the data dimension and others on the feature dimension.
+For example, to partition a specific layer on the feature dimension, configure it like:
+
+```
+neuralnet {
+  partition_dim: 0
+  layer {
+    name: "layer1_partition_on_data_dimension"
+    type: ...
+  }
+  layer {
+    name: "layer2_partition_on_feature_dimension"
+    type: ...
+    partition_dim: 1
+  }
+  ...
+}
+```
+
+## Developer Guide
+
+To support hybrid parallelism, after SINGA reads the user's model and partition configuration, a set of connection layers is automatically added between layers when needed:
+
+* `BridgeSrcLayer` & `BridgeDstLayer` are added when two connected layers are not on the same machine. They are paired and are responsible for sending data/gradients to the other side during each iteration.
+
+* `ConcateLayer` is added when there are multiple source layers. It combines their feature blobs along a given dimension.
+
+* `SliceLayer` is added when there are multiple destination layers, each of which only needs a subset (slice) of this layer's feature blob.
+
+* `SplitLayer` is added when there are multiple destination layers, each of which needs the whole feature blob.
+
+The following is the logic our code uses to add connection layers:
+
+```
+Add Slice, Concate, Split Layers for Hybrid Partition
+
+All cases are as follows:
+src_pdim | dst_pdim | connection_type | Action
+    0    |     0    |     OneToOne    | Direct Connection
+    1    |     1    |     OneToOne    | Direct Connection
+    0    |     0    |     OneToAll    | Direct Connection
+    1    |     0    |     OneToOne    | Slice -> Concate
+    0    |     1    |     OneToOne    | Slice -> Concate
+    1    |     0    |     OneToAll    | Slice -> Concate
+    0    |     1    |     OneToAll    | Split -> Concate
+    1    |     1    |     OneToAll    | Split -> Concate
+
+Logic:
+dst_pdim = 1 && OneToAll ?
+  (YES) Split -> Concate
+  (NO)  src_pdim = dst_pdim ?
+          (YES) Direct Connection
+          (NO)  Slice -> Concate
+```

Modified: incubator/singa/site/trunk/content/site.xml
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/site.xml?rev=1723842&r1=1723841&r2=1723842&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/site.xml (original)
+++ incubator/singa/site/trunk/content/site.xml Sat Jan  9 10:21:49 2016
@@ -79,10 +79,11 @@
           <item name="Updater" href="docs/updater.html"/>
         </item>
         <item name="Distributed Training" 
href="docs/distributed-training.html" collapse="true" >
-         <item name="Training on Mesos" href="docs/mesos.html"/>
           <item name="System Architecture" href="docs/architecture.html"/>
           <item name="Frameworks" href="docs/frameworks.html"/>
           <item name="Communication" href="docs/communication.html"/>
+          <item name="Hybrid Parallelism" href="docs/hybrid.html"/>
+          <item name="Training on Mesos" href="docs/mesos.html"/>
           <item name="Using HDFS" href="docs/hdfs.html"/>
         </item>
         <item name="Data Preparation" href="docs/data.html"/>

