Author: dinhtta
Date: Thu Dec 31 06:08:31 2015
New Revision: 1722420

URL: http://svn.apache.org/viewvc?rev=1722420&view=rev
Log:
Added documentation on training with HDFS

Added:
    incubator/singa/site/trunk/content/markdown/docs/hdfs.md
Modified:
    incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
    incubator/singa/site/trunk/content/site.xml

Modified: incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/distributed-training.md?rev=1722420&r1=1722419&r2=1722420&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/distributed-training.md (original)
+++ incubator/singa/site/trunk/content/markdown/docs/distributed-training.md Thu Dec 31 06:08:31 2015
@@ -14,3 +14,4 @@ We also provide high-level descriptions
 
 * [System Communication](communication.html)
 
+* [Using HDFS](hdfs.html)

Added: incubator/singa/site/trunk/content/markdown/docs/hdfs.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/hdfs.md?rev=1722420&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/hdfs.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/hdfs.md Thu Dec 31 06:08:31 2015
@@ -0,0 +1,128 @@
+# Using HDFS with SINGA
+
+This guide explains how to make use of HDFS as the data store for SINGA jobs. 
+
+1. [Quick start using Docker](#quickstart)
+2. [Setup HDFS](#hdfs)
+3. [Examples](#examples)    
+
+---
+<a name="quickstart"></a>
+## Quick start using Docker 
+
+We provide a Docker container built on top of `singa/mesos` (see the <a href="http://singa.incubator.apache.org/docs/docker.html">guide on building SINGA on Docker</a>).
+
+```
+git clone https://github.com/ug93tad/incubator-singa
+cd incubator-singa
+git checkout SINGA-97-docker
+cd tool/docker/hdfs
+sudo docker build -t singa/hdfs .
+```
+
+Once built, the container image `singa/hdfs` contains an installation of the HDFS C++ library (`libhdfs3`) and the latest SINGA code. Multiple distributed nodes can be launched, and HDFS set up on them, by following the <a href="http://singa.incubator.apache.org/docs/mesos.html">guide for running distributed SINGA on Mesos</a>.
+
+In the following, we assume an HDFS setup in which `node0` is the namenode and `nodei` (i>0) are the datanodes.
+
+<a name="hdfs"></a>
+## Setup HDFS 
+There are at least two C/C++ client libraries for interacting with HDFS. One is `libhdfs` from Hadoop, a <a href="https://wiki.apache.org/hadoop/LibHDFS">JNI-based library</a>, meaning that communication goes through the JVM. The other is `libhdfs3`, a <a href="https://github.com/PivotalRD/libhdfs3">native C++ library developed by Pivotal</a>, in which the client communicates directly with HDFS via RPC. The current implementation uses the second one.
+
+1. Install `libhdfs3`: follow the <a href="https://github.com/PivotalRD/libhdfs3#installation">official guide</a>.
+
+2. **Additional setup**: recent versions of Hadoop (>2.4.x) support short-circuit local reads, which bypass network communication (TCP sockets) when retrieving data stored on the local node. `libhdfs3` reports errors (but continues to work) when it finds that short-circuit read is not configured. To silence these complaints and improve performance, add the following configuration to `hdfs-site.xml` **and to `hdfs-client.xml`**:
+  
+    ```
+  <property>
+    <name>dfs.client.read.shortcircuit</name>
+    <value>true</value>
+  </property>
+  <property>
+    <name>dfs.domain.socket.path</name>
+    <value>/var/lib/hadoop-hdfs/dn_socket</value>
+  </property>
+    ``` 
+    Next, at each client, set the `LIBHDFS3_CONF` environment variable to point to the `hdfs-client.xml` file:
+
+    ```
+  export LIBHDFS3_CONF=$HADOOP_HOME/etc/hadoop/hdfs-client.xml
+    ```
+
+<a name="examples"></a>
+## Examples
+We explain how to run the CIFAR10 and MNIST examples. Before training, the data must be uploaded to HDFS.
+
+### CIFAR10
+1. Upload the data to HDFS (done at any of the HDFS nodes)
+    * Change `job.conf` to use HDFS: in `examples/cifar10/job.conf`, set the `backend` property to `hdfsfile`
+    * Create and upload data: 
+
+    ```
+    cd examples/cifar10
+    cp Makefile.example Makefile
+    make create
+    hadoop dfs -mkdir -p /examples/cifar10
+    hadoop dfs -copyFromLocal cifar-10-batches-bin /examples/cifar10/
+    ```
+    If successful, the files should be visible in HDFS via `hadoop dfs -ls /examples/cifar10`.
+
+2. Training:
+    * Make sure `conf/singa.conf` has the correct address of the ZooKeeper service:
+
+    ```
+    zookeeper_host: "node0:2181"
+    ```
+
+    * Make sure `job.conf` has the correct paths to the train and test datasets:
+
+    ```
+    // train layer
+    path: "hdfs://node0:9000/examples/cifar10/train_data.bin"
+    mean_file: "hdfs://node0:9000/examples/cifar10/image_mean.bin"
+    // test layer
+    path: "hdfs://node0:9000/examples/cifar10/test_data.bin"
+    mean_file: "hdfs://node0:9000/examples/cifar10/image_mean.bin"
+    ```
+
+    * Start training: execute the following command at every node
+
+    ```
+    ./singa -conf examples/cifar10/job.conf -singa_conf singa.conf -singa_job 0
+    ```
+
+### MNIST
+1. Upload the data to HDFS (done at any of the HDFS nodes)
+    * Change `job.conf` to use HDFS: in `examples/mnist/job.conf`, set the `backend` property to `hdfsfile`
+    * Create and upload data:
+
+    ```
+    cd examples/mnist
+    cp Makefile.example Makefile
+    make create
+    make compile
+    ./create_data.bin train-images-idx3-ubyte train-labels-idx1-ubyte hdfs://node0:9000/examples/mnist/train_data.bin
+    ./create_data.bin t10k-images-idx3-ubyte t10k-labels-idx1-ubyte hdfs://node0:9000/examples/mnist/test_data.bin
+    ```
+    If successful, the files should be visible in HDFS via `hadoop dfs -ls /examples/mnist`.
+
+2. Training:
+    * Make sure `conf/singa.conf` has the correct address of the ZooKeeper service:
+
+    ```
+    zookeeper_host: "node0:2181"
+    ```
+
+    * Make sure `job.conf` has the correct paths to the train and test datasets:
+
+    ```
+    // train layer
+    path: "hdfs://node0:9000/examples/mnist/train_data.bin"
+    // test layer
+    path: "hdfs://node0:9000/examples/mnist/test_data.bin"
+    ```
+
+    * Start training: execute the following command at every node
+
+    ```
+    ./singa -conf examples/mnist/job.conf -singa_conf singa.conf -singa_job 0
+    ```
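As a quick sanity check for the short-circuit settings that hdfs.md adds above, the following Python sketch parses a Hadoop-style configuration file and reports which of the two properties are missing or disabled. It is a minimal sketch and not part of SINGA or `libhdfs3`: the function names and the `SAMPLE` document are assumptions made for illustration, including the `<configuration>` root element, which the snippet in hdfs.md omits.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Return {name: value} for every <property> element in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_shortcircuit_settings(xml_text):
    """List the short-circuit settings from hdfs.md that are absent or disabled."""
    props = read_properties(xml_text)
    missing = []
    # The flag must be present and set to "true".
    if props.get("dfs.client.read.shortcircuit", "").lower() != "true":
        missing.append("dfs.client.read.shortcircuit")
    # The domain socket path must be present and non-empty.
    if not props.get("dfs.domain.socket.path"):
        missing.append("dfs.domain.socket.path")
    return missing


# Hypothetical file contents; the <configuration> wrapper is an assumption.
SAMPLE = """<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(missing_shortcircuit_settings(SAMPLE))
```

In practice the text would presumably be read from the file named by `LIBHDFS3_CONF`, for example with `open(os.environ["LIBHDFS3_CONF"]).read()`.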

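The "make sure `job.conf` has the correct paths" steps in hdfs.md can be checked mechanically along these lines. This is a rough sketch under stated assumptions: it scans the text with a regular expression rather than parsing SINGA's real job configuration format, the helper names are hypothetical, and the default namenode `node0:9000` is simply the one used throughout the guide.

```python
import re
from urllib.parse import urlparse

# Matches lines such as: path: "hdfs://node0:9000/examples/mnist/train_data.bin"
FIELD = re.compile(r'^\s*(path|mean_file)\s*:\s*"([^"]+)"', re.MULTILINE)


def hdfs_fields(conf_text):
    """Return (field, scheme, host, port, path) for each path/mean_file entry."""
    entries = []
    for field, uri in FIELD.findall(conf_text):
        parsed = urlparse(uri)
        entries.append(
            (field, parsed.scheme, parsed.hostname, parsed.port, parsed.path)
        )
    return entries


def entries_off_namenode(conf_text, host="node0", port=9000):
    """Return the URIs that are not hdfs:// URIs on the expected namenode."""
    bad = []
    for _field, uri in FIELD.findall(conf_text):
        parsed = urlparse(uri)
        if (parsed.scheme, parsed.hostname, parsed.port) != ("hdfs", host, port):
            bad.append(uri)
    return bad


# Excerpt in the style of the MNIST job.conf fragment shown in hdfs.md.
SAMPLE_CONF = '''
// train layer
path: "hdfs://node0:9000/examples/mnist/train_data.bin"
// test layer
path: "hdfs://node0:9000/examples/mnist/test_data.bin"
'''

if __name__ == "__main__":
    for entry in hdfs_fields(SAMPLE_CONF):
        print(entry)
    print(entries_off_namenode(SAMPLE_CONF))
```

A local path left over from a non-HDFS run, such as `examples/mnist/train_data.bin`, has no `hdfs` scheme and is therefore reported by `entries_off_namenode`.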
Modified: incubator/singa/site/trunk/content/site.xml
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/site.xml?rev=1722420&r1=1722419&r2=1722420&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/site.xml (original)
+++ incubator/singa/site/trunk/content/site.xml Thu Dec 31 06:08:31 2015
@@ -83,6 +83,7 @@
           <item name="System Architecture" href="docs/architecture.html"/>
           <item name="Frameworks" href="docs/frameworks.html"/>
           <item name="Communication" href="docs/communication.html"/>
+          <item name="Using HDFS" href="docs/hdfs.html"/>
         </item>
         <item name="Data Preparation" href="docs/data.html"/>
         <item name="Checkpoint" href="docs/checkpoint.html"/>

