Author: dinhtta
Date: Thu Dec 31 06:08:31 2015
New Revision: 1722420
URL: http://svn.apache.org/viewvc?rev=1722420&view=rev
Log:
Added documentation on training with HDFS
Added:
incubator/singa/site/trunk/content/markdown/docs/hdfs.md
Modified:
incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
incubator/singa/site/trunk/content/site.xml
Modified:
incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/distributed-training.md?rev=1722420&r1=1722419&r2=1722420&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
(original)
+++ incubator/singa/site/trunk/content/markdown/docs/distributed-training.md
Thu Dec 31 06:08:31 2015
@@ -14,3 +14,4 @@ We also provide high-level descriptions
* [System Communication](communication.html)
+* [Using HDFS](hdfs.html)
Added: incubator/singa/site/trunk/content/markdown/docs/hdfs.md
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/hdfs.md?rev=1722420&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/hdfs.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/hdfs.md Thu Dec 31
06:08:31 2015
@@ -0,0 +1,128 @@
+# Using HDFS with SINGA
+
+This guide explains how to use HDFS as the data store for SINGA jobs.
+
+1. [Quick start using Docker](#quickstart)
+2. [Setup HDFS](#hdfs)
+3. [Examples](#examples)
+
+---
+<a name="quickstart"></a>
+## Quick start using Docker
+
+We provide a Docker container built on top of `singa/mesos` (see the <a href="http://singa.incubator.apache.org/docs/docker.html">guide on building SINGA on Docker</a>).
+
+```
+git clone https://github.com/ug93tad/incubator-singa
+cd incubator-singa
+git checkout SINGA-97-docker
+cd tool/docker/hdfs
+sudo docker build -t singa/hdfs .
+```
+
+Once built, the container image `singa/hdfs` contains the HDFS C++ client library (`libhdfs3`) and the latest SINGA code. Multiple distributed nodes can be launched, and HDFS set up, by following the <a href="http://singa.incubator.apache.org/docs/mesos.html">guide for running distributed SINGA on Mesos</a>.
+
+In the following, we assume an HDFS setup in which `node0` is the namenode and `nodei (i>0)` are the datanodes.
+
+<a name="hdfs"></a>
+## Setup HDFS
+There are at least two C/C++ client libraries for interacting with HDFS. One comes with Hadoop (`libhdfs`) and is a <a href="https://wiki.apache.org/hadoop/LibHDFS">JNI-based library</a>, meaning that all communication goes through a JVM. The other is `libhdfs3`, a <a href="https://github.com/PivotalRD/libhdfs3">native C++ library developed by Pivotal</a>, in which the client communicates directly with HDFS via RPC. The current implementation uses the latter.
+
+1. Install `libhdfs3`: follow the <a href="https://github.com/PivotalRD/libhdfs3#installation">official guide</a>.
+
+2. **Additional setup**: recent versions of Hadoop (>2.4.x) support short-circuit local reads, which bypass network communication (TCP sockets) when data is retrieved from the local node. `libhdfs3` throws errors (but continues to work) when it finds that short-circuit reads are not enabled. To silence these complaints, and to improve performance, add the following configuration to `hdfs-site.xml` **and to `hdfs-client.xml`**:
+
+ ```
+ <property>
+ <name>dfs.client.read.shortcircuit</name>
+ <value>true</value>
+ </property>
+ <property>
+ <name>dfs.domain.socket.path</name>
+ <value>/var/lib/hadoop-hdfs/dn_socket</value>
+ </property>
+ ```
+ Next, at each client, set the `LIBHDFS3_CONF` environment variable to point to the `hdfs-client.xml` file:
+
+ ```
+ export LIBHDFS3_CONF=$HADOOP_HOME/etc/hadoop/hdfs-client.xml
+ ```
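
The two properties above can be sanity-checked programmatically. The following Python sketch is a hypothetical helper (not part of SINGA or Hadoop): it parses a client configuration and asserts that short-circuit reads are enabled. Note that in a real `hdfs-client.xml` the `<property>` elements sit inside a `<configuration>` root, as shown here:

```python
# Hypothetical sanity check (not part of SINGA): verify that hdfs-client.xml
# enables short-circuit reads and defines the domain socket path.
import xml.etree.ElementTree as ET

HDFS_CLIENT_XML = """<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>"""

def read_properties(xml_text):
    """Return a dict of <name> -> <value> for every <property> element."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

props = read_properties(HDFS_CLIENT_XML)
assert props["dfs.client.read.shortcircuit"] == "true"
assert props["dfs.domain.socket.path"] == "/var/lib/hadoop-hdfs/dn_socket"
```

In practice you would read the file named by `LIBHDFS3_CONF` instead of the inline string.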
+
+<a name="examples"></a>
+## Examples
+We explain how to run the CIFAR10 and MNIST examples. Before training, the data must be uploaded to HDFS.
+
+### CIFAR10
+1. Upload the data to HDFS (done at any of the HDFS nodes)
+ * Change `job.conf` to use HDFS: in `examples/cifar10/job.conf`, set the `backend` property to `hdfsfile`
+ * Create and upload data:
+
+ ```
+ cd examples/cifar10
+ cp Makefile.example Makefile
+ make create
+ hadoop dfs -mkdir /examples/cifar10
+ hadoop dfs -copyFromLocal cifar-10-batches-bin /examples/cifar10/
+ ```
+ If successful, the files should be visible in HDFS via `hadoop dfs -ls /examples/cifar10`
+
+2. Training:
+ * Make sure `conf/singa.conf` has the correct path to the Zookeeper service:
+
+ ```
+ zookeeper_host: "node0:2181"
+ ```
+
+ * Make sure `job.conf` has the correct paths to the train and test datasets:
+
+ ```
+ // train layer
+ path: "hdfs://node0:9000/examples/cifar10/train_data.bin"
+ mean_file: "hdfs://node0:9000/examples/cifar10/image_mean.bin"
+ // test layer
+ path: "hdfs://node0:9000/examples/cifar10/test_data.bin"
+ mean_file: "hdfs://node0:9000/examples/cifar10/image_mean.bin"
+ ```
+
+ * Start training: execute the following command at every node
+
+ ```
+ ./singa -conf examples/cifar10/job.conf -singa_conf singa.conf -singa_job 0
+ ```
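
The dataset paths in `job.conf` must be full `hdfs://` URIs naming the namenode; a bare local path will not reach HDFS. As a quick illustration, this Python sketch (a hypothetical check, not part of SINGA) validates that form with the standard library's URL parser:

```python
# Hypothetical helper (not part of SINGA): check that a job.conf dataset
# path is a well-formed hdfs:// URI pointing at the expected namenode.
from urllib.parse import urlparse

def check_hdfs_uri(uri, namenode="node0", port=9000):
    parsed = urlparse(uri)
    return (parsed.scheme == "hdfs"
            and parsed.hostname == namenode
            and parsed.port == port
            and parsed.path.startswith("/"))

assert check_hdfs_uri("hdfs://node0:9000/examples/cifar10/train_data.bin")
assert not check_hdfs_uri("/examples/cifar10/train_data.bin")  # plain local path
```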
+
+### MNIST
+1. Upload the data to HDFS (done at any of the HDFS nodes)
+ * Change `job.conf` to use HDFS: in `examples/mnist/job.conf`, set the `backend` property to `hdfsfile`
+ * Create and upload data:
+
+ ```
+ cd examples/mnist
+ cp Makefile.example Makefile
+ make create
+ make compile
+ ./create_data.bin train-images-idx3-ubyte train-labels-idx1-ubyte hdfs://node0:9000/examples/mnist/train_data.bin
+ ./create_data.bin t10k-images-idx3-ubyte t10k-labels-idx1-ubyte hdfs://node0:9000/examples/mnist/test_data.bin
+ ```
+ If successful, the files should be visible in HDFS via `hadoop dfs -ls /examples/mnist`
+
+2. Training:
+ * Make sure `conf/singa.conf` has the correct path to the Zookeeper service:
+
+ ```
+ zookeeper_host: "node0:2181"
+ ```
+
+ * Make sure `job.conf` has the correct paths to the train and test datasets:
+
+ ```
+ // train layer
+ path: "hdfs://node0:9000/examples/mnist/train_data.bin"
+ // test layer
+ path: "hdfs://node0:9000/examples/mnist/test_data.bin"
+ ```
+
+ * Start training: execute the following command at every node
+
+ ```
+ ./singa -conf examples/mnist/job.conf -singa_conf singa.conf -singa_job 0
+ ```
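
Both examples depend on `zookeeper_host` in `conf/singa.conf` being a valid `host:port` pair. A minimal Python sketch (hypothetical, not part of SINGA) that extracts and checks that value:

```python
# Hypothetical sketch (not part of SINGA): parse the zookeeper_host line
# from conf/singa.conf and return its (host, port) pair.
import re

def parse_zookeeper_host(line):
    """Extract (host, port) from a line like: zookeeper_host: 'node0:2181'"""
    m = re.match(r'\s*zookeeper_host:\s*"([^:"]+):(\d+)"\s*$', line)
    if not m:
        raise ValueError("malformed zookeeper_host line: %r" % line)
    return m.group(1), int(m.group(2))

host, port = parse_zookeeper_host('zookeeper_host: "node0:2181"')
assert (host, port) == ("node0", 2181)
```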
Modified: incubator/singa/site/trunk/content/site.xml
URL:
http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/site.xml?rev=1722420&r1=1722419&r2=1722420&view=diff
==============================================================================
--- incubator/singa/site/trunk/content/site.xml (original)
+++ incubator/singa/site/trunk/content/site.xml Thu Dec 31 06:08:31 2015
@@ -83,6 +83,7 @@
<item name="System Architecture" href="docs/architecture.html"/>
<item name="Frameworks" href="docs/frameworks.html"/>
<item name="Communication" href="docs/communication.html"/>
+ <item name="Using HDFS" href="docs/hdfs.html"/>
</item>
<item name="Data Preparation" href="docs/data.html"/>
<item name="Checkpoint" href="docs/checkpoint.html"/>