Added: incubator/singa/site/trunk/content/markdown/v0.3.0/kr/programming-guide.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/kr/programming-guide.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/kr/programming-guide.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/kr/programming-guide.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,77 @@
# Programming Guide

---

To start a training job, configure the following four components shown in Figure 1:

 * [NeuralNet](neural-net.html): describes the structure of the neural network and the configuration of each layer.
 * [TrainOneBatch](train-one-batch.html): describes the algorithm suited to the model category.
 * [Updater](updater.html): describes how parameters are updated on the server.
 * [Cluster Topology](distributed-training.html): describes the distributed topology of workers and servers.

The *Basic user guide* explains how to start training using built-in components.
The *Advanced user guide* explains how to start training with user-implemented
models, functions and algorithms. Please prepare the training data following the
[process](data.html) page.

<img src="../../images/overview.png" align="center" width="400px"/>
<span><strong>Figure 1 - SINGA Overview </strong></span>

## Basic user guide

You can easily start training using the main function prepared in SINGA.
In this case, prepare a job configuration file written as a google protocol buffer
message for [JobProto](../api/classsinga_1_1JobProto.html), then run the following command line:

    ./bin/singa-run.sh -conf <path to job conf> [-resume]

`-resume` is the argument for continuing training from the last [checkpoint](checkpoint.html).
The [MLP](mlp.html) and [CNN](cnn.html) examples use built-in components.
Please read the corresponding pages for their job configuration files. The subsequent pages will illustrate the details on each component of the configuration.

## Advanced user guide

If a user's model contains some user-defined components, e.g.,
[Updater](updater.html), he has to write a main function to
register these components. It is similar to Hadoop's main function. Generally,
the main function should

* initialize SINGA, e.g., set up the logging;

* register the user-defined components;

* create the job configuration and set it in the SINGA driver.

An example main function is shown below.

    #include "singa.h"
    #include "user.h"  // header for user code

    int main(int argc, char** argv) {
      singa::Driver driver;
      driver.Init(argc, argv);
      bool resume;
      // parse resume option from argv.

      // register user defined layers
      driver.RegisterLayer<FooLayer>(kFooLayer);
      // register user defined updater
      driver.RegisterUpdater<FooUpdater>(kFooUpdater);
      ...
      auto jobConf = driver.job_conf();
      // update jobConf

      driver.Train(resume, jobConf);
      return 0;
    }

The Driver class' `Init` method reads the job configuration file given by the
command-line argument `-conf <job conf>`. That file describes the cluster topology,
and `Init` returns `jobConf`, through which the user sets or updates the neural net,
updater, etc.
If the user defines subclasses of Layer, Updater, Worker or Param, they must be
registered with the driver.
To start the training, pass the job configuration `jobConf` to driver.Train.

<!--We will provide helper functions to make the configuration easier in the
future, like [keras](https://github.com/fchollet/keras).-->

Compile the user code and link it with the SINGA library (*.libs/libsinga.so*) to
create an executable, e.g., *mysinga*. Run the program as follows:
    ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments]

The [RNN application](rnn.html) explains an example program and functions for training an RNN model.
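As a side note on the `main` function shown earlier: its `// parse resume option
from argv` placeholder is left to the user. A minimal sketch (an illustration only,
not SINGA's actual implementation; it assumes the plain `-resume` flag accepted by
*singa-run.sh* and additionally requires `#include <string>`) could be:

    // Hypothetical sketch: scan argv for the optional -resume flag so that
    // training continues from the last checkpoint instead of starting fresh.
    bool resume = false;
    for (int i = 1; i < argc; i++) {
      if (std::string(argv[i]) == "-resume") {
        resume = true;
        break;
      }
    }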
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/kr/quick-start.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/kr/quick-start.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/kr/quick-start.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/kr/quick-start.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,176 @@
# Quick Start

---

## SINGA installation

Please refer to the [installation](installation.html) page for installing SINGA.

### Starting Zookeeper

SINGA training uses [zookeeper](https://zookeeper.apache.org/). First, make sure the zookeeper service has been started.

If you installed zookeeper using the provided thirdparty script, run the following script:

    # goto top level folder
    cd SINGA_ROOT
    ./bin/zk-service.sh start

(`./bin/zk-service.sh stop` stops the zookeeper service.)

To start zookeeper on a port other than the default one, edit `conf/singa.conf`:

    zookeeper_host: "localhost:YOUR_PORT"

## Running in stand-alone mode

Running SINGA in stand-alone mode means running it without cluster management tools like [Mesos](http://mesos.apache.org/) or [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).

### Training on a single node

One process is started. As an example, we train a
[CNN model](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks)
over the [CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html) dataset.
The hyper-parameters are set following [cuda-convnet](https://code.google.com/p/cuda-convnet/).
Please refer to the [CNN example](cnn.html) page for details.

#### Preparing data and job configuration

Download the dataset and create the data shards for training and test as follows:

    cd examples/cifar10/
    cp Makefile.example Makefile
    make download
    make create

The training and test datasets are created in the *cifar10-train-shard* and
*cifar10-test-shard* folders, respectively. An *image_mean.bin* file, which describes
the feature mean of all images, is also generated.

All source code needed to train the CNN model is included in SINGA; no code has to
be added. Simply run the script (*../../bin/singa-run.sh*) with the job configuration
file (*job.conf*). To change or add SINGA code, please refer to the
[programming guide](programming-guide.html).

#### Training without parallelism

By default, the cluster topology has one worker and one server;
neither the data nor the model is processed in parallel.

To start training, run the following script:

    # goto top level folder
    cd ../../
    ./bin/singa-run.sh -conf examples/cifar10/job.conf

To list the currently running jobs:

    ./bin/singa-console.sh list

    JOB ID     | NUM PROCS
    ---------- | -----------
    24         | 1

To kill a job:

    ./bin/singa-console.sh kill JOB_ID

Logs and job information are saved in the */tmp/singa-log* folder, which can be
changed via `log-dir` in the *conf/singa.conf* file.

#### Asynchronous parallel training

    # job.conf
    ...
    cluster {
      nworker_groups: 2
      nworkers_per_procs: 2
      workspace: "examples/cifar10/"
    }

You can conduct [asynchronous training](architecture.html) by running multiple worker
groups. For example, change *job.conf* as shown above.
By default, one worker group is configured to have one worker. Since the above
configuration sets two workers per process, the two worker groups run in the same
process. As a result, this runs the in-memory [Downpour](frameworks.html) training
framework.

Users do not need to worry about distributing the data.
The data is dispatched to the worker groups according to random offsets,
so each worker takes charge of a different partition of the data.

    # job.conf
    ...
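    # The random_skip field below realizes the random offset mentioned above:
    # the first rand() % random_skip records are skipped at the first
    # iteration, so that different worker groups work on different records.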
    neuralnet {
      layer {
        ...
        sharddata_conf {
          random_skip: 5000
        }
      }
      ...
    }

Run the script:

    ./bin/singa-run.sh -conf examples/cifar10/job.conf

#### Synchronous parallel training

    # job.conf
    ...
    cluster {
      nworkers_per_group: 2
      nworkers_per_procs: 2
      workspace: "examples/cifar10/"
    }

You can conduct [synchronous training](architecture.html) by running multiple workers
in one worker group. For example, change the *job.conf* file as shown above.
This configuration sets two workers in one worker group. The workers synchronize
with each other within the group. This runs as the in-memory
[sandblaster](frameworks.html) framework.
The model is partitioned among the two workers: each layer is distributed over the
two workers. Each distributed layer has the same functionality as the original
layer, but the number of feature instances it handles becomes `B/g`, where `B` is
the number of instances in a mini-batch and `g` is the number of workers in the
group. There are also layer (neural network) partitioning methods using
[other schemes](neural-net.html).

All other configurations are the same as in the "without parallelism" case:

    ./bin/singa-run.sh -conf examples/cifar10/job.conf

### Training in a cluster

The above training frameworks are extended to a cluster by changing the cluster
configuration:

    nworkers_per_procs: 1

Every process then creates only one worker thread, so the workers are created in
different processes (nodes). The *hostfile* under *SINGA_ROOT/conf/* must specify
the nodes in the cluster,

e.g.,

    logbase-a01
    logbase-a02

The zookeeper location must also be configured,

e.g.,

    # conf/singa.conf
    zookeeper_host: "logbase-a01"

The script is executed in the same way as in "Training on a single node":

    ./bin/singa-run.sh -conf examples/cifar10/job.conf

## Running with Mesos

*working* ...

## Next

Please refer to the [programming guide](programming-guide.html) for details on
changing or adding SINGA code.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/kr/rbm.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/kr/rbm.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/kr/rbm.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/kr/rbm.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,365 @@
# RBM Example

---

This example uses SINGA to train 4 RBM models and one auto-encoder model over the
[MNIST dataset](http://yann.lecun.com/exdb/mnist/). The auto-encoder model is trained
to reduce the dimensionality of the MNIST image feature. The RBM models are trained
to initialize parameters of the auto-encoder model. This example application is
from [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf).

## Running instructions

Running scripts are provided in the *SINGA_ROOT/examples/rbm* folder.

The MNIST dataset has 70,000 handwritten digit images. The
[data preparation](data.html) page
has details on converting this dataset into a SINGA-recognizable format. Users can
simply run the following commands to download and convert the dataset.

    # at SINGA_ROOT/examples/mnist/
    $ cp Makefile.example Makefile
    $ make download
    $ make create

The training is separated into two phases, namely pre-training and fine-tuning.
The pre-training phase trains 4 RBMs in sequence,

    # at SINGA_ROOT/
    $ ./bin/singa-run.sh -conf examples/rbm/rbm1.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm2.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm3.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm4.conf

The fine-tuning phase trains the auto-encoder by,

    $ ./bin/singa-run.sh -conf examples/rbm/autoencoder.conf


## Training details

### RBM1

<img src="../images/example-rbm1.png" align="center" width="200px"/>
<span><strong>Figure 1 - RBM1.</strong></span>

The neural net structure for training RBM1 is shown in Figure 1.
The data layer and parser layer provide features for training RBM1.
The visible layer (connected with the parser layer) of RBM1 accepts the image feature
(784 dimensions). The hidden layer is set to have 1000 neurons (units).
These two layers are configured as,

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"mnist"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 1000
      }
      param{
        name: "w1"
        init{
          type: kGaussian
          mean: 0.0
          std: 0.1
        }
      }
      param{
        name: "b11"
        init{
          type: kConstant
          value: 0.0
        }
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 1000
      }
      param{
        name: "w1_"
        share_from: "w1"
      }
      param{
        name: "b12"
        init{
          type: kConstant
          value: 0.0
        }
      }
    }

For RBM, the weight matrix is shared by the visible and hidden layers. For instance,
`w1` is shared by the `vis` and `hid` layers shown in Figure 1. In SINGA, we can configure
the `share_from` field to enable [parameter sharing](param.html)
as shown above for the params `w1` and `w1_`.

[Contrastive Divergence](train-one-batch.html#contrastive-divergence)
is configured as the algorithm for [TrainOneBatch](train-one-batch.html).
Following Hinton's paper, we configure the [updating protocol](updater.html)
as follows,

    # Updater Configuration
    updater{
      type: kSGD
      momentum: 0.2
      weight_decay: 0.0002
      learning_rate{
        base_lr: 0.1
        type: kFixed
      }
    }

Since the parameters of RBM1 will be used to initialize the auto-encoder, we should
configure the `workspace` field to specify a path for the checkpoint folder.
For example, if we configure it as,

    cluster {
      workspace: "examples/rbm/rbm1/"
    }

then SINGA will [checkpoint the parameters](checkpoint.html) into *examples/rbm/rbm1/*.

### RBM2
<img src="../images/example-rbm2.png" align="center" width="200px"/>
<span><strong>Figure 2 - RBM2.</strong></span>

Figure 2 shows the net structure of training RBM2.
The visible units of RBM2 accept the output from the Sigmoid1 layer. The Inner1 layer
is an `InnerProductLayer` whose parameters are set to the `w1` and `b12` learned
from RBM1.
The neural net configuration is (with the data layer and parser layer omitted).

    layer{
      name: "Inner1"
      type: kInnerProduct
      srclayers:"mnist"
      innerproduct_conf{
        num_output: 1000
      }
      param{ name: "w1" }
      param{ name: "b12"}
    }

    layer{
      name: "Sigmoid1"
      type: kSigmoid
      srclayers:"Inner1"
    }

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"Sigmoid1"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 500
      }
      param{
        name: "w2"
        ...
      }
      param{
        name: "b21"
        ...
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 500
      }
      param{
        name: "w2_"
        share_from: "w2"
      }
      param{
        name: "b22"
        ...
      }
    }

To load `w1` and `b12` from RBM1's checkpoint file, we configure the `checkpoint_path` as,

    checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
    cluster{
      workspace: "examples/rbm/rbm2"
    }

The workspace is changed for checkpointing `w2`, `b21` and `b22` into
*examples/rbm/rbm2/*.

### RBM3

<img src="../images/example-rbm3.png" align="center" width="200px"/>
<span><strong>Figure 3 - RBM3.</strong></span>

Figure 3 shows the net structure of training RBM3. In this model, a layer with
250 units is added as the hidden layer of RBM3. The visible units of RBM3
accept the output from the Sigmoid2 layer. The parameters of Inner1 and Inner2 are set to
`w1, b12, w2, b22`, which can be loaded from the checkpoint file of RBM2,
i.e., "examples/rbm/rbm2/".

### RBM4

<img src="../images/example-rbm4.png" align="center" width="200px"/>
<span><strong>Figure 4 - RBM4.</strong></span>

Figure 4 shows the net structure of training RBM4. It is similar to Figure 3,
but according to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf), the hidden units of the
top RBM (RBM4) have stochastic real-valued states drawn from a unit-variance
Gaussian whose mean is determined by the input from the RBM's logistic visible
units. So we add a `gaussian` field in the RBMHid layer to control the
sampling distribution (Gaussian or Bernoulli). In addition, this
RBM has a much smaller learning rate (0.001). The neural net configuration for
RBM4 and the updating protocol is (with the data layer and parser
layer omitted),

    # Updater Configuration
    updater{
      type: kSGD
      momentum: 0.9
      weight_decay: 0.0002
      learning_rate{
        base_lr: 0.001
        type: kFixed
      }
    }

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"Sigmoid3"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 30
      }
      param{
        name: "w4"
        ...
      }
      param{
        name: "b41"
        ...
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 30
        gaussian: true
      }
      param{
        name: "w4_"
        share_from: "w4"
      }
      param{
        name: "b42"
        ...
      }
    }

### Auto-encoder
In the fine-tuning stage, the 4 RBMs are "unfolded" to form encoder and decoder
networks that are initialized using the parameters from the previous 4 RBMs.

<img src="../images/example-autoencoder.png" align="center" width="500px"/>
<span><strong>Figure 5 - Auto-Encoders.</strong></span>

Figure 5 shows the neural net structure for training the auto-encoder.
[Back propagation (kBP)](train-one-batch.html) is
configured as the algorithm for `TrainOneBatch`. We use the same cluster
configuration as the RBM models. For the updater, we use the [AdaGrad](updater.html#adagradupdater) algorithm with
a fixed learning rate.

    # Updater Configuration
    updater{
      type: kAdaGrad
      learning_rate{
        base_lr: 0.01
        type: kFixed
      }
    }

According to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf),
we configure a EuclideanLoss layer to compute the reconstruction error. The neural net
configuration is (with some of the middle layers omitted),

    layer{ name: "data" }
    layer{ name:"mnist" }
    layer{
      name: "Inner1"
      param{ name: "w1" }
      param{ name: "b12" }
    }
    layer{ name: "Sigmoid1" }
    ...
    layer{
      name: "Inner8"
      innerproduct_conf{
        num_output: 784
        transpose: true
      }
      param{
        name: "w8"
        share_from: "w1"
      }
      param{ name: "b11" }
    }
    layer{ name: "Sigmoid8" }

    # Euclidean Loss Layer Configuration
    layer{
      name: "loss"
      type:kEuclideanLoss
      srclayers:"Sigmoid8"
      srclayers:"mnist"
    }

To load the pre-trained parameters from the 4 RBMs' checkpoint files, we configure `checkpoint_path` as

    # Checkpoint Configuration
    checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/rbm3/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/rbm4/checkpoint/step6000-worker0"


## Visualization Results

<div>
<img src="../images/rbm-weight.PNG" align="center" width="300px"/>

<img src="../images/rbm-feature.PNG" align="center" width="300px"/>
<br/>
<span><strong>Figure 6 - Bottom RBM weight matrix.</strong></span>

<span><strong>Figure 7 - Top layer features.</strong></span>
</div>

Figure 6 visualizes sample columns of the weight matrix of RBM1. We can see that
Gabor-like filters are learned. Figure 7 depicts the features extracted from
the top layer of the auto-encoder, wherein one point represents one image.
Different colors represent different digits. We can see that most images are
well clustered according to the ground truth.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/kr/rnn.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/kr/rnn.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/kr/rnn.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/kr/rnn.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,420 @@
# Recurrent Neural Networks for Language Modelling

---

Recurrent Neural Networks (RNN) are widely used for modelling sequential data,
such as music and sentences. In this example, we use SINGA to train a
[RNN model](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf)
proposed by Tomas Mikolov for [language modeling](https://en.wikipedia.org/wiki/Language_model).
The training objective (loss) is
to minimize the [perplexity per word](https://en.wikipedia.org/wiki/Perplexity), which
is equivalent to maximizing the probability of predicting the next word given the current word in
a sentence.

Different from the [CNN](cnn.html), [MLP](mlp.html)
and [RBM](rbm.html) examples, which use built-in
[layers](layer.html) and records ([data](data.html)),
none of the layers in this example are built-in. Hence users would learn to
implement their own layers and data records through this example.

## Running instructions

In *SINGA_ROOT/examples/rnnlm/*, scripts are provided to run the training job.
First, the data is prepared by

    $ cp Makefile.example Makefile
    $ make download
    $ make create

Second, to compile the source code under *examples/rnnlm/*, run

    $ make rnnlm

An executable file *rnnlm.bin* will be generated.
Third, the training is started by passing *rnnlm.bin* and the job configuration
to *singa-run.sh*,

    # at SINGA_ROOT/
    # export LD_LIBRARY_PATH=.libs:$LD_LIBRARY_PATH
    $ ./bin/singa-run.sh -exec examples/rnnlm/rnnlm.bin -conf examples/rnnlm/job.conf

## Implementations

<img src="../images/rnnlm.png" align="center" width="400px"/>
<span><strong>Figure 1 - Net structure of the RNN model.</strong></span>

The neural net structure is shown in Figure 1. Word records are loaded by
`DataLayer`. For every iteration, at most `max_window` word records are
processed. If a sentence-ending character is read, the `DataLayer` stops
loading immediately. `EmbeddingLayer` looks up a word embedding matrix to extract
feature vectors for the words loaded by the `DataLayer`. These features are transformed by the
`HiddenLayer`, which propagates the features from left to right. The
output feature for the word at position k is influenced by the words from position 0 to
k-1. Finally, `LossLayer` computes the cross-entropy loss (see below)
by predicting the next word of each word.
The cross-entropy loss is computed as

`$$L(w_t)=-log P(w_{t+1}|w_t)$$`

Given `$w_t$`, the above equation would be computed over all words in the vocabulary,
which is time consuming.
[RNNLM Toolkit](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz)
accelerates the computation as

`$$P(w_{t+1}|w_t) = P(C_{w_{t+1}}|w_t) * P(w_{t+1}|C_{w_{t+1}})$$`

Words from the vocabulary are partitioned into a user-defined number of classes.
The first term on the right-hand side predicts the class of the next word, and
the second predicts the next word given its class. Both the number of classes and
the number of words in one class are much smaller than the vocabulary size, so the probabilities
can be calculated much faster.

The perplexity per word is computed by,

`$$PPL = 10^{- avg_t \log_{10} P(w_{t+1}|w_t)}$$`

### Data preparation

We use a small dataset provided by the [RNNLM Toolkit](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz).
It has 10,000 training sentences, with 71,350 words in total and 3,720 unique words.
The subsequent steps follow the instructions in
[Data Preparation](data.html) to convert the
raw data into records and insert them into data stores.

#### Download source data

    # in SINGA_ROOT/examples/rnnlm/
    cp Makefile.example Makefile
    make download

#### Define record format

We define the word record as follows,

    # in SINGA_ROOT/examples/rnnlm/rnnlm.proto
    message WordRecord {
      optional string word = 1;
      optional int32 word_index = 2;
      optional int32 class_index = 3;
      optional int32 class_start = 4;
      optional int32 class_end = 5;
    }

It includes the word string and its index in the vocabulary.
Words in the vocabulary are sorted based on their frequency in the training dataset.
The sorted list is cut into 100 sublists such that each sublist has 1/100 of the total
word frequency. Each sublist is called a class.
Hence each word has a `class_index` (in [0,100)). The `class_start` is the index
of the first word in the same class as `word`. The `class_end` is the index of
the first word in the next class.

#### Create data stores

We use code from the RNNLM Toolkit to read words and sort them into classes.
The main function in *create_store.cc* first creates word classes based on the training
dataset.
Second, it calls the following function to create data stores for the
training, validation and test datasets.

    int create_data(const char *input_file, const char *output_file);

`input_file` is the path to a training/validation/test text file from the RNNLM Toolkit;
`output_file` is the output store file.
This function starts with

    singa::io::KVFile store;
    store.Open(output, singa::io::kCreate);

Then it reads the words one by one. For each word it creates a `WordRecord` instance
and inserts it into the store,

    int wcnt = 0; // word count
    WordRecord wordRecord;
    while(1) {
      readWord(wordstr, fin);
      if (feof(fin)) break;
      ...// fill in the wordRecord;
      string val;
      wordRecord.SerializeToString(&val);
      int length = snprintf(key, BUFFER_LEN, "%05d", wcnt++);
      store.Write(string(key, length), val);
    }

Compilation and running commands are provided in the *Makefile.example*.
After executing

    make create

*train_data.bin*, *test_data.bin* and *valid_data.bin* will be created.


### Layer implementation

4 user-defined layers are implemented for this application.
Following the guide for implementing [new Layer subclasses](layer.html#implementing-a-new-layer-subclass),
we extend the [LayerProto](../api/classsinga_1_1LayerProto.html)
to include the configuration messages of user-defined layers as shown below
(3 out of the 4 layers have specific configurations),

    import "job.proto";  // Layer message for SINGA is defined

    //For implementation of RNNLM application
    extend singa.LayerProto {
      optional EmbeddingProto embedding_conf = 101;
      optional LossProto loss_conf = 102;
      optional DataProto data_conf = 103;
    }

In the subsequent sections, we describe the implementation of each layer,
including its configuration message.

#### RNNLayer

This is the base layer of all other layers for this application. It is defined
as follows,

    class RNNLayer : virtual public Layer {
     public:
      inline int window() { return window_; }
     protected:
      int window_;
    };

For this application, two iterations may process different numbers of words
because sentences have different lengths.
The `DataLayer` decides the effective window size. All other layers call their source layers to get the
effective window size and reset `window_` in the `ComputeFeature` function.

#### DataLayer

DataLayer is for loading Records.

    class DataLayer : public RNNLayer, singa::InputLayer {
     public:
      void Setup(const LayerProto& proto, const vector<Layer*>& srclayers) override;
      void ComputeFeature(int flag, const vector<Layer*>& srclayers) override;
      int max_window() const {
        return max_window_;
      }
     private:
      int max_window_;
      singa::io::Store* store_;
    };

The Setup function gets the user-configured max window size.

    max_window_ = proto.GetExtension(input_conf).max_window();

The `ComputeFeature` function loads at most max_window records. It also
stops early when a sentence-ending character is encountered.

    ...// shift the last record to the first
    window_ = max_window_;
    for (int i = 1; i <= max_window_; i++) {
      // load record; break if it is the ending character
    }

The configuration of `DataLayer` is like

    name: "data"
    user_type: "kData"
    [data_conf] {
      path: "examples/rnnlm/train_data.bin"
      max_window: 10
    }

#### EmbeddingLayer

This layer gets records from `DataLayer`. For each record, the word index is
parsed and used to get the corresponding word feature vector from the embedding
matrix.
The class is declared as follows,

    class EmbeddingLayer : public RNNLayer {
      ...
      const std::vector<Param*> GetParams() const override {
        std::vector<Param*> params{embed_};
        return params;
      }
     private:
      int word_dim_, vocab_size_;
      Param* embed_;
    }

The `embed_` field is a matrix whose values are the parameters to be learned.
The matrix size is `vocab_size_` x `word_dim_`.

The Setup function reads the configurations for `word_dim_` and `vocab_size_`. Then
it allocates a feature Blob for `max_window` words and sets up `embed_`.

    int max_window = srclayers[0]->data(this).shape()[0];
    word_dim_ = proto.GetExtension(embedding_conf).word_dim();
    data_.Reshape(vector<int>{max_window, word_dim_});
    ...
    embed_->Setup(vector<int>{vocab_size_, word_dim_});

The `ComputeFeature` function simply copies the feature vectors from the `embed_`
matrix into the feature Blob.

    // reset effective window size
    window_ = datalayer->window();
    auto records = datalayer->records();
    ...
    for (int t = 0; t < window_; t++) {
      int idx <- word index
      Copy(words[t], embed[idx]);
    }

The `ComputeGradient` function copies the gradients back to the `embed_` matrix.

The configuration for `EmbeddingLayer` is like,

    user_type: "kEmbedding"
    [embedding_conf] {
      word_dim: 15
      vocab_size: 3720
    }
    srclayers: "data"
    param {
      name: "w1"
      init {
        type: kUniform
        low:-0.3
        high:0.3
      }
    }

#### HiddenLayer

This layer unrolls the recurrent connections for at most max_window times.
The feature at position k is computed based on the feature from the embedding layer (position k)
and the feature at position k-1 of this layer. The formula is

`$$f[k]=\sigma (f[k-1]*W+src[k])$$`

where `$W$` is a matrix with `word_dim_` x `word_dim_` parameters.

If you want to implement a recurrent neural network following our design,
this layer is an important reference.

    class HiddenLayer : public RNNLayer {
      ...
      const std::vector<Param*> GetParams() const override {
        std::vector<Param*> params{weight_};
        return params;
      }
     private:
      Param* weight_;
    };

The `Setup` function sets up the weight matrix as

    weight_->Setup(std::vector<int>{word_dim, word_dim});

The `ComputeFeature` function gets the effective window size (`window_`) from its source layer,
i.e., the embedding layer. Then it propagates the feature from position 0 to position
`window_` - 1. This process is illustrated as follows.

    void HiddenLayer::ComputeFeature() {
      for(int t = 0; t < window_; t++){
        if(t == 0)
          Copy(data[t], src[t]);
        else
          data[t] = sigmoid(data[t-1] * W + src[t]);
      }
    }

The `ComputeGradient` function computes the gradient of the loss w.r.t. W and the source layer.
Particularly, for each position k, since data[k] contributes to data[k+1] and to the feature
at position k in its destination layer (the loss layer), grad[k] should contain the gradients
from both parts. The destination layer has already computed the gradient from the loss layer into
grad[k]; in the `ComputeGradient` function, we need to add the gradient from position k+1.

    void HiddenLayer::ComputeGradient(){
      ...
      for (int k = window_ - 1; k >= 0; k--) {
        if (k < window_ - 1) {
          grad[k] += dot(grad[k + 1], weight.T()); // add gradient from position k+1.
        }
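        // at this point grad[k] holds dL/ddata[k] (the loss-layer part plus,
        // for k < window_-1, the recurrent part); the next line converts it
        // into dL/dy[k] for the sigmoid pre-activation y[k]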
        grad[k] = ...;  // compute dL/dy[k], where y[k] = data[k-1]*W + src[k]
      }
      gweight = dot(data.Slice(0, window_-1).T(), grad.Slice(1, window_));
      Copy(gsrc, grad);
    }

After the loop, we get the gradient of the loss w.r.t. y[k], which is used to
compute the gradients of W and src[k].

#### LossLayer

This layer computes the cross-entropy loss and `$log_{10}P(w_{t+1}|w_t)$` (which
could be averaged over all words by users to get the PPL value).

There are two configuration fields to be specified by users.

    message LossProto {
      optional int32 nclass = 1;
      optional int32 vocab_size = 2;
    }

There are two weight matrices to be learned

    class LossLayer : public RNNLayer {
      ...
     private:
      Param* word_weight_, *class_weight_;
    }

The ComputeFeature function computes the two probabilities respectively.

`$$P(C_{w_{t+1}}|w_t) = Softmax(w_t * class\_weight)$$`
`$$P(w_{t+1}|C_{w_{t+1}}) = Softmax(w_t * word\_weight[class\_start:class\_end])$$`

`$w_t$` is the feature from the hidden layer for the t-th word, whose ground-truth
next word is `$w_{t+1}$`. The first equation computes the probability distribution over all
classes for the next word. The second equation computes the
probability distribution over the words in the ground-truth class of the next word.

The ComputeGradient function computes the gradients of the source layer
(i.e., the hidden layer) and the two weight matrices.

### Updater Configuration

We employ the kFixedStep type of learning-rate changing method, with the
configuration below. We decay the learning rate once the performance stops
improving on the validation dataset.

    updater{
      type: kSGD
      learning_rate {
        type: kFixedStep
        fixedstep_conf:{
          step:0
          step:48810
          step:56945
          step:65080
          step:73215
          step_lr:0.1
          step_lr:0.05
          step_lr:0.025
          step_lr:0.0125
          step_lr:0.00625
        }
      }
    }

### TrainOneBatch() Function

We use the BP (BackPropagation) algorithm to train the RNN model here. The
corresponding configuration can be seen below.

    # In job.conf file
    train_one_batch {
      alg: kBackPropagation
    }

### Cluster Configuration

The default cluster configuration can be used, i.e., a single worker and a single server
in a single process.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/kr/test.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/kr/test.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/kr/test.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/kr/test.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,119 @@
# Performance Test and Feature Extraction

----

Once SINGA finishes training a model, it checkpoints the model parameters
into disk files under the [checkpoint folder](checkpoint.html). Model parameters can also be dumped
into this folder periodically during training if the
[checkpoint configuration](checkpoint.html) fields are set. With the checkpoint
files, we can load the model parameters to conduct performance tests, feature extraction and prediction
against new data.

To load the model parameters from checkpoint files, we need to add the paths of the
checkpoint files in the job configuration file

    checkpoint_path: PATH_TO_CHECKPOINT_FILE1
    checkpoint_path: PATH_TO_CHECKPOINT_FILE2
    ...

The new dataset is configured by specifying the `test_steps` field and the data input
layer, e.g.,
the following configuration is for a dataset with 100x100 = 10,000 instances
(100 test steps with a batchsize of 100).

    test_steps: 100
    net {
      layer {
        name: "input"
        store_conf {
          backend: "kvfile"
          path: PATH_TO_TEST_KVFILE
          batchsize: 100
        }
      }
      ...
    }

## Performance Test

This application is to test the performance, e.g., accuracy, of a previously
trained model. Depending on the application, the test data may or may not have ground truth
labels. For example, if the model is trained for image classification,
the test images must have ground truth labels to calculate the accuracy; if the
model is an auto-encoder, the performance could be measured by the reconstruction error, which
does not require extra labels. For both cases, there would be a layer that calculates
the performance, e.g., the `SoftmaxLossLayer`.

The job configuration file for the cifar10 example can be used directly for testing after
adding the checkpoint path. The running command is

    $ ./bin/singa-run.sh -conf examples/cifar10/job.conf -test

The performance would be output on the screen like,

    Load from checkpoint file examples/cifar10/checkpoint/step50000-worker0
    accuracy = 0.728000, loss = 0.807645

## Feature extraction

Since deep learning models are good at learning features, feature extraction is
a major functionality of deep learning models. E.g., we can extract features
from the fully connected layers of [AlexNet](http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf) as image features for image retrieval.
To extract the features from one layer, we simply add an output layer after that layer.
For instance, to extract features from the fully connected layer (with name `ip1`) of the cifar10 example model,
we replace the `SoftmaxLossLayer` with a `CSVOutputLayer`, which extracts the features into a CSV file,

    layer {
      name: "ip1"
    }
    layer {
      name: "output"
      type: kCSVOutput
      srclayers: "ip1"
      store_conf {
        backend: "textfile"
        path: OUTPUT_FILE_PATH
      }
    }

The input layer and test steps, and the running command are the same as in the *Performance Test* section.

## Label Prediction

If the output layer is connected to a layer that predicts labels of images,
the output layer would then write the prediction results into files.
SINGA provides two built-in layers for generating prediction results, namely,

* SoftmaxLayer, which generates the probability of each candidate label.
* ArgSortLayer, which sorts labels according to probabilities in descending order and keeps the top-k labels.

By connecting the two layers with the previous layer and the output layer, we can
extract the predictions for each instance. For example,

    layer {
      name: "feature"
      ...
    }
    layer {
      name: "softmax"
      type: kSoftmax
      srclayers: "feature"
    }
    layer {
      name: "prediction"
      type: kArgSort
      srclayers: "softmax"
      argsort_conf {
        topk: 5
      }
    }
    layer {
      name: "output"
      type: kCSVOutput
      srclayers: "prediction"
      store_conf {}
    }

The top-5 labels of each instance will be written as one line of the output CSV file.
Currently, the above layers cannot co-exist with the loss layers used for training.
Please comment out the loss layers when extracting prediction results.
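For instance, disabling the training loss layer in the cifar10 *job.conf* could look
like the sketch below (the layer name and fields are illustrative; use whatever loss
layer block your configuration actually contains):

    # in job.conf: training loss layer disabled for prediction
    # layer {
    #   name: "loss"
    #   type: kSoftmaxLoss
    #   srclayers: "ip1"
    #   srclayers: "label"
    # }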
Added: incubator/singa/site/trunk/content/markdown/v0.3.0/kr/train-one-batch.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/kr/train-one-batch.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/kr/train-one-batch.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/kr/train-one-batch.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,179 @@
# Train-One-Batch

---

For each SGD iteration, every worker calls the `TrainOneBatch` function to
compute gradients of parameters associated with local layers (i.e., layers
dispatched to it). SINGA has implemented two algorithms for the
`TrainOneBatch` function. Users select the corresponding algorithm for
their model in the configuration.

## Basic user guide

### Back-propagation

The [BP algorithm](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf) is used for
computing gradients of feed-forward models, e.g., [CNN](cnn.html)
and [MLP](mlp.html), and of [RNN](rnn.html) models in SINGA.

    # in job.conf
    alg: kBP

To use the BP algorithm for the `TrainOneBatch` function, users simply
configure the `alg` field with `kBP`. If a neural net contains user-defined
layers, these layers must be implemented properly to be consistent with the
implementation of the BP algorithm in SINGA (see below).

### Contrastive Divergence

The [CD algorithm](http://www.cs.toronto.edu/~fritz/absps/nccd.pdf) is used for
computing gradients of energy models like RBM.

    # job.conf
    alg: kCD
    cd_conf {
      cd_k: 2
    }

To use the CD algorithm for the `TrainOneBatch` function, users just configure
the `alg` field to `kCD`. Users can also configure the number of Gibbs sampling steps in
the CD algorithm through the `cd_k` field. By default, it is set to 1.

## Advanced user guide

### Implementation of BP

The BP algorithm is implemented in SINGA following the pseudo code below,

    BPTrainOnebatch(step, net) {
      // forward propagate
      foreach layer in net.local_layers() {
        if IsBridgeDstLayer(layer)
          recv data from the src layer (i.e., BridgeSrcLayer)
        foreach param in layer.params()
          Collect(param) // recv response from servers for last update

        layer.ComputeFeature(kForward)

        if IsBridgeSrcLayer(layer)
          send layer.data_ to dst layer
      }
      // backward propagate
      foreach layer in reverse(net.local_layers) {
        if IsBridgeSrcLayer(layer)
          recv gradient from the dst layer (i.e., BridgeDstLayer)
          recv response from servers for last update

        layer.ComputeGradient()
        foreach param in layer.params()
          Update(step, param) // send param.grad_ to servers

        if IsBridgeDstLayer(layer)
          send layer.grad_ to src layer
      }
    }

It forwards features through all local layers (which can be checked by the layer
partition ID and worker ID) and propagates gradients backward in the reverse order.
[BridgeSrcLayer](layer.html#bridgesrclayer--bridgedstlayer)
(resp. `BridgeDstLayer`) will be blocked until the feature (resp.
gradient) from the source (resp. destination) layer arrives. Parameter gradients
are sent to servers via the `Update` function. Updated parameters are collected via
the `Collect` function, which blocks until the parameter is updated.
[Param](param.html) objects have versions, which can be used to
check whether a `Param` object has been updated or not.
Since RNN models are unrolled into feed-forward models, users need to implement
the forward propagation in the recurrent layer's `ComputeFeature` function
and the backward propagation in the recurrent layer's `ComputeGradient`
function. As a result, the whole `TrainOneBatch` runs the
[back-propagation through time (BPTT)](https://en.wikipedia.org/wiki/Backpropagation_through_time) algorithm.

### Implementation of CD

The CD algorithm is implemented in SINGA following the pseudo code below,

    CDTrainOneBatch(step, net) {
      # positive phase
      foreach layer in net.local_layers()
        if IsBridgeDstLayer(layer)
          recv positive phase data from the src layer (i.e., BridgeSrcLayer)
        foreach param in layer.params()
          Collect(param) // recv response from servers for last update
        layer.ComputeFeature(kPositive)
        if IsBridgeSrcLayer(layer)
          send positive phase data to dst layer

      # negative phase
      foreach gibbs in [0...layer_proto_.cd_k]
        foreach layer in net.local_layers()
          if IsBridgeDstLayer(layer)
            recv negative phase data from the src layer (i.e., BridgeSrcLayer)
          layer.ComputeFeature(kNegative)
          if IsBridgeSrcLayer(layer)
            send negative phase data to dst layer

      foreach layer in net.local_layers()
        layer.ComputeGradient()
        foreach param in layer.params
          Update(param)
    }

Parameter gradients are computed after the positive phase and the negative phase.

### Implementing a new algorithm

SINGA implements BP and CD by creating two subclasses of
the [Worker](../api/classsinga_1_1Worker.html) class:
[BPWorker](../api/classsinga_1_1BPWorker.html)'s `TrainOneBatch` function implements the BP
algorithm; [CDWorker](../api/classsinga_1_1CDWorker.html)'s `TrainOneBatch` function implements the CD
algorithm. To implement a new algorithm for the `TrainOneBatch` function, users
need to create a new subclass of the `Worker`, e.g.,

    class FooWorker : public Worker {
      void TrainOneBatch(int step, shared_ptr<NeuralNet> net, Metric* perf) override;
      void TestOneBatch(int step, Phase phase, shared_ptr<NeuralNet> net, Metric* perf) override;
    };

The `FooWorker` must implement the above two functions for training one
mini-batch and testing one mini-batch. The `perf` argument is for collecting
training or testing performance, e.g., the objective loss or accuracy. It is
passed to the `ComputeFeature` function of each layer.

Users can define configuration fields for the new worker, e.g.,

    # in user.proto
    message FooWorkerProto {
      optional int32 b = 1;
    }

    extend JobProto {
      optional FooWorkerProto foo_conf = 101;
    }

    # in job.proto
    JobProto {
      ...
      extensions 101 to max;
    }

It is similar to [adding configuration fields for a new layer](layer.html#implementing-a-new-layer-subclass).

To use `FooWorker`, users need to register it in the [main.cc](programming-guide.html)
and configure the `alg` and `foo_conf` fields,

    # in main.cc
    const int kFoo = 3; // worker ID, must differ from those of CDWorker and BPWorker
    driver.RegisterWorker<FooWorker>(kFoo);

    # in job.conf
    ...
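    # (alg takes the integer ID chosen at registration time -- kFoo = 3 above;
    # for the built-in algorithms it is set to the enum values kBP or kCD instead)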
    alg: 3
    [foo_conf] {
      b = 4;
    }

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/kr/updater.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/kr/updater.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/kr/updater.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/kr/updater.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,284 @@
# Updater

---

Every server in SINGA has an [Updater](../api/classsinga_1_1Updater.html)
instance that updates parameters based on gradients.
In this page, the *Basic user guide* describes the configuration of an updater.
The *Advanced user guide* presents details on how to implement a new updater and a new
learning rate changing method.

## Basic user guide

There are many different parameter updating protocols (i.e., subclasses of
`Updater`). They share some configuration fields like

* `type`, an integer for identifying an updater;
* `learning_rate`, configuration for the
[LRGenerator](../api/classsinga_1_1LRGenerator.html) which controls the learning rate.
* `weight_decay`, the coefficient for [L2 regularization](http://deeplearning.net/tutorial/gettingstarted.html#regularization).
* [momentum](http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/).

If you are not familiar with the above terms, you can find their meanings in
[this page provided by Karpathy](http://cs231n.github.io/neural-networks-3/#update).

### Configuration of built-in updater classes

#### Updater
The base `Updater` implements the [vanilla SGD algorithm](http://cs231n.github.io/neural-networks-3/#sgd).
Its configuration type is `kSGD`.
Users need to configure at least the `learning_rate` field.
`momentum` and `weight_decay` are optional fields.

    updater{
      type: kSGD
      momentum: float
      weight_decay: float
      learning_rate {
        ...
      }
    }

#### AdaGradUpdater

It inherits the base `Updater` to implement the
[AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf) algorithm.
Its type is `kAdaGrad`.
`AdaGradUpdater` is configured similarly to `Updater` except
that `momentum` is not used.

#### NesterovUpdater

It inherits the base `Updater` to implement the
[Nesterov](http://arxiv.org/pdf/1212.0901v2.pdf) (section 3.5) updating protocol.
Its type is `kNesterov`.
`learning_rate` and `momentum` must be configured. `weight_decay` is an
optional configuration field.

#### RMSPropUpdater

It inherits the base `Updater` to implement the
[RMSProp algorithm](http://cs231n.github.io/neural-networks-3/#sgd) proposed by
[Hinton](http://www.cs.toronto.edu/%7Etijmen/csc321/slides/lecture_slides_lec6.pdf) (slide 29).
Its type is `kRMSProp`.

    updater {
      type: kRMSProp
      rmsprop_conf {
        rho: float # [0,1]
      }
    }

### Configuration of learning rate

The `learning_rate` field is configured as,

    learning_rate {
      type: ChangeMethod
      base_lr: float  # base/initial learning rate
      ...             # fields for a specific changing method
    }

The common fields include `type` and `base_lr`. SINGA provides the following
`ChangeMethod`s.

#### kFixed

The `base_lr` is used for all steps.
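For example (the 0.01 value is only an illustration),

    learning_rate {
      type: kFixed
      base_lr: 0.01
    }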
#### kLinear

The updater should be configured like

    learning_rate {
      base_lr: float
      linear_conf {
        freq: int
        final_lr: float
      }
    }

Linear interpolation is used to change the learning rate,

    lr = (1 - step / freq) * base_lr + (step / freq) * final_lr

#### kExponential

The updater should be configured like

    learning_rate {
      base_lr: float
      exponential_conf {
        freq: int
      }
    }

The learning rate for `step` is

    lr = base_lr / 2^(step / freq)

#### kInverseT

The updater should be configured like

    learning_rate {
      base_lr: float
      inverset_conf {
        final_lr: float
      }
    }

The learning rate for `step` is

    lr = base_lr / (1 + step / final_lr)

#### kInverse

The updater should be configured like

    learning_rate {
      base_lr: float
      inverse_conf {
        gamma: float
        pow: float
      }
    }

The learning rate for `step` is

    lr = base_lr * (1 + gamma * step)^(-pow)

#### kStep

The updater should be configured like

    learning_rate {
      base_lr : float
      step_conf {
        change_freq: int
        gamma: float
      }
    }

The learning rate for `step` is

    lr = base_lr * gamma^(step / change_freq)

#### kFixedStep

The updater should be configured like

    learning_rate {
      fixedstep_conf {
        step: int
        step_lr: float

        step: int
        step_lr: float

        ...
      }
    }

Denote the i-th tuple as (step[i], step_lr[i]). The learning rate for
`step` is then

    step_lr[k]

where step[k] is the largest configured step value that does not exceed `step`.

## Advanced user guide

### Implementing a new Updater subclass

The base Updater class has one virtual function,

    class Updater{
     public:
      virtual void Update(int step, Param* param, float grad_scale = 1.0f) = 0;

     protected:
      UpdaterProto proto_;
      LRGenerator lr_gen_;
    };

It updates the values of the `param` based on its gradients. The `step`
argument is for deciding the learning rate, which may change through time
(step). `grad_scale` scales the original gradient values. This function is
called by a server once it receives all gradients for the same `Param` object.

To implement a new Updater subclass, users must override the `Update` function.

    class FooUpdater : public Updater {
      void Update(int step, Param* param, float grad_scale = 1.0f) override;
    };

Configuration of this new updater can be declared similarly to that of a new
layer,

    # in user.proto
    FooUpdaterProto {
      optional int32 c = 1;
    }

    extend UpdaterProto {
      optional FooUpdaterProto fooupdater_conf = 101;
    }

The new updater should be registered in the
[main function](programming-guide.html)

    driver.RegisterUpdater<FooUpdater>("FooUpdater");

Users can then configure the job as

    # in job.conf
    updater {
      user_type: "FooUpdater" # must use user_type with the same string identifier as the one used for registration
      fooupdater_conf {
        c : 20;
      }
    }

### Implementing a new LRGenerator subclass

The base `LRGenerator` is declared as,

    virtual float Get(int step);

To implement a subclass, e.g., `FooLRGen`, users should declare it like

    class FooLRGen : public LRGenerator {
     public:
      float Get(int step) override;
    };

Configuration of `FooLRGen` can be defined using a protocol message,

    # in user.proto
    message FooLRProto {
      ...
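      // hyper-parameters of the FooLR changing method would be declared here,
      // in the same style as FooUpdaterProto's fields above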
    }

    extend LRGenProto {
      optional FooLRProto foolr_conf = 101;
    }

The configuration is then like,

    learning_rate {
      user_type : "FooLR" # must use user_type with the same string identifier as the one used for registration
      base_lr: float
      foolr_conf {
        ...
      }
    }

Users have to register this subclass in the main function,

    driver.RegisterLRGenerator<FooLRGen, std::string>("FooLR")

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/layer.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/layer.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/layer.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/layer.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,620 @@
# Layers

---

Layer is a core abstraction in SINGA. It performs a variety of feature
transformations for extracting high-level features, e.g., loading raw features,
parsing RGB values, doing convolution transformations, etc.

The *Basic user guide* section introduces the configuration of built-in
layers. The *Advanced user guide* explains how to extend the base Layer class to
implement users' own functions.

## Basic user guide

### Layer configuration

The configurations of two example layers are shown below,

    layer {
      name: "data"
      type: kCSVRecord
      store_conf { }
    }
    layer{
      name: "fc1"
      type: kInnerProduct
      srclayers: "data"
      innerproduct_conf{ }
      param{ }
    }

There are some common fields for all kinds of layers:

 * `name`: a string used to differentiate two layers in a neural net.
 * `type`: an integer used for identifying a specific Layer subclass. The types of built-in
 layers are listed in LayerType (defined in job.proto).
 For user-defined layer subclasses, `user_type` should be used instead of `type`.
 * `srclayers`: names of the source layers.
 In SINGA, all connections are [converted](neural-net.html) to directed connections.
 * `param`: configuration for a [Param](param.html) instance.
 There can be multiple Param objects in one layer.

Different layers may have different configurations. These configurations
are defined in `<type>_conf`. E.g., the "fc1" layer has
`innerproduct_conf`. The subsequent sections
explain the functionality of each built-in layer and how to configure it.

### Built-in Layer subclasses
SINGA provides many built-in layers, which can be used directly to create neural nets.
These layers are categorized according to their functionalities,

 * Input layers for loading records (e.g., images) from disk files, HDFS or network into memory.
 * Neuron layers for feature transformation, e.g., [convolution](../api/classsinga_1_1ConvolutionLayer.html), [pooling](../api/classsinga_1_1PoolingLayer.html), dropout, etc.
 * Loss layers for measuring the training objective loss, e.g., Cross Entropy loss or Euclidean loss.
 * Output layers for outputting the prediction results (e.g., probabilities of each category) or features into persistent storage, e.g., disk or HDFS.
 * Connection layers for connecting layers when the neural net is partitioned.

#### Input layers

Input layers load training/test data from disk or other places (e.g., HDFS or network)
into memory.

##### StoreInputLayer

[StoreInputLayer](../api/classsinga_1_1StoreInputLayer.html) is a base layer for
loading data from a data store.
The data store can be a KVFile or TextFile (LMDB,
LevelDB, HDFS, etc. will be supported later). Its `ComputeFeature` function reads
batchsize (string:key, string:value) tuples. Each tuple is parsed by a `Parse` function
implemented by its subclasses.

The configuration for this layer is in `store_conf`,

    store_conf {
      backend: # "kvfile" or "textfile"
      path: # path to the data store
      batchsize : 32
      prefetching: true # default value is false
      ...
    }

##### SingleLabelRecordLayer

It is a subclass of StoreInputLayer. It assumes the (key, value) tuple loaded
from a data store contains a feature vector (and a label) for one data instance.
All feature vectors are of the same fixed length. The shape of one instance
is configured through the `shape` field, e.g., the following configuration
specifies the shape for the CIFAR10 images.

    store_conf {
      shape: 3  # channels
      shape: 32 # height
      shape: 32 # width
    }

It may do some preprocessing like [standardization](http://ufldl.stanford.edu/wiki/index.php/Data_Preprocessing).
The data for preprocessing is loaded and parsed in a virtual function, which is implemented by
its subclasses.

##### RecordInputLayer

It is a subclass of SingleLabelRecordLayer. It parses the value field from one
tuple into a RecordProto, which is generated by Google Protobuf according
to common.proto. It can be used to store features for images (e.g., using the pixel field)
or other objects (using the data field). The key field is not parsed.

    type: kRecordInput
    store_conf {
      has_label: # default is true
      ...
    }

##### CSVInputLayer

It is a subclass of SingleLabelRecordLayer. The value field from one tuple is parsed
as a CSV line (separated by commas). The first number is parsed as a label if
`has_label` is configured in `store_conf`. Otherwise, all numbers are parsed
into one row of the `data_` Blob.

    type: kCSVInput
    store_conf {
      has_label: # default is true
      ...
    }

##### ImagePreprocessLayer

This layer does image preprocessing, e.g., cropping, mirroring and scaling, against
the data Blob from its source layer. It deprecates the RGBImageLayer, which
works on the Record from ShardDataLayer. It still uses the same configuration as
RGBImageLayer,

    type: kImagePreprocess
    rgbimage_conf {
      scale: float
      cropsize: int # cropping each image to keep the central part with this size
      mirror: bool  # mirror the image by setting image[i,j]=image[i,len-j]
      meanfile: "Image_Mean_File_Path"
    }

##### ShardDataLayer (Deprecated)
Deprecated! Please use RecordInputLayer or CSVInputLayer.

[ShardDataLayer](../api/classsinga_1_1ShardDataLayer.html) is a subclass of DataLayer,
which reads Records from disk files. The file should be created using the
[DataShard](../api/classsinga_1_1DataShard.html)
class. With the data file prepared, users configure the layer as

    type: kShardData
    sharddata_conf {
      path: "path to data shard folder"
      batchsize: int
      random_skip: int
    }

`batchsize` specifies the number of records to be trained in one mini-batch.
The first `rand() % random_skip` `Record`s will be skipped at the first
iteration. This is to enforce that different workers work on different Records.

##### LMDBDataLayer (Deprecated)
Deprecated! Please use RecordInputLayer or CSVInputLayer.

[LMDBDataLayer] is similar to ShardDataLayer, except that the Records are
loaded from LMDB.
+
+##### SingleLabelRecordLayer
+
+It is a subclass of StoreInputLayer. It assumes the (key, value) tuple loaded
+from a data store contains a feature vector (and a label) for one data instance, and that
+all feature vectors are of the same fixed length. The shape of one instance
+is configured through the `shape` field, e.g., the following configuration
+specifies the shape of the CIFAR10 images.
+
+    store_conf {
+      shape: 3   # channels
+      shape: 32  # height
+      shape: 32  # width
+    }
+
+It may do some preprocessing like [standardization](http://ufldl.stanford.edu/wiki/index.php/Data_Preprocessing).
+The data for preprocessing is loaded and parsed in a virtual function, which is implemented by
+its subclasses.
+
+##### RecordInputLayer
+
+It is a subclass of SingleLabelRecordLayer. It parses the value field of one
+tuple into a RecordProto, which is generated by Google Protobuf according
+to common.proto. It can be used to store features of images (e.g., using the pixel field)
+or other objects (using the data field). The key field is not parsed.
+
+    type: kRecordInput
+    store_conf {
+      has_label:  # default is true
+      ...
+    }
+
+##### CSVInputLayer
+
+It is a subclass of SingleLabelRecordLayer. The value field of one tuple is parsed
+as a comma-separated CSV line. The first number is parsed as the label if
+`has_label` is configured in `store_conf`. Otherwise, all numbers are parsed
+into one row of the `data_` Blob.
+
+    type: kCSVInput
+    store_conf {
+      has_label:  # default is true
+      ...
+    }
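+
+For illustration, with `has_label` set to true, a (made-up) value field like the
+following would be parsed into label 2 and a 4-dimensional row of the `data_` Blob,
+
+    2,0.5,1.0,0.25,0.75   # label = 2; data_ row = [0.5, 1.0, 0.25, 0.75]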
+
+##### ImagePreprocessLayer
+
+This layer does image preprocessing, e.g., cropping, mirroring and scaling, against
+the data Blob from its source layer. It deprecates the RGBImageLayer which
+works on the Record from ShardDataLayer. It still uses the same configuration as
+RGBImageLayer,
+
+    type: kImagePreprocess
+    rgbimage_conf {
+      scale: float
+      cropsize: int   # crop each image to keep the central part with this size
+      mirror: bool    # mirror the image by setting image[i,j] = image[i, len-j]
+      meanfile: "Image_Mean_File_Path"
+    }
+
+##### ShardDataLayer (Deprecated)
+
+Deprecated! Please use RecordInputLayer or CSVInputLayer instead.
+
+[ShardDataLayer](../api/classsinga_1_1ShardDataLayer.html) is a subclass of DataLayer,
+which reads Records from a disk file. The file should be created using the
+[DataShard](../api/classsinga_1_1DataShard.html)
+class. With the data file prepared, users configure the layer as
+
+    type: kShardData
+    sharddata_conf {
+      path: "path to data shard folder"
+      batchsize: int
+      random_skip: int
+    }
+
+`batchsize` specifies the number of records to be trained on in one mini-batch.
+The first `rand() % random_skip` `Record`s will be skipped at the first
+iteration. This is to enforce that different workers work on different Records.
+
+##### LMDBDataLayer (Deprecated)
+
+Deprecated! Please use RecordInputLayer or CSVInputLayer instead.
+
+[LMDBDataLayer] is similar to ShardDataLayer, except that the Records are
+loaded from LMDB.
+
+    type: kLMDBData
+    lmdbdata_conf {
+      path: "path to LMDB folder"
+      batchsize: int
+      random_skip: int
+    }
+
+##### ParserLayer (Deprecated)
+
+Deprecated! Please use RecordInputLayer or CSVInputLayer instead.
+
+It gets a vector of Records from a DataLayer and parses the features into
+a Blob.
+
+    virtual void ParseRecords(Phase phase, const vector<Record>& records, Blob<float>* blob) = 0;
+
+##### LabelLayer (Deprecated)
+
+Deprecated! Please use RecordInputLayer or CSVInputLayer instead.
+
+[LabelLayer](../api/classsinga_1_1LabelLayer.html) is a subclass of ParserLayer.
+It parses a single label from each Record. Consequently, it
+will put $b$ (mini-batch size) values into the Blob. It has no specific configuration fields.
+
+##### MnistImageLayer (Deprecated)
+
+Deprecated! Please use RecordInputLayer or CSVInputLayer instead.
+
+[MnistImageLayer] is a subclass of ParserLayer. It parses the pixel values of
+each image from the MNIST dataset. The pixel
+values may be normalized as `x/norm_a - norm_b`. For example, if `norm_a` is
+set to 255 and `norm_b` is set to 0, then every pixel will be normalized into
+[0, 1].
+
+    type: kMnistImage
+    mnistimage_conf {
+      norm_a: float
+      norm_b: float
+    }
+
+##### RGBImageLayer (Deprecated)
+
+Deprecated! Please use the ImagePreprocessLayer instead.
+
+[RGBImageLayer](../api/classsinga_1_1RGBImageLayer.html) is a subclass of ParserLayer.
+It parses the RGB values of one image from each Record. It may also
+apply some transformations, e.g., cropping and mirroring operations. If the
+`meanfile` is specified, it should point to a path that contains one Record for
+the mean of each pixel over all training images.
+
+    type: kRGBImage
+    rgbimage_conf {
+      scale: float
+      cropsize: int   # crop each image to keep the central part with this size
+      mirror: bool    # mirror the image by setting image[i,j] = image[i, len-j]
+      meanfile: "Image_Mean_File_Path"
+    }
+
+##### PrefetchLayer
+
+[PrefetchLayer](../api/classsinga_1_1PrefetchLayer.html) embeds other input layers
+to do data prefetching. It launches a thread to call the embedded layers to load and extract features,
+which ensures that the I/O task and the computation task can work simultaneously.
+One example PrefetchLayer configuration is,
+
+    layer {
+      name: "prefetch"
+      type: kPrefetch
+      sublayers {
+        name: "data"
+        type: kShardData
+        sharddata_conf { }
+      }
+      sublayers {
+        name: "rgb"
+        type: kRGBImage
+        srclayers: "data"
+        rgbimage_conf { }
+      }
+      sublayers {
+        name: "label"
+        type: kLabel
+        srclayers: "data"
+      }
+      exclude: kTest
+    }
+
+The layers on top of the PrefetchLayer should use the names of the embedded
+layers as their source layers. For example, "rgb" and "label" should be
+configured as the `srclayers` of the layers above.
+
+#### Output Layers
+
+Output layers get data from their source layers and write them to persistent storage,
+e.g., disk files or HDFS (to be supported).
+
+##### RecordOutputLayer
+
+This layer gets data (and label if available) from its source layer and converts it into records of type
+RecordProto. Records are written as (key = instance No., value = serialized record) tuples into a Store, e.g., KVFile. The configuration of this layer
+should include the specifics of the Store backend via `store_conf`.
+
+    layer {
+      name: "output"
+      type: kRecordOutput
+      srclayers:
+      store_conf {
+        backend: "kvfile"
+        path:
+      }
+    }
+
+##### CSVOutputLayer
+
+This layer gets data (and label if available) from its source layer and converts it into
+one string per instance with fields separated by commas (i.e., CSV format). The shape information
+is not kept in the string. All strings are written into a
+Store, e.g., a text file. The configuration of this layer should include the specifics of the Store backend via `store_conf`.
+
+    layer {
+      name: "output"
+      type: kCSVOutput
+      srclayers:
+      store_conf {
+        backend: "textfile"
+        path:
+      }
+    }
+
+#### Neuron Layers
+
+Neuron layers conduct feature transformations.
+
+##### ActivationLayer
+
+It applies an element-wise activation function, selected by the `type` field.
+
+    type: kActivation
+    activation_conf {
+      type: {RELU, SIGMOID, TANH, STANH}
+    }
+
+##### ConvolutionLayer
+
+[ConvolutionLayer](../api/classsinga_1_1ConvolutionLayer.html) conducts convolution transformations.
+
+    type: kConvolution
+    convolution_conf {
+      num_filters: int
+      kernel: int
+      stride: int
+      pad: int
+    }
+    param { } # weight/filter matrix
+    param { } # bias vector
+
+The int value `num_filters` stands for the count of the applied filters; the int
+value `kernel` stands for the convolution kernel size (equal width and height);
+the int value `stride` stands for the distance between successive filter applications;
+the int value `pad` pads each image border with the given number of pixels of zeros.
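+
+These four fields determine the spatial size of the output feature map. As a
+sanity check when configuring them, the standard convolution arithmetic applies
+(a general property of convolution, not a SINGA-specific formula),
+
+    out = (in + 2 * pad - kernel) / stride + 1
+    # e.g., in = 32, kernel = 5, pad = 2, stride = 1 gives (32 + 4 - 5) / 1 + 1 = 32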
+
+##### InnerProductLayer
+
+[InnerProductLayer](../api/classsinga_1_1InnerProductLayer.html) is fully connected with its (single) source layer.
+Typically, it has two parameter fields, one for the weight matrix and the other
+for the bias vector. It transforms the feature of the source layer linearly (by multiplying it with the weight matrix) and
+shifts it (by adding the bias vector).
+
+    type: kInnerProduct
+    innerproduct_conf {
+      num_output: int
+    }
+    param { } # weight matrix
+    param { } # bias vector
+
+##### PoolingLayer
+
+[PoolingLayer](../api/classsinga_1_1PoolingLayer.html) is used to sub-sample (by averaging or taking the maximum of) the
+feature vectors from the source layer.
+
+    type: kPooling
+    pooling_conf {
+      pool: AVE|MAX  // choose Average Pooling or Max Pooling
+      kernel: int    // size of the kernel filter
+      pad: int       // the padding size
+      stride: int    // the step length of the filter
+    }
+
+The pooling layer has two methods, Average Pooling and Max Pooling;
+use the enum AVE or MAX to choose between them.
+
+ * Max Pooling selects the maximum value of each filtering area as a point of the
+   result feature blob.
+ * Average Pooling averages all values of each filtering area at a point of the
+   result feature blob.
+
+##### ReLULayer
+
+[ReLuLayer](../api/classsinga_1_1ReLULayer.html) has rectified linear neurons, which conduct the following
+transformation, `f(x) = Max(0, x)`. It has no specific configuration fields.
+
+##### STanhLayer
+
+[STanhLayer](../api/classsinga_1_1TanhLayer.html) uses the scaled tanh as its activation function, i.e., `f(x) = 1.7159047 * tanh(0.6666667 * x)`.
+It has no specific configuration fields.
+
+##### SigmoidLayer
+
+[SigmoidLayer] uses the sigmoid (or logistic) function as its activation function, i.e.,
+`f(x) = sigmoid(x)`. It has no specific configuration fields.
+
+##### DropoutLayer
+
+[DropoutLayer](../api/classsinga_1_1DropoutLayer.html) is a layer that randomly drops out some inputs.
+This scheme helps keep a deep learning model away from over-fitting.
+
+    type: kDropout
+    dropout_conf {
+      dropout_ratio: float # dropout probability
+    }
+
+##### LRNLayer
+
+[LRNLayer](../api/classsinga_1_1LRNLayer.html) (Local Response Normalization) normalizes over the channels.
+
+    type: kLRN
+    lrn_conf {
+      local_size: int
+      alpha: float  // scaling parameter
+      beta: float   // exponent
+    }
+
+`local_size` specifies the number of adjoining channels which will be summed up.
+For `WITHIN_CHANNEL`, it means the side length of the spatial region which will be summed up.
+
+#### CuDNN layers
+
+CuDNN v3 and v4 are supported in SINGA, which include the following layers,
+
+ * CudnnActivationLayer (activation functions are SIGMOID, TANH, RELU)
+ * CudnnConvLayer
+ * CudnnLRNLayer
+ * CudnnPoolLayer
+ * CudnnSoftmaxLayer
+
+These layers have the same configuration as the corresponding CPU layers.
+For CuDNN v4, the batch normalization layer is added, which is named
+`CudnnBNLayer`.
+
+#### Loss Layers
+
+Loss layers measure the objective training loss.
+
+##### SoftmaxLossLayer
+
+[SoftmaxLossLayer](../api/classsinga_1_1SoftmaxLossLayer.html) is a combination of the Softmax transformation and
+the Cross-Entropy loss. It first applies Softmax to get a prediction probability
+for each output unit (neuron), and then computes the cross-entropy against the ground truth.
+It is generally used as the final layer to generate labels for classification tasks.
+
+    type: kSoftmaxLoss
+    softmaxloss_conf {
+      topk: int
+    }
+
+The configuration field `topk` selects the labels with the `topk` largest
+probabilities as the prediction results, since it is tedious for users to view the
+prediction probability of every label.
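+
+For reference, the computation follows the standard definitions, where `o`
+denotes the raw outputs of the previous layer and `t` the ground-truth label
+(general formulas, not SINGA-specific),
+
+    p_k = exp(o_k) / sum_j exp(o_j)   # softmax probability of unit k
+    L = -log(p_t)                     # cross-entropy loss for one instance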
+
+#### ConnectionLayer
+
+Subclasses of ConnectionLayer are utility layers that connect other layers due
+to neural net partitioning or other cases.
+
+##### ConcateLayer
+
+[ConcateLayer](../api/classsinga_1_1ConcateLayer.html) connects more than one source layer and concatenates their feature
+blobs along a given dimension.
+
+    type: kConcate
+    concate_conf {
+      concate_dim: int // define the dimension
+    }
+
+##### SliceLayer
+
+[SliceLayer](../api/classsinga_1_1SliceLayer.html) connects to more than one destination layer and slices its feature
+blob along a given dimension.
+
+    type: kSlice
+    slice_conf {
+      slice_dim: int
+    }
+
+##### SplitLayer
+
+[SplitLayer](../api/classsinga_1_1SplitLayer.html) connects to more than one destination layer and replicates its
+feature blob for them.
+
+    type: kSplit
+    split_conf {
+      num_splits: int
+    }
+
+##### BridgeSrcLayer & BridgeDstLayer
+
+[BridgeSrcLayer](../api/classsinga_1_1BridgeSrcLayer.html) &
+[BridgeDstLayer](../api/classsinga_1_1BridgeDstLayer.html) are utility layers assisting the transfer of data (e.g., features or
+gradients) required by neural net partitioning. These two layers are
+added implicitly. Users typically do not need to configure them in their neural
+net configuration.
+
+### OutputLayer
+
+It writes the prediction results or the extracted features into files, HTTP streams
+or other places. Currently SINGA has not implemented any specific output layer of this kind.
+
+## Advanced user guide
+
+The base Layer class is introduced in this section, followed by how to
+implement a new Layer subclass.
+
+### Base Layer class
+
+#### Members
+
+    LayerProto layer_conf_;
+    vector<Blob<float>> datavec_, gradvec_;
+    vector<AuxType> aux_data_;
+
+The base layer class keeps the user configuration in `layer_conf_`.
+`datavec_` stores the features associated with this layer.
+Some layers have no feature vectors; instead, they share the data of their
+source layers.
+`gradvec_` stores the gradients of the
+objective loss w.r.t. the `datavec_`. `aux_data_` stores the auxiliary data, e.g., image labels (with `AuxType` set to int).
+If images have a varying number of labels, `AuxType` can be defined as `vector<int>`.
+Currently, we hard-code `AuxType` to int. It will be added as a template argument of the Layer class later.
+
+If a layer has parameters, these parameters are declared using type
+[Param](param.html). Since some layers do not have
+parameters, we do not declare any `Param` in the base layer class.
+
+#### Functions
+
+    virtual void Setup(const LayerProto& conf, const vector<Layer*>& srclayers);
+    virtual void ComputeFeature(int flag, const vector<Layer*>& srclayers) = 0;
+    virtual void ComputeGradient(int flag, const vector<Layer*>& srclayers) = 0;
+
+The `Setup` function reads the user configuration, i.e., `conf`, and information
+from the source layers, e.g., the mini-batch size, to set the
+shape of the `data_` (and `grad_`) field as well
+as some other layer-specific fields.
+Memory will not be allocated until computation over the data structure happens.
+
+The `ComputeFeature` function evaluates the feature blob by transforming (e.g.,
+convolving or pooling) features from the source layers. `ComputeGradient`
+computes the gradients of the parameters associated with this layer. These two
+functions are invoked by the [TrainOneBatch](train-one-batch.html)
+function during training. Hence, they should be consistent with the
+`TrainOneBatch` function. In particular, feed-forward and RNN models are
+trained using the [BP algorithm](train-one-batch.html#back-propagation),
+which requires each layer's `ComputeFeature`
+function to compute `data_` based on the source layers, and requires each layer's
+`ComputeGradient` to compute the gradients of the parameters and of the source layers'
+`grad_`. Energy models, e.g., RBM, are trained by the
+[CD algorithm](train-one-batch.html#contrastive-divergence), which
+requires each layer's `ComputeFeature` function to compute the feature vectors
+for the positive phase or negative phase depending on the phase encoded in the `flag` argument, and
+requires the `ComputeGradient` function to only compute the parameter gradients.
+Some layers, e.g., loss or output layers, can put the loss or
+prediction result into the metric, which will be averaged and
+displayed periodically.
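+
+To make this contract concrete, below is a minimal sketch of how a BP-style
+`TrainOneBatch` might drive these two functions. The accessors `net->layers()`
+(layers in topological order) and `net->srclayers(...)` are hypothetical; the
+actual SINGA implementation differs in details (flags, metrics, partitioning).
+
+    // Illustrative sketch only, under the assumptions stated above.
+    void TrainOneBatchBP(singa::NeuralNet* net) {
+      const auto& layers = net->layers();
+      // Forward pass: each layer computes its feature blob from its sources.
+      for (singa::Layer* layer : layers)
+        layer->ComputeFeature(singa::kTrain, net->srclayers(layer));
+      // Backward pass: traverse in reverse order so gradients flow from the
+      // loss layer back towards the input layers.
+      for (auto it = layers.rbegin(); it != layers.rend(); ++it)
+        (*it)->ComputeGradient(singa::kTrain, net->srclayers(*it));
+    }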
+
+### Implementing a new Layer subclass
+
+Users can extend the Layer class or other subclasses to implement their own feature transformation
+logic, as long as the two virtual functions are overridden to be consistent with
+the `TrainOneBatch` function. The `Setup` function may also be overridden to
+read specific layer configurations.
+
+The [RNNLM](rnn.html) provides a couple of user-defined layers. You can refer to them as examples.
+
+#### Layer specific protocol message
+
+To implement a new layer, the first step is to define the layer specific
+configuration. Suppose the new layer is `FooLayer`; the layer specific
+google protocol message `FooLayerProto` should be defined as
+
+    # in user.proto
+    package singa;
+    import "job.proto";
+    message FooLayerProto {
+      optional int32 a = 1;  // specific fields of the FooLayer
+    }
+
+In addition, users need to extend the original `LayerProto` (defined in job.proto of SINGA)
+to include the `foo_conf` as follows.
+
+    extend LayerProto {
+      optional FooLayerProto foo_conf = 101;  // unique field id, reserved for extensions
+    }
+
+If there are multiple new layers, then each layer that has specific
+configurations would have a `<type>_conf` field and take one unique extension number.
+SINGA has reserved enough extension numbers, e.g., from 101 to 1000.
+
+    # job.proto of SINGA
+    message LayerProto {
+      ...
+      extensions 101 to 1000;
+    }
+
+With user.proto defined, users can use
+[protoc](https://developers.google.com/protocol-buffers/) to generate the `user.pb.cc`
+and `user.pb.h` files.
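+
+For example, assuming user.proto sits in the same directory as SINGA's job.proto
+(so that the import resolves), a typical invocation would be,
+
+    protoc --proto_path=. --cpp_out=. user.proto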
+
+In users' code, the extension fields can be accessed via,
+
+    auto conf = layer_conf_.GetExtension(foo_conf);
+    int a = conf.a();
+
+When defining the configuration of the new layer (in job.conf), users should use
+`user_type` for its layer type instead of `type`. In addition, `foo_conf`
+should be enclosed in brackets.
+
+    layer {
+      name: "foo"
+      user_type: "kFooLayer"  # Note: the user_type of user-defined layers is a string
+      [foo_conf] {            # Note: there is a pair of [] for extension fields
+        a: 10
+      }
+    }
+
+#### New Layer subclass declaration
+
+The new layer subclass can be implemented like the built-in layer subclasses.
+
+    class FooLayer : public singa::Layer {
+     public:
+      void Setup(const LayerProto& conf, const vector<Layer*>& srclayers) override;
+      void ComputeFeature(int flag, const vector<Layer*>& srclayers) override;
+      void ComputeGradient(int flag, const vector<Layer*>& srclayers) override;
+
+     private:
+      // members
+    };
+
+Users must override the two virtual functions to be called by
+`TrainOneBatch` for either the BP or CD algorithm. Typically, the `Setup` function
+will also be overridden to initialize some members. The user-configured fields
+can be accessed through `layer_conf_` as shown in the above paragraphs.
+
+#### New Layer subclass registration
+
+The newly defined layer should be registered in [main.cc](http://singa.incubator.apache.org/docs/programming-guide) by adding
+
+    driver.RegisterLayer<FooLayer, std::string>("kFooLayer"); // "kFooLayer" must match the layer configuration in job.conf
+
+After that, the [NeuralNet](neural-net.html) can create instances of the new Layer subclass.

Added: incubator/singa/site/trunk/content/markdown/v0.3.0/mesos.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.3.0/mesos.md?rev=1740048&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.3.0/mesos.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.3.0/mesos.md Wed Apr 20 05:09:06 2016
@@ -0,0 +1,87 @@
+# Distributed Training on Mesos
+
+This guide explains how to start SINGA distributed training on a Mesos cluster. It assumes that both Mesos and HDFS are already running, and that every node has SINGA installed.
+We assume the architecture depicted below, in which the cluster's nodes are Docker containers. Refer to the [Docker guide](docker.html) for details of how to start individual nodes and set up the network connections between them (make sure [weave](http://weave.works/guides/weave-docker-ubuntu-simple.html) is running at each node, and that the cluster's headnode is running in container `node0`).
+
+---
+
+## Start HDFS and Mesos
+
+Go inside each container, using:
+````
+docker exec -it nodeX /bin/bash
+````
+and configure it as follows:
+
+* On container `node0`
+
+        hadoop namenode -format
+        hadoop-daemon.sh start namenode
+        /opt/mesos-0.22.0/build/bin/mesos-master.sh --work_dir=/opt --log_dir=/opt --quiet > /dev/null &
+        zk-service.sh start
+
+* On containers `node1, node2, ...`
+
+        hadoop-daemon.sh start datanode
+        /opt/mesos-0.22.0/build/bin/mesos-slave.sh --master=node0:5050 --log_dir=/opt --quiet > /dev/null &
+
+To check that the setup has been successful, verify that the HDFS namenode has registered `N` datanodes, via:
+
+````
+hadoop dfsadmin -report
+````
+
+#### Important
+
+If the Docker version is 1.9 or newer, make sure [name resolution is set up properly](docker.html#launch_pseudo).
+
+#### Mesos logs
+
+Mesos logs are stored at `/opt/lt-mesos-master.INFO` on `node0` and at `/opt/lt-mesos-slave.INFO` on the other nodes.
+
+---
+
+## Starting SINGA training on Mesos
+
+Assuming that Mesos and HDFS are already started, a SINGA job can be launched from **any** container.
+
+#### Launching a job
+
+1. Log in to any container, then
+
+        cd incubator-singa/tool/mesos
+<a name="job_start"></a>
+2. Check that the configuration files are correct:
+    + `scheduler.conf` contains information about the master nodes
+    + `singa.conf` contains information about the Zookeeper node0
+    + The job configuration file `job.conf` **contains the full paths to the example directories (NO RELATIVE PATHS!).**
+3. Start the job:
+    + If starting for the first time:
+
+            ./scheduler <job config file> -scheduler_conf <scheduler config file> -singa_conf <SINGA config file>
+
+    + If not for the first time:
+
+            ./scheduler <job config file>
+
+**Notes.** Each running job is given a `frameworkID`. Look for the log message of the form:
+
+    Framework registered with XXX-XXX-XXX-XXX-XXX-XXX
+
+#### Monitoring and Debugging
+
+Each Mesos job is given a `frameworkID`, and a *sandbox* directory is created for it.
+The directory is under the specified `work_dir` (or `/tmp/mesos` by default). For example, errors
+during SINGA execution can be found at:
+
+    /tmp/mesos/slaves/xxxxx-Sx/frameworks/xxxxx/executors/SINGA_x/runs/latest/stderr
+
+Other artifacts, like files downloaded from HDFS (`job.conf`) and `stdout`, can be found in the same
+directory.
+
+#### Stopping
+
+There are two ways to kill a running job:
+
+1. If the scheduler is running in the foreground, simply kill it (using `Ctrl-C`, for example).
+
+2. If the scheduler is running in the background, kill it using Mesos's REST API:
+
+        curl -d "frameworkId=XXX-XXX-XXX-XXX-XXX-XXX" -X POST http://<master>/master/shutdown
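+
+The `frameworkId` of a running job can also be recovered from the Mesos master's
+state endpoint (served by Mesos 0.22; the exact JSON layout may vary across
+versions). A hypothetical check from any node:
+
+        curl http://node0:5050/master/state.json
+
+Then look for the SINGA entry in the `frameworks` array of the response.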