Added: incubator/singa/site/trunk/content/markdown/v0.2.0/mlp.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/mlp.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/mlp.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/mlp.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,215 @@
# MLP Example

---

Multilayer perceptron (MLP) is a subclass of feed-forward neural networks. An
MLP typically consists of multiple layers, with each layer fully connected to
the next one. In this example, we will use SINGA to train a
[simple MLP model proposed by Ciresan](http://arxiv.org/abs/1003.0358)
for classifying handwritten digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

## Running instructions

Please refer to the [installation](installation.html) page for
instructions on building SINGA, and the [quick start](quick-start.html)
for instructions on starting zookeeper.

We have provided scripts for preparing the training and test datasets in *examples/mnist/*.

    # in examples/mnist
    $ cp Makefile.example Makefile
    $ make download
    $ make create

### Training on CPU

After the datasets are prepared, we start the training by

    ./bin/singa-run.sh -conf examples/mnist/job.conf

Once it starts, you should see output like

    Record job information to /tmp/singa-log/job-info/job-1-20150817-055231
    Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1
    E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073)
    E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start
    E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start
    E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100
    E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000
    E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
    E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200
    E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100
    E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800
    E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100
    E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100
    E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600
    E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000
    E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500
    E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500
    E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000
    E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500
    E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900

After training for a number of steps (depending on the configuration) or when
the job finishes, SINGA will [checkpoint](checkpoint.html) the model parameters.
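The loss and accuracy lines above follow a fixed format, so the training curve can be
tracked programmatically. Below is a minimal Python sketch (not part of SINGA; it simply
assumes the log lines look like the sample output above) that extracts the step, loss and
accuracy values from such a log file.

    import re

    # Matches lines like:
    # E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800
    PATTERN = re.compile(r'(Train|Test) step-(\d+), loss : ([\d.]+), accuracy : ([\d.]+)')

    def parse_log(path):
        """Return a list of (phase, step, loss, accuracy) tuples from a training log."""
        records = []
        with open(path) as f:
            for line in f:
                m = PATTERN.search(line)
                if m:
                    phase, step, loss, acc = m.groups()
                    records.append((phase, int(step), float(loss), float(acc)))
        return records

    if __name__ == '__main__':
        for phase, step, loss, acc in parse_log('train.log'):
            print(phase, step, loss, acc)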
+ +### Training on GPU + +To train this example model on GPU, just add a field in the configuration file for +the GPU device, + + # job.conf + gpu: 0 + +### Training using Python script + +The python helpers come with SINGA 0.2 make it easy to configure the job. For example +the job.conf is replaced with a simple python script mnist_mlp.py +which has about 30 lines of code following the [Keras API](http://keras.io/). + + ./bin/singa-run.sh -exec tool/python/examples/mnist_mlp.py + + + +## Details + +To train a model in SINGA, you need to prepare the datasets, +and a job configuration which specifies the neural net structure, training +algorithm (BP or CD), SGD update algorithm (e.g. Adagrad), +number of training/test steps, etc. + +### Data preparation + +Before using SINGA, you need to write a program to pre-process the dataset you +use to a format that SINGA can read. Please refer to the +[Data Preparation](data.html) to get details about preparing +this MNIST dataset. + + +### Neural net + +<div style = "text-align: center"> +<img src = "../images/example-mlp.png" style = "width: 230px"> +<br/><strong>Figure 1 - Net structure of the MLP example. </strong></img> +</div> + + +Figure 1 shows the structure of the simple MLP model, which is constructed following +[Ciresan's paper](http://arxiv.org/abs/1003.0358). The dashed circle contains +two layers which represent one feature transformation stage. There are 6 such +stages in total. They sizes of the [InnerProductLayer](layer.html#innerproductlayer)s in these circles decrease from +2500->2000->1500->1000->500->10. + +Next we follow the guide in [neural net page](neural-net.html) +and [layer page](layer.html) to write the neural net configuration. + +* We configure an input layer to read the training/testing records from a disk file. + + layer { + name: "data" + type: kRecordInput + store_conf { + backend: "kvfile" + path: "examples/mnist/train_data.bin" + random_skip: 5000 + batchsize: 64 + shape: 784 + std_value: 127.5 + mean_value: 127.5 + } + exclude: kTest + } + + layer { + name: "data" + type: kRecordInput + store_conf { + backend: "kvfile" + path: "examples/mnist/test_data.bin" + batchsize: 100 + shape: 784 + std_value: 127.5 + mean_value: 127.5 + } + exclude: kTrain + } + + +* All [InnerProductLayer](layer.html#innerproductlayer)s are configured similarly as, + + layer{ + name: "fc1" + type: kInnerProduct + srclayers:"data" + innerproduct_conf{ + num_output: 2500 + } + param{ + name: "w1" + ... + } + param{ + name: "b1" + .. + } + } + + with the `num_output` decreasing from 2500 to 10. + +* A [STanhLayer](layer.html#stanhlayer) is connected to every InnerProductLayer +except the last one. It transforms the feature via scaled tanh function. + + layer{ + name: "tanh1" + type: kSTanh + srclayers:"fc1" + } + +* The final [Softmax loss layer](layer.html#softmaxloss) connects +to LabelLayer and the last STanhLayer. + + layer{ + name: "loss" + type:kSoftmaxLoss + softmaxloss_conf{ topk:1 } + srclayers:"fc6" + srclayers:"data" + } + +### Updater + +The [normal SGD updater](updater.html#updater) is selected. +The learning rate shrinks by 0.997 every 60 steps (i.e., one epoch). + + updater{ + type: kSGD + learning_rate{ + base_lr: 0.001 + type : kStep + step_conf{ + change_freq: 60 + gamma: 0.997 + } + } + } + +### TrainOneBatch algorithm + +The MLP model is a feed-forward model, hence +[Back-propagation algorithm](train-one-batch#back-propagation) +is selected. 
    train_one_batch {
      alg: kBP
    }

### Cluster setting

The following configuration sets a single worker and server for training.
The [Training frameworks](frameworks.html) page introduces the configurations of
a couple of distributed training frameworks.

    cluster {
      nworker_groups: 1
      nserver_groups: 1
    }
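For reference, the updater above uses the kStep learning rate policy (base_lr 0.001,
gamma 0.997, change_freq 60). The small Python sketch below computes the effective
learning rate under the usual step-decay rule, i.e., the rate is multiplied by `gamma`
once every `change_freq` steps; it is only an illustration of the schedule, not SINGA code.

    def step_lr(step, base_lr=0.001, gamma=0.997, change_freq=60):
        """Step decay: multiply the base rate by gamma once every change_freq steps."""
        return base_lr * (gamma ** (step // change_freq))

    # Learning rate at the start of the first few epochs (60 steps per epoch here).
    for epoch in range(5):
        print(epoch, step_lr(epoch * 60))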
Added: incubator/singa/site/trunk/content/markdown/v0.2.0/model-config.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/model-config.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/model-config.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/model-config.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,294 @@
# Model Configuration

---

SINGA uses the stochastic gradient descent (SGD) algorithm to train parameters
of deep learning models. For each SGD iteration, there is a
[Worker](architecture.html) computing gradients of parameters from the NeuralNet
and an [Updater](updater.html) updating parameter values based on gradients.
Hence the model configuration mainly consists of these three parts. We will
introduce the NeuralNet, Worker and Updater in the following paragraphs and
describe their configurations. All model configuration is specified in the
model.conf file in the user-provided workspace folder. E.g., the
[cifar10 example folder](https://github.com/apache/incubator-singa/tree/master/examples/cifar10)
has a model.conf file.


## NeuralNet

### Uniform model (neuralnet) representation

<img src = "../images/model-categorization.png" style = "width: 400px"> Fig. 1:
Deep learning model categorization</img>

Many deep learning models have been proposed. Fig. 1 categorizes popular deep
learning models based on their layer connections. The
[NeuralNet](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h)
abstraction of SINGA consists of multiple directly connected layers. This
abstraction is able to represent models from all three categories.

 * For the feed-forward models, their connections are already directed.

 * For the RNN models, we unroll them into directed connections, as shown in
 Fig. 3.

 * For the undirected connections in RBM, DBM, etc., we replace each undirected
 connection with two directed connections, as shown in Fig. 2.

<div style = "height: 200px">
<div style = "float:left; text-align: center">
<img src = "../images/unroll-rbm.png" style = "width: 280px"> <br/>Fig. 2: Unroll RBM </img>
</div>
<div style = "float:left; text-align: center; margin-left: 40px">
<img src = "../images/unroll-rnn.png" style = "width: 550px"> <br/>Fig. 3: Unroll RNN </img>
</div>
</div>

Specifically, the NeuralNet class is defined in
[neuralnet.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h) :

    ...
    vector<Layer*> layers_;
    ...

The Layer class is defined in
[base_layer.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/base_layer.h):

    vector<Layer*> srclayers_, dstlayers_;
    LayerProto layer_proto_;  // layer configuration, including meta info, e.g., name
    ...


The connections to other layers are kept in `srclayers_` and `dstlayers_`.
Since there are many different feature transformations, there are
correspondingly many different Layer implementations. Layers whose feature
transformation functions have parameters hold Param instances in the layer
class, e.g.,

    Param weight;


### Configure the structure of a NeuralNet instance

To train a deep learning model, the first step is to write the configurations
for the model structure, i.e., the layers and connections for the NeuralNet.
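Conceptually, this configuration just names each layer and the layers it reads from; the
net is then a directed graph over those names. The toy Python sketch below is illustrative
only (it is not SINGA code) and mirrors the `srclayers_`/`dstlayers_` members described
above to show what the configuration has to capture.

    class ToyLayer(object):
        """Illustrative stand-in for a layer: a name plus its connections."""
        def __init__(self, name):
            self.name = name
            self.srclayers = []   # analogous to srclayers_
            self.dstlayers = []   # analogous to dstlayers_

    def build_net(layer_confs):
        """layer_confs: list of (layer name, [source layer names]) pairs."""
        layers = {name: ToyLayer(name) for name, _ in layer_confs}
        for name, srcs in layer_confs:
            for src in srcs:
                layers[name].srclayers.append(layers[src])
                layers[src].dstlayers.append(layers[name])
        return layers

    # A three-layer feed-forward net: data -> fc1 -> loss
    net = build_net([('data', []), ('fc1', ['data']), ('loss', ['fc1'])])
    print([l.name for l in net['fc1'].srclayers])   # prints ['data']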
+Like [Caffe](http://caffe.berkeleyvision.org/), we use the [Google Protocol +Buffer](https://developers.google.com/protocol-buffers/) to define the +configuration protocol. The +[NetProto](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto) +specifies the configuration fields for a NeuralNet instance, + +message NetProto { + repeated LayerProto layer = 1; + ... +} + +The configuration is then + + layer { + // layer configuration + } + layer { + // layer configuration + } + ... + +To configure the model structure, we just configure each layer involved in the model. + + message LayerProto { + // the layer name used for identification + required string name = 1; + // source layer names + repeated string srclayers = 3; + // parameters, e.g., weight matrix or bias vector + repeated ParamProto param = 12; + // the layer type from the enum above + required LayerType type = 20; + // configuration for convolution layer + optional ConvolutionProto convolution_conf = 30; + // configuration for concatenation layer + optional ConcateProto concate_conf = 31; + // configuration for dropout layer + optional DropoutProto dropout_conf = 33; + ... + } + +A sample configuration for a feed-forward model is like + + layer { + name : "input" + type : kRecordInput + } + layer { + name : "conv" + type : kInnerProduct + srclayers : "input" + param { + // configuration for parameter + } + innerproduct_conf { + // configuration for this specific layer + } + ... + } + +The layer type list is defined in +[LayerType](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto). +One type (kFoo) corresponds to one child class of Layer (FooLayer) and one +configuration field (foo_conf). All built-in layers are introduced in the [layer page](layer.html). + +## Worker + +At the beginning, the Work will initialize the values of Param instances of +each layer either randomly (according to user configured distribution) or +loading from a [checkpoint file](). For each training iteration, the worker +visits layers of the neural network to compute gradients of Param instances of +each layer. Corresponding to the three categories of models, there are three +different algorithm to compute the gradients of a neural network. + + 1. Back-propagation (BP) for feed-forward models + 2. Back-propagation through time (BPTT) for recurrent neural networks + 3. Contrastive divergence (CD) for RBM, DBM, etc models. + +SINGA has provided these three algorithms as three Worker implementations. +Users only need to configure in the model.conf file to specify which algorithm +should be used. The configuration protocol is + + message ModelProto { + ... + enum GradCalcAlg { + // BP algorithm for feed-forward models, e.g., CNN, MLP, RNN + kBP = 1; + // BPTT for recurrent neural networks + kBPTT = 2; + // CD algorithm for RBM, DBM etc., models + kCd = 3; + } + // gradient calculation algorithm + required GradCalcAlg alg = 8 [default = kBackPropagation]; + ... + } + +These algorithms override the TrainOneBatch function of the Worker. E.g., the +BPWorker implements it as + + void BPWorker::TrainOneBatch(int step, Metric* perf) { + Forward(step, kTrain, train_net_, perf); + Backward(step, train_net_); + } + +The Forward function passes the raw input features of one mini-batch through +all layers, and the Backward function visits the layers in reverse order to +compute the gradients of the loss w.r.t each layer's feature and each layer's +Param objects. 
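In schematic pseudo-Python (a sketch only, not the actual C++ implementation), the two
passes reduce to two loops over the topologically ordered layer list, one calling each
layer's feature computation and the other, in reverse order, calling its gradient
computation:

    def train_one_batch_bp(layers, phase='train'):
        """Schematic BP sketch; `layers` is assumed ordered from input to loss."""
        # Forward: each layer transforms the features of its source layers.
        for layer in layers:
            layer.compute_feature(phase)
        # Backward: visit layers in reverse order to compute gradients of the loss
        # w.r.t. each layer's features and its Param objects.
        for layer in reversed(layers):
            layer.compute_gradient(phase)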
Different algorithms would visit the layers in different orders. +Some may traverses the neural network multiple times, e.g., the CDWorker's +TrainOneBatch function is: + + void CDWorker::TrainOneBatch(int step, Metric* perf) { + PostivePhase(step, kTrain, train_net_, perf); + NegativePhase(step, kTran, train_net_, perf); + GradientPhase(step, train_net_); + } + +Each `*Phase` function would visit all layers one or multiple times. +All algorithms will finally call two functions of the Layer class: + + /** + * Transform features from connected layers into features of this layer. + * + * @param phase kTrain, kTest, kPositive, etc. + */ + virtual void ComputeFeature(Phase phase, Metric* perf) = 0; + /** + * Compute gradients for parameters (and connected layers). + * + * @param phase kTrain, kTest, kPositive, etc. + */ + virtual void ComputeGradient(Phase phase) = 0; + +All [Layer implementations]() must implement the above two functions. + + +## Updater + +Once the gradients of parameters are computed, the Updater will update +parameter values. There are many SGD variants for updating parameters, like +[AdaDelta](http://arxiv.org/pdf/1212.5701v1.pdf), +[AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf), +[RMSProp](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf), +[Nesterov](http://scholar.google.com/citations?view_op=view_citation&hl=en&user=DJ8Ep8YAAAAJ&citation_for_view=DJ8Ep8YAAAAJ:hkOj_22Ku90C) +and SGD with momentum. The core functions of the Updater is + + /** + * Update parameter values based on gradients + * @param step training step + * @param param pointer to the Param object + * @param grad_scale scaling factor for the gradients + */ + void Update(int step, Param* param, float grad_scale=1.0f); + /** + * @param step training step + * @return the learning rate for this step + */ + float GetLearningRate(int step); + +SINGA provides several built-in updaters and learning rate change methods. +Users can configure them according to the UpdaterProto + + message UpdaterProto { + enum UpdaterType{ + // noraml SGD with momentum and weight decay + kSGD = 1; + // adaptive subgradient, http://www.magicbroom.info/Papers/DuchiHaSi10.pdf + kAdaGrad = 2; + // http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf + kRMSProp = 3; + // Nesterov first optimal gradient method + kNesterov = 4; + } + // updater type + required UpdaterType type = 1 [default=kSGD]; + // configuration for RMSProp algorithm + optional RMSPropProto rmsprop_conf = 50; + + enum ChangeMethod { + kFixed = 0; + kInverseT = 1; + kInverse = 2; + kExponential = 3; + kLinear = 4; + kStep = 5; + kFixedStep = 6; + } + // change method for learning rate + required ChangeMethod lr_change= 2 [default = kFixed]; + + optional FixedStepProto fixedstep_conf=40; + ... + optional float momentum = 31 [default = 0]; + optional float weight_decay = 32 [default = 0]; + // base learning rate + optional float base_lr = 34 [default = 0]; + } + + +## Other model configuration fields + +Some other important configuration fields for training a deep learning model is +listed: + + // model name, e.g., "cifar10-dcnn", "mnist-mlp" + string name; + // displaying training info for every this number of iterations, default is 0 + int32 display_freq; + // total num of steps/iterations for training + int32 train_steps; + // do test for every this number of training iterations, default is 0 + int32 test_freq; + // run test for this number of steps/iterations, default is 0. 
    // The test dataset has test_steps * batchsize instances.
    int32 test_steps;
    // do checkpoint for every this number of training steps, default is 0
    int32 checkpoint_freq;

The [checkpoint and restore](checkpoint.html) page has details on checkpoint-related fields.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/neural-net.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/neural-net.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/neural-net.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/neural-net.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,327 @@
# Neural Net

---

`NeuralNet` in SINGA represents an instance of a user's neural net model. As the
neural net typically consists of a set of layers, `NeuralNet` comprises
a set of unidirectionally connected [Layer](layer.html)s.
This page describes how to convert a user's neural net into
the configuration of `NeuralNet`.

<img src="../images/model-category.png" align="center" width="200px"/>
<span><strong>Figure 1 - Categorization of popular deep learning models.</strong></span>

## Net structure configuration

Users configure the `NeuralNet` by listing all layers of the neural net and
specifying each layer's source layer names. Popular deep learning models can be
categorized as shown in Figure 1. The subsequent sections give details for each
category.

### Feed-forward models

<div align = "left">
<img src="../images/mlp-net.png" align="center" width="200px"/>
<span><strong>Figure 2 - Net structure of an MLP model.</strong></span>
</div>

Feed-forward models, e.g., CNN and MLP, can easily be configured, as their layer
connections are directed and contain no cycles. The
configuration for the MLP model shown in Figure 2 is as follows,

    net {
      layer {
        name : "data"
        type : kData
      }
      layer {
        name : "image"
        type : kImage
        srclayer: "data"
      }
      layer {
        name : "label"
        type : kLabel
        srclayer: "data"
      }
      layer {
        name : "hidden"
        type : kHidden
        srclayer: "image"
      }
      layer {
        name : "softmax"
        type : kSoftmaxLoss
        srclayer: "hidden"
        srclayer: "label"
      }
    }

### Energy models

<img src="../images/rbm-rnn.png" align="center" width="500px"/>
<span><strong>Figure 3 - Convert connections in RBM and RNN.</strong></span>


For energy models including RBM, DBM,
etc., their connections are undirected (i.e., Category B). To represent these models using
`NeuralNet`, users can simply replace each connection with two directed
connections, as shown in Figure 3a. In other words, for each pair of connected layers, their source
layer field should include each other's name.
The full [RBM example](rbm.html) has
the detailed neural net configuration for an RBM model, which looks like

    net {
      layer {
        name : "vis"
        type : kVisLayer
        param {
          name : "w1"
        }
        srclayer: "hid"
      }
      layer {
        name : "hid"
        type : kHidLayer
        param {
          name : "w2"
          share_from: "w1"
        }
        srclayer: "vis"
      }
    }

### RNN models

For recurrent neural networks (RNN), users can remove the recurrent connections
by unrolling the recurrent layer. For example, in Figure 3b, the original
layer is unrolled into a new layer with 4 internal layers. In this way, the
model is like a normal feed-forward model and thus can be configured similarly.
The [RNN example](rnn.html) has a full neural net
configuration for an RNN model.
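To make the unrolling concrete, here is a small Python sketch. It is illustrative only:
the layer names are made up and it emits simple (name, source names) pairs rather than
SINGA's real configuration fields, but it captures what the unrolled net in Figure 3b
encodes, namely one copy of the recurrent layer per time step, each reading from the
previous copy.

    def unroll(layer_name, steps):
        """Return (name, [source names]) pairs for a recurrent layer unrolled over `steps`."""
        confs = []
        for t in range(steps):
            name = '%s-%d' % (layer_name, t)
            # Each unrolled copy reads from the previous time step; the first copy has no source.
            srcs = ['%s-%d' % (layer_name, t - 1)] if t > 0 else []
            confs.append((name, srcs))
        return confs

    print(unroll('hidden', 4))
    # [('hidden-0', []), ('hidden-1', ['hidden-0']), ('hidden-2', ['hidden-1']), ('hidden-3', ['hidden-2'])]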
## Configuration for multiple nets

Typically, a training job includes three neural nets for the
training, validation and test phases respectively. The three neural nets share most
layers except the data layer, loss layer or output layer, etc. To avoid
redundant configurations for the shared layers, users can use the `exclude`
field to filter a layer out of a neural net, e.g., the following layer will be
filtered when creating the testing `NeuralNet`.


    layer {
      ...
      exclude : kTest # filter this layer for creating test net
    }



## Neural net partitioning

A neural net can be partitioned in different ways to distribute the training
over multiple workers.

### Batch and feature dimension

<img src="../images/partition_fc.png" align="center" width="400px"/>
<span><strong>Figure 4 - Partitioning of a fully connected layer.</strong></span>


Every layer's feature blob is considered a matrix whose rows are feature
vectors. Thus, one layer can be split on two dimensions. Partitioning on
dimension 0 (also called the batch dimension) slices the feature matrix by rows.
For instance, if the mini-batch size is 256 and the layer is partitioned into 2
sub-layers, each sub-layer would have 128 feature vectors in its feature blob.
Partitioning on this dimension has no effect on the parameters, as every
[Param](param.html) object is replicated in the sub-layers. Partitioning on dimension
1 (also called the feature dimension) slices the feature matrix by columns. For
example, suppose the original feature vector has 50 units; after partitioning
into 2 sub-layers, each sub-layer would have 25 units. This partitioning may
result in [Param](param.html) objects being split, as shown in
Figure 4. Both the bias vector and the weight matrix are
partitioned across the two sub-layers.


### Partitioning configuration

There are 4 partitioning schemes, whose configurations are given below,

 1. Partitioning each single layer into sub-layers on the batch dimension (see
 below). It is enabled by configuring the partition dimension of the layer to
 0, e.g.,

        # with other fields omitted
        layer {
          partition_dim: 0
        }

 2. Partitioning each single layer into sub-layers on the feature dimension (see
 below). It is enabled by configuring the partition dimension of the layer to
 1, e.g.,

        # with other fields omitted
        layer {
          partition_dim: 1
        }

 3. Partitioning all layers into different subsets. It is enabled by
 configuring the location ID of a layer, e.g.,

        # with other fields omitted
        layer {
          location: 1
        }
        layer {
          location: 0
        }


 4. Hybrid partitioning of strategies 1, 2 and 3. Hybrid partitioning is
 useful for large models. An example application is to implement the
 [idea proposed by Alex](http://arxiv.org/abs/1404.5997).
 Hybrid partitioning is configured like,

        # with other fields omitted
        layer {
          location: 1
        }
        layer {
          location: 0
        }
        layer {
          partition_dim: 0
          location: 0
        }
        layer {
          partition_dim: 1
          location: 0
        }

Currently SINGA supports strategy-2 well. Other partitioning strategies are
under test and will be released in a later version.

## Parameter sharing

Parameters can be shared in the following cases,

 * sharing parameters among layers via user configuration. For example, the
 visible layer and hidden layer of an RBM share the weight matrix, which is configured through
 the `share_from` field as shown in the above RBM configuration. The
 configurations must be the same (except the name) for shared parameters.
+ + * due to neural net partitioning, some `Param` objects are replicated into + different workers, e.g., partitioning one layer on batch dimension. These + workers share parameter values. SINGA controls this kind of parameter + sharing automatically, users do not need to do any configuration. + + * the `NeuralNet` for training and testing (and validation) share most layers + , thus share `Param` values. + +If the shared `Param` instances resident in the same process (may in different +threads), they use the same chunk of memory space for their values. But they +would have different memory spaces for their gradients. In fact, their +gradients will be averaged by the stub or server. + +## Advanced user guide + +### Creation + + static NeuralNet* NeuralNet::Create(const NetProto& np, Phase phase, int num); + +The above function creates a `NeuralNet` for a given phase, and returns a +pointer to the `NeuralNet` instance. The phase is in {kTrain, +kValidation, kTest}. `num` is used for net partitioning which indicates the +number of partitions. Typically, a training job includes three neural nets for +training, validation and test phase respectively. The three neural nets share most +layers except the data layer, loss layer or output layer, etc.. The `Create` +function takes in the full net configuration including layers for training, +validation and test. It removes layers for phases other than the specified +phase based on the `exclude` field in +[layer configuration](layer.html): + + layer { + ... + exclude : kTest # filter this layer for creating test net + } + +The filtered net configuration is passed to the constructor of `NeuralNet`: + + NeuralNet::NeuralNet(NetProto netproto, int npartitions); + +The constructor creates a graph representing the net structure firstly in + + Graph* NeuralNet::CreateGraph(const NetProto& netproto, int npartitions); + +Next, it creates a layer for each node and connects layers if their nodes are +connected. + + void NeuralNet::CreateNetFromGraph(Graph* graph, int npartitions); + +Since the `NeuralNet` instance may be shared among multiple workers, the +`Create` function returns a pointer to the `NeuralNet` instance . + +### Parameter sharing + + `Param` sharing +is enabled by first sharing the Param configuration (in `NeuralNet::Create`) +to create two similar (e.g., the same shape) Param objects, and then calling +(in `NeuralNet::CreateNetFromGraph`), + + void Param::ShareFrom(const Param& from); + +It is also possible to share `Param`s of two nets, e.g., sharing parameters of +the training net and the test net, + + void NeuralNet:ShareParamsFrom(NeuralNet* other); + +It will call `Param::ShareFrom` for each Param object. + +### Access functions +`NeuralNet` provides a couple of access function to get the layers and params +of the net: + + const std::vector<Layer*>& layers() const; + const std::vector<Param*>& params() const ; + Layer* name2layer(string name) const; + Param* paramid2param(int id) const; + + +### Partitioning + + +#### Implementation + +SINGA partitions the neural net in `CreateGraph` function, which creates one +node for each (partitioned) layer. For example, if one layer's partition +dimension is 0 or 1, then it creates `npartition` nodes for it; if the +partition dimension is -1, a single node is created, i.e., no partitioning. +Each node is assigned a partition (or location) ID. If the original layer is +configured with a location ID, then the ID is assigned to each newly created node. 
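To illustrate the node-creation step just described, here is a small Python sketch. It is
not SINGA's `CreateGraph` code; the node naming (e.g., `conv1-00`) follows the partitioned
layer names used elsewhere in these pages, and everything else is an assumption made for
illustration.

    def create_nodes(layer_name, partition_dim, npartitions, location=None):
        """Create one graph node per (partitioned) layer and assign partition IDs."""
        # partition_dim 0 or 1 -> npartitions nodes; partition_dim -1 -> a single node.
        count = npartitions if partition_dim in (0, 1) else 1
        nodes = []
        for i in range(count):
            # If the original layer has a location ID, every new node inherits it;
            # otherwise the i-th node gets partition ID i.
            pid = location if location is not None else i
            nodes.append({'name': '%s-%02d' % (layer_name, i), 'partition_id': pid})
        return nodes

    print(create_nodes('conv1', partition_dim=1, npartitions=2))
    # [{'name': 'conv1-00', 'partition_id': 0}, {'name': 'conv1-01', 'partition_id': 1}]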
+These nodes are connected according to the connections of the original layers. +Some connection layers will be added automatically. +For instance, if two connected sub-layers are located at two +different workers, then a pair of bridge layers is inserted to transfer the +feature (and gradient) blob between them. When two layers are partitioned on +different dimensions, a concatenation layer which concatenates feature rows (or +columns) and a slice layer which slices feature rows (or columns) would be +inserted. These connection layers help making the network communication and +synchronization transparent to the users. + +#### Dispatching partitions to workers + +Each (partitioned) layer is assigned a location ID, based on which it is dispatched to one +worker. Particularly, the pointer to the `NeuralNet` instance is passed +to every worker within the same group, but each worker only computes over the +layers that have the same partition (or location) ID as the worker's ID. When +every worker computes the gradients of the entire model parameters +(strategy-2), we refer to this process as data parallelism. When different +workers compute the gradients of different parameters (strategy-3 or +strategy-1), we call this process model parallelism. The hybrid partitioning +leads to hybrid parallelism where some workers compute the gradients of the +same subset of model parameters while other workers compute on different model +parameters. For example, to implement the hybrid parallelism in for the +[DCNN model](http://arxiv.org/abs/1404.5997), we set `partition_dim = 0` for +lower layers and `partition_dim = 1` for higher layers. + Added: incubator/singa/site/trunk/content/markdown/v0.2.0/neuralnet-partition.md URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/neuralnet-partition.md?rev=1738695&view=auto ============================================================================== --- incubator/singa/site/trunk/content/markdown/v0.2.0/neuralnet-partition.md (added) +++ incubator/singa/site/trunk/content/markdown/v0.2.0/neuralnet-partition.md Tue Apr 12 06:22:20 2016 @@ -0,0 +1,54 @@ +# Neural Net Partition + +--- + +The purposes of partitioning neural network is to distribute the partitions onto +different working units (e.g., threads or nodes, called workers in this article) +and parallelize the processing. +Another reason for partition is to handle large neural network which cannot be +hold in a single node. For instance, to train models against images with high +resolution we need large neural networks (in terms of training parameters). + +Since *Layer* is the first class citizen in SIGNA, we do the partition against +layers. Specifically, we support partitions at two levels. First, users can configure +the location (i.e., worker ID) of each layer. In this way, users assign one worker +for each layer. Secondly, for one layer, we can partition its neurons or partition +the instances (e.g, images). They are called layer partition and data partition +respectively. We illustrate the two types of partitions using an simple convolutional neural network. + +<img src="../images/conv-mnist.png" style="width: 220px"/> + +The above figure shows a convolutional neural network without any partition. It +has 8 layers in total (one rectangular represents one layer). The first layer is +DataLayer (data) which reads data from local disk files/databases (or HDFS). 
The second layer
is a MnistLayer which parses the records from the MNIST data to get the pixels of a batch
of 8 images (each image is of size 28x28). The LabelLayer (label) parses the records to get the label
of each image in the batch. The ConvolutionalLayer (conv1) transforms the input image to the
shape of 8x27x27. The ReLULayer (relu1) conducts elementwise transformations. The PoolingLayer (pool1)
sub-samples the images. The fc1 layer is fully connected with the pool1 layer. It
multiplies each image with a weight matrix to generate a 10-dimensional hidden feature which
is then normalized by a SoftmaxLossLayer to get the prediction.

<img src="../images/conv-mnist-datap.png" style="width: 1000px"/>

The above figure shows the convolutional neural network after partitioning all layers,
except the DataLayer and ParserLayers, into 3 partitions using data partition.
The red layers process 4 images of the batch, while the black and blue layers process 2 images
each. Some helper layers, i.e., SliceLayer, ConcateLayer, BridgeSrcLayer,
BridgeDstLayer and SplitLayer, are added automatically by our partition algorithm.
Layers of the same color reside in the same worker. Data is transferred
across workers at the boundary layers (i.e., BridgeSrcLayer and BridgeDstLayer),
e.g., between s-slice-mnist-conv1 and d-slice-mnist-conv1.

<img src="../images/conv-mnist-layerp.png" style="width: 1000px"/>

The above figure shows the convolutional neural network after partitioning all layers,
except the DataLayer and ParserLayers, into 2 partitions using layer partition. We can
see that each layer processes all 8 images from the batch, but different partitions process
different parts of each image. For instance, the layer conv1-00 processes only 4 channels. The other
4 channels are processed by conv1-01, which resides in another worker.


Since the partition is done at the layer level, we can apply different partitions to
different layers to get a hybrid partition for the whole neural network. Moreover,
we can also specify layer locations to place different layers on different workers.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/overview.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/overview.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/overview.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/overview.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,93 @@
# Introduction

---

SINGA is a general distributed deep learning platform for training big deep
learning models over large datasets. It is designed with an intuitive
programming model based on the layer abstraction. A variety
of popular deep learning models are supported, namely feed-forward models including
convolutional neural networks (CNN), energy models like the restricted Boltzmann
machine (RBM), and recurrent neural networks (RNN). Many built-in layers are
provided for users. The SINGA architecture is
sufficiently flexible to run synchronous, asynchronous and hybrid training
frameworks. SINGA
also supports different neural net partitioning schemes to parallelize the
training of large models, namely partitioning on the batch dimension, the feature
dimension, or hybrid partitioning.


## Goals

As a distributed system, the first goal of SINGA is to have good scalability.
In other +words, SINGA is expected to reduce the total training time to achieve certain +accuracy with more computing resources (i.e., machines). + + +The second goal is to make SINGA easy to use. +It is non-trivial for programmers to develop and train models with deep and +complex model structures. Distributed training further increases the burden of +programmers, e.g., data and model partitioning, and network communication. Hence it is essential to +provide an easy to use programming model so that users can implement their deep +learning models/algorithms without much awareness of the underlying distributed +platform. + +## Principles + +Scalability is a challenging research problem for distributed deep learning +training. SINGA provides a general architecture to exploit the scalability of +different training frameworks. Synchronous training frameworks improve the +efficiency of one training iteration, and +asynchronous training frameworks improve the convergence rate. Given a fixed budget +(e.g., cluster size), users can run a hybrid framework that maximizes the +scalability by trading off between efficiency and convergence rate. + +SINGA comes with a programming model designed based on the layer abstraction, which +is intuitive for deep learning models. A variety of +popular deep learning models can be expressed and trained using this programming model. + +## System overview + +<img src="../images/sgd.png" align="center" width="400px"/> +<span><strong>Figure 1 - SGD flow.</strong></span> + +Training a deep learning model is to find the optimal parameters involved in +the transformation functions that generate good features for specific tasks. +The goodness of a set of parameters is measured by a loss function, e.g., +[Cross-Entropy Loss](https://en.wikipedia.org/wiki/Cross_entropy). Since the +loss functions are usually non-linear and non-convex, it is difficult to get a +closed form solution. Typically, people use the stochastic gradient descent +(SGD) algorithm, which randomly +initializes the parameters and then iteratively updates them to reduce the loss +as shown in Figure 1. + +<img src="../images/overview.png" align="center" width="400px"/> +<span><strong>Figure 2 - SINGA overview.</strong></span> + +SGD is used in SINGA to train +parameters of deep learning models. The training workload is distributed over +worker and server units as shown in Figure 2. In each +iteration, every worker calls *TrainOneBatch* function to compute +parameter gradients. *TrainOneBatch* takes a *NeuralNet* object +representing the neural net, and visits layers of the *NeuralNet* in +certain order. The resultant gradients are sent to the local stub that +aggregates the requests and forwards them to corresponding servers for +updating. Servers reply to workers with the updated parameters for the next +iteration. + + +## Job submission + +To submit a job in SINGA (i.e., training a deep learning model), +users pass the job configuration to SINGA driver in the +[main function](programming-guide.html). 
The job configuration +specifies the four major components in Figure 2, + + * a [NeuralNet](neural-net.html) describing the neural net structure with the detailed layer setting and their connections; + * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for different model categories; + * an [Updater](updater.html) defining the protocol for updating parameters at the server side; + * a [Cluster Topology](distributed-training.html) specifying the distributed architecture of workers and servers. + +This process is like the job submission in Hadoop, where users configure their +jobs in the main function to set the mapper, reducer, etc. +In Hadoop, users can configure their jobs with their own (or built-in) mapper and reducer; in SINGA, users +can configure their jobs with their own (or built-in) layer, updater, etc. Added: incubator/singa/site/trunk/content/markdown/v0.2.0/param.md URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/param.md?rev=1738695&view=auto ============================================================================== --- incubator/singa/site/trunk/content/markdown/v0.2.0/param.md (added) +++ incubator/singa/site/trunk/content/markdown/v0.2.0/param.md Tue Apr 12 06:22:20 2016 @@ -0,0 +1,226 @@ +# Parameters + +--- + +A `Param` object in SINGA represents a set of parameters, e.g., a weight matrix +or a bias vector. *Basic user guide* describes how to configure for a `Param` +object, and *Advanced user guide* provides details on implementing users' +parameter initialization methods. + +## Basic user guide + +The configuration of a Param object is inside a layer configuration, as the +`Param` are associated with layers. An example configuration is like + + layer { + ... + param { + name : "p1" + init { + type : kConstant + value: 1 + } + } + } + +The [SGD algorithm](overview.html) starts with initializing all +parameters according to user specified initialization method (the `init` field). +For the above example, +all parameters in `Param` "p1" will be initialized to constant value 1. The +configuration fields of a Param object is defined in [ParamProto](../api/classsinga_1_1ParamProto.html): + + * name, an identifier string. It is an optional field. If not provided, SINGA + will generate one based on layer name and its order in the layer. + * init, field for setting initialization methods. + * share_from, name of another `Param` object, from which this `Param` will share + configurations and values. + * lr_scale, float value to be multiplied with the learning rate when + [updating the parameters](updater.html) + * wd_scale, float value to be multiplied with the weight decay when + [updating the parameters](updater.html) + +There are some other fields that are specific to initialization methods. + +### Initialization methods + +Users can set the `type` of `init` use the following built-in initialization +methods, + + * `kConst`, set all parameters of the Param object to a constant value + + type: kConst + value: float # default is 1 + + * `kGaussian`, initialize the parameters following a Gaussian distribution. 
+ + type: kGaussian + mean: float # mean of the Gaussian distribution, default is 0 + std: float # standard variance, default is 1 + value: float # default 0 + + * `kUniform`, initialize the parameters following an uniform distribution + + type: kUniform + low: float # lower boundary, default is -1 + high: float # upper boundary, default is 1 + value: float # default 0 + + * `kGaussianSqrtFanIn`, initialize `Param` objects with two dimensions (i.e., + matrix) using `kGaussian` and then + multiple each parameter with `1/sqrt(fan_in)`, where`fan_in` is the number of + columns of the matrix. + + * `kUniformSqrtFanIn`, the same as `kGaussianSqrtFanIn` except that the + distribution is an uniform distribution. + + * `kUniformFanInOut`, initialize matrix `Param` objects using `kUniform` and then + multiple each parameter with `sqrt(6/(fan_in + fan_out))`, where`fan_in + + fan_out` sums up the number of columns and rows of the matrix. + +For all above initialization methods except `kConst`, if their `value` is not +1, every parameter will be multiplied with `value`. Users can also implement +their own initialization method following the *Advanced user guide*. + + +## Advanced user guide + +This sections describes the details on implementing new parameter +initialization methods. + +### Base ParamGenerator +All initialization methods are implemented as +subclasses of the base `ParamGenerator` class. + + class ParamGenerator { + public: + virtual void Init(const ParamGenProto&); + void Fill(Param*); + + protected: + ParamGenProto proto_; + }; + +Configurations of the initialization method is in `ParamGenProto`. The `Fill` +function fills the `Param` object (passed in as an argument). + +### New ParamGenerator subclass + +Similar to implement a new Layer subclass, users can define a configuration +protocol message, + + # in user.proto + message FooParamProto { + optional int32 x = 1; + } + extend ParamGenProto { + optional FooParamProto fooparam_conf =101; + } + +The configuration of `Param` would be + + param { + ... + init { + user_type: 'FooParam" # must use user_type for user defined methods + [fooparam_conf] { # must use brackets for configuring user defined messages + x: 10 + } + } + } + +The subclass could be declared as, + + class FooParamGen : public ParamGenerator { + public: + void Fill(Param*) override; + }; + +Users can access the configuration fields in `Fill` by + + int x = proto_.GetExtension(fooparam_conf).x(); + +To use the new initialization method, users need to register it in the +[main function](programming-guide.html). + + driver.RegisterParamGenerator<FooParamGen>("FooParam") # must be consistent with the user_type in configuration + +{% comment %} +### Base Param class + +### Members + + int local_version_; + int slice_start_; + vector<int> slice_offset_, slice_size_; + + shared_ptr<Blob<float>> data_; + Blob<float> grad_; + ParamProto proto_; + +Each Param object has a local version and a global version (inside the data +Blob). These two versions are used for synchronization. If multiple Param +objects share the same values, they would have the same `data_` field. +Consequently, their global version is the same. The global version is updated +by [the stub thread](communication.html). The local version is +updated in `Worker::Update` function which assigns the global version to the +local version. The `Worker::Collect` function is blocked until the global +version is larger than the local version, i.e., when `data_` is updated. 
In +this way, we synchronize workers sharing parameters. + +In Deep learning models, some Param objects are 100 times larger than others. +To ensure the load-balance among servers, SINGA slices large Param objects. The +slicing information is recorded by `slice_*`. Each slice is assigned a unique +ID starting from 0. `slice_start_` is the ID of the first slice of this Param +object. `slice_offset_[i]` is the offset of the i-th slice in this Param +object. `slice_size_[i]` is the size of the i-th slice. These slice information +is used to create messages for transferring parameter values or gradients to +different servers. + +Each Param object has a `grad_` field for gradients. Param objects do not share +this Blob although they may share `data_`. Because each layer containing a +Param object would contribute gradients. E.g., in RNN, the recurrent layers +share parameters values, and the gradients used for updating are averaged from all recurrent +these recurrent layers. In SINGA, the stub thread will aggregate local +gradients for the same Param object. The server will do a global aggregation +of gradients for the same Param object. + +The `proto_` field has some meta information, e.g., name and ID. It also has a +field called `owner` which is the ID of the Param object that shares parameter +values with others. + +### Functions +The base Param class implements two sets of functions, + + virtual void InitValues(int version = 0); // initialize values according to `init_method` + void ShareFrom(const Param& other); // share `data_` from `other` Param + -------------- + virtual Msg* GenGetMsg(bool copy, int slice_idx); + virtual Msg* GenPutMsg(bool copy, int slice_idx); + ... // other message related functions. + +Besides the functions for processing the parameter values, there is a set of +functions for generating and parsing messages. These messages are for +transferring parameter values or gradients between workers and servers. Each +message corresponds to one Param slice. If `copy` is false, it means the +receiver of this message is in the same process as the sender. In such case, +only pointers to the memory of parameter value (or gradient) are wrapped in +the message; otherwise, the parameter values (or gradients) should be copied +into the message. + + +## Implementing Param subclass +Users can extend the base Param class to implement their own parameter +initialization methods and message transferring protocols. Similar to +implementing a new Layer subclasses, users can create google protocol buffer +messages for configuring the Param subclass. The subclass, denoted as FooParam +should be registered in main.cc, + + dirver.RegisterParam<FooParam>(kFooParam); // kFooParam should be different to 0, which is for the base Param type + + + * type, an integer representing the `Param` type. Currently SINGA provides one + `Param` implementation with type 0 (the default type). 
If users want + to use their own Param implementation, they should extend the base Param + class and configure this field with `kUserParam` + +{% endcomment %} Added: incubator/singa/site/trunk/content/markdown/v0.2.0/programming-guide.md URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/programming-guide.md?rev=1738695&view=auto ============================================================================== --- incubator/singa/site/trunk/content/markdown/v0.2.0/programming-guide.md (added) +++ incubator/singa/site/trunk/content/markdown/v0.2.0/programming-guide.md Tue Apr 12 06:22:20 2016 @@ -0,0 +1,95 @@ +# Programming Guide + +--- + +To submit a training job, users must provide the configuration of the +four components shown in Figure 1: + + * a [NeuralNet](neural-net.html) describing the neural net structure with the detailed layer setting and their connections; + * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for different model categories; + * an [Updater](updater.html) defining the protocol for updating parameters at the server side; + * a [Cluster Topology](distributed-training.html) specifying the distributed architecture of workers and servers. + +The *Basic user guide* section describes how to submit a training job using +built-in components; while the *Advanced user guide* section presents details +on writing user's own main function to register components implemented by +themselves. In addition, the training data must be prepared, which has the same +[process](data.html) for both advanced users and basic users. + +<img src="../images/overview.png" align="center" width="400px"/> +<span><strong>Figure 1 - SINGA overview.</strong></span> + + + +## Basic user guide + +Users can use the default main function provided SINGA to submit the training +job. For this case, a job configuration file written as a google protocol +buffer message for the [JobProto](../api/classsinga_1_1JobProto.html) must be provided in the command line, + + ./bin/singa-run.sh -conf <path to job conf> [-resume] + +`-resume` is for continuing the training from last +[checkpoint](checkpoint.html). +The [MLP](mlp.html) and [CNN](cnn.html) +examples use built-in components. Please read the corresponding pages for their +job configuration files. The subsequent pages will illustrate the details on +each component of the configuration. + +## Advanced user guide + +If a user's model contains some user-defined components, e.g., +[Updater](updater.html), he has to write a main function to +register these components. It is similar to Hadoop's main function. Generally, +the main function should + + * initialize SINGA, e.g., setup logging. + + * register user-defined components. + + * create and pass the job configuration to SINGA driver + + +An example main function is like + + #include "singa.h" + #include "user.h" // header for user code + + int main(int argc, char** argv) { + singa::Driver driver; + driver.Init(argc, argv); + bool resume; + // parse resume option from argv. + + // register user defined layers + driver.RegisterLayer<FooLayer>(kFooLayer); + // register user defined updater + driver.RegisterUpdater<FooUpdater>(kFooUpdater); + ... + auto jobConf = driver.job_conf(); + // update jobConf + + driver.Train(resume, jobConf); + return 0; + } + +The Driver class' `Init` method will load a job configuration file provided by +users as a command line argument (`-conf <job conf>`). 
It contains at least the +cluster topology and returns the `jobConf` for users to update or fill in +configurations of neural net, updater, etc. If users define subclasses of +Layer, Updater, Worker and Param, they should register them through the driver. +Finally, the job configuration is submitted to the driver which starts the +training. + +We will provide helper functions to make the configuration easier in the +future, like [keras](https://github.com/fchollet/keras). + +Users need to compile and link their code (e.g., layer implementations and the main +file) with SINGA library (*.libs/libsinga.so*) to generate an +executable file, e.g., with name *mysinga*. To launch the program, users just pass the +path of the *mysinga* and base job configuration to *./bin/singa-run.sh*. + + ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments] + +The [RNN application](rnn.html) provides a full example of +implementing the main function for training a specific RNN model. Added: incubator/singa/site/trunk/content/markdown/v0.2.0/python.md URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/python.md?rev=1738695&view=auto ============================================================================== --- incubator/singa/site/trunk/content/markdown/v0.2.0/python.md (added) +++ incubator/singa/site/trunk/content/markdown/v0.2.0/python.md Tue Apr 12 06:22:20 2016 @@ -0,0 +1,374 @@ +# Python Binding + +--- + +Python binding provides APIs for configuring a training job following +[keras](http://keras.io/), including the configuration of neural net, training +algorithm, etc. It replaces the configuration file (e.g., *job.conf*) in +protobuf format, which is typically long and error-prone to prepare. In later +version, we will add python functions to interact with the layer and neural net +objects, which would enable users to train and debug their models +interactively. + +Here is the layout of python related code, + + SINGAROOT/tool/python + |-- pb2 (has job_pb2.py) + |-- singa + |-- model.py + |-- layer.py + |-- parameter.py + |-- initialization.py + |-- utils + |-- utility.py + |-- message.py + |-- examples + |-- cifar10_cnn.py, mnist_mlp.py, , mnist_rbm1.py, mnist_ae.py, etc. + |-- datasets + |-- cifar10.py + |-- mnist.py + +## Compiling and running instructions + +In order to use the Python APIs, users need to add the following arguments when compiling +SINGA, + + ./configure --enable-python --with-python=PYTHON_DIR + make + +where PYTHON_DIR has Python.h + + +The training program is launched by + + bin/singa-run.sh -exec <user_main.py> + +where user_main.py creates the JobProto object and passes it to Driver::Train to +start the training. + +For example, + + cd SINGAROOT + bin/singa-run.sh -exec tool/python/examples/cifar10_cnn.py + + + +## Examples + + +### MLP Example + +This example uses python APIs to configure and train a MLP model over the MNIST +dataset. The configuration content is the same as that written in *SINGAROOT/examples/mnist/job.conf*. 
+ +``` +X_train, X_test, workspace = mnist.load_data() + +m = Sequential('mlp', sys.argv) + +m.add(Dense(2500, init='uniform', activation='tanh')) +m.add(Dense(2000, init='uniform', activation='tanh')) +m.add(Dense(1500, init='uniform', activation='tanh')) +m.add(Dense(1000, init='uniform', activation='tanh')) +m.add(Dense(500, init='uniform', activation='tanh')) +m.add(Dense(10, init='uniform', activation='softmax')) + +sgd = SGD(lr=0.001, lr_type='step') +topo = Cluster(workspace) +m.compile(loss='categorical_crossentropy', optimizer=sgd, cluster=topo) +m.fit(X_train, nb_epoch=1000, with_test=True) +result = m.evaluate(X_test, batch_size=100, test_steps=10, test_freq=60) +``` + +### CNN Example + +This example uses python APIs to configure and train a CNN model over the Cifar10 +dataset. The configuration content is the same as that written in *SINGAROOT/examples/cifar10/job.conf*. + + +``` +X_train, X_test, workspace = cifar10.load_data() + +m = Sequential('cnn', sys.argv) + +m.add(Convolution2D(32, 5, 1, 2, w_std=0.0001, b_lr=2)) +m.add(MaxPooling2D(pool_size=(3,3), stride=2)) +m.add(Activation('relu')) +m.add(LRN2D(3, alpha=0.00005, beta=0.75)) + +m.add(Convolution2D(32, 5, 1, 2, b_lr=2)) +m.add(Activation('relu')) +m.add(AvgPooling2D(pool_size=(3,3), stride=2)) +m.add(LRN2D(3, alpha=0.00005, beta=0.75)) + +m.add(Convolution2D(64, 5, 1, 2)) +m.add(Activation('relu')) +m.add(AvgPooling2D(pool_size=(3,3), stride=2)) + +m.add(Dense(10, w_wd=250, b_lr=2, b_wd=0, activation='softmax')) + +sgd = SGD(decay=0.004, lr_type='manual', step=(0,60000,65000), step_lr=(0.001,0.0001,0.00001)) +topo = Cluster(workspace) +m.compile(updater=sgd, cluster=topo) +m.fit(X_train, nb_epoch=1000, with_test=True) +result = m.evaluate(X_test, 1000, test_steps=30, test_freq=300) +``` + + +### RBM Example + +This example uses python APIs to configure and train a RBM model over the MNIST +dataset. The configuration content is the same as that written in *SINGAROOT/examples/rbm*.conf*. + +``` +rbmid = 3 +X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid) +m = Energy('rbm'+str(rbmid), sys.argv) + +out_dim = [1000, 500, 250] +m.add(RBM(out_dim, w_std=0.1, b_wd=0)) + +sgd = SGD(lr=0.1, decay=0.0002, momentum=0.8) +topo = Cluster(workspace) +m.compile(optimizer=sgd, cluster=topo) +m.fit(X_train, alg='cd', nb_epoch=6000) +``` + +### AutoEncoder Example +This example uses python APIs to configure and train an autoencoder model over +the MNIST dataset. The configuration content is the same as that written in +*SINGAROOT/examples/autoencoder.conf*. + + +``` +rbmid = 4 +X_train, X_test, workspace = mnist.load_data(nb_rbm=rbmid+1) +m = Sequential('autoencoder', sys.argv) + +hid_dim = [1000, 500, 250, 30] +m.add(Autoencoder(hid_dim, out_dim=784, activation='sigmoid', param_share=True)) + +agd = AdaGrad(lr=0.01) +topo = Cluster(workspace) +m.compile(loss='mean_squared_error', optimizer=agd, cluster=topo) +m.fit(X_train, alg='bp', nb_epoch=12200) +``` + +### To run SINGA on GPU + +Users need to set a list of gpu ids to `device` field in fit() or evaluate(). +The number of GPUs must be the same to the number of workers configured for +cluster topology. 

```
gpu_id = [0]
m.fit(X_train, nb_epoch=100, with_test=True, device=gpu_id)
```

### TIPS

Hidden layers for the MLP can be configured in a loop as,

```
for n in [2500, 2000, 1500, 1000, 500]:
    m.add(Dense(n, init='uniform', activation='tanh'))
m.add(Dense(10, init='uniform', activation='softmax'))
```

An activation layer can be specified separately from the layer it follows,

```
m.add(Dense(2500, init='uniform'))
m.add(Activation('tanh'))
```

Users can explicitly specify the hyper-parameters of weights and biases,

```
par = Parameter(init='uniform', scale=0.05)
m.add(Dense(2500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(2000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(1000, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(500, w_param=par, b_param=par, activation='tanh'))
m.add(Dense(10, w_param=par, b_param=par, activation='softmax'))
```

```
parw = Parameter(init='gauss', std=0.0001)
parb = Parameter(init='const', value=0)
m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb, b_lr=2))
m.add(MaxPooling2D(pool_size=(3,3), stride=2))
m.add(Activation('relu'))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

parw.update(std=0.01)
m.add(Convolution2D(32, 5, 1, 2, w_param=parw, b_param=parb))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))
m.add(LRN2D(3, alpha=0.00005, beta=0.75))

m.add(Convolution2D(64, 5, 1, 2, w_param=parw, b_param=parb, b_lr=1))
m.add(Activation('relu'))
m.add(AvgPooling2D(pool_size=(3,3), stride=2))

m.add(Dense(10, w_param=parw, w_wd=250, b_param=parb, b_lr=2, b_wd=0, activation='softmax'))
```

Data can be added in this way,

```
X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
m.fit(X_train, ...)                  # Data layer for training is added
m.evaluate(X_test, ...)              # Data layer for testing is added
```
or this way,

```
X_train, X_test = mnist.load_data()  # parameter values are set in load_data()
m.add(X_train)                       # explicitly add Data layer
m.add(X_test)                        # explicitly add Data layer
```
or by configuring the data store explicitly,

```
store = Store(path='train.bin', batch_size=64, ...)          # parameter values are set explicitly
m.add(Data(load='recordinput', phase='train', conf=store))   # Data layer is added
store = Store(path='test.bin', batch_size=100, ...)          # parameter values are set explicitly
m.add(Data(load='recordinput', phase='test', conf=store))    # Data layer is added
```

### Cases to run SINGA

(1) Run SINGA for training

```
m.fit(X_train, nb_epoch=1000)
```

(2) Run SINGA for training and validation

```
m.fit(X_train, validate_data=X_valid, nb_epoch=1000)
```

(3) Run SINGA for test while training

```
m.fit(X_train, nb_epoch=1000, with_test=True)
result = m.evaluate(X_test, batch_size=100, test_steps=100)
```

(4) Run SINGA for test only, assuming a checkpoint exists after training

```
result = m.evaluate(X_test, batch_size=100, checkpoint_path=workspace+'/checkpoint/step100-worker0')
```

## Implementation Details

### Layer class (inherited)

* Data
* Dense
* Activation
* Convolution2D
* MaxPooling2D
* AvgPooling2D
* LRN2D
* Dropout
* RBM
* Autoencoder

### Model class

The Model class has `jobconf` (a JobProto) and `layers` (a list of layers).

Methods in the Model class:

* add
    * adds a Layer to the Model
    * 2 subclasses: Sequential model and Energy model

* compile
    * sets the Updater (i.e., optimizer) and Cluster (i.e., topology) components

* fit
    * sets the training data and parameter values for the training
    * (optional) sets the validation data and parameter values
    * sets the Train_one_batch component
    * specify the `with_test` field if a user wants to run SINGA with test data simultaneously.
    * [TODO] receive train/validation results, e.g., accuracy, loss, ppl, etc.

* evaluate
    * sets the testing data and parameter values for the testing
    * specify the `checkpoint_path` field if a user wants to run SINGA only for testing.
    * [TODO] receive test results, e.g., accuracy, loss, ppl, etc.

### Results

fit() and evaluate() return the train/test results as a dictionary containing

* [key]: step number
* [value]: a list of dictionaries with
    * 'acc' for accuracy
    * 'loss' for loss
    * 'ppl' for perplexity
    * 'se' for squared error

### Parameter class

Users need to set the parameter fields and initial values. For example,

* Parameter (fields in Param proto)
    * lr = (float) // learning rate multiplier, used to scale the learning rate when updating parameters.
    * wd = (float) // weight decay multiplier, used to scale the weight decay when updating parameters.

* Parameter initialization (fields in ParamGen proto)
    * init = (string) // one of the types, 'uniform', 'constant', 'gaussian'
    * high = (float) // for 'uniform'
    * low = (float) // for 'uniform'
    * value = (float) // for 'constant'
    * mean = (float) // for 'gaussian'
    * std = (float) // for 'gaussian'

* Weight (`w_param`) is 'gaussian' with mean=0 and std=0.01 by default

* Bias (`b_param`) is 'constant' with value=0 by default

* How to update the parameter fields
    * for updating the Weight, put `w_` in front of the field name
    * for updating the Bias, put `b_` in front of the field name

There are several ways to set Parameter values,

```
parw = Parameter(lr=2, wd=10, init='gaussian', std=0.1)
parb = Parameter(lr=1, wd=0, init='constant', value=0)
m.add(Convolution2D(10, w_param=parw, b_param=parb, ...))
```

```
m.add(Dense(10, w_mean=1, w_std=0.1, w_lr=2, w_wd=10, ...))
```

```
parw = Parameter(init='constant', mean=0)
m.add(Dense(10, w_param=parw, w_lr=1, w_wd=1, b_value=1, ...))
```
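Going back to the Results section above, the dictionary returned by fit() and evaluate() can be consumed as sketched below. This is only an illustration of the described structure; `m` and `X_test` are the objects from the examples above, and which metric keys are present depends on the model.

```
# Sketch: walk the {step: [{'acc': ..., 'loss': ..., ...}, ...]} structure
# described in the Results section.
result = m.evaluate(X_test, batch_size=100, test_steps=10)
for step in sorted(result):
    for metrics in result[step]:
        print(step, metrics.get('acc'), metrics.get('loss'))
```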

### Other classes

* Store
* Algorithm
* Updater
* SGD
* AdaGrad
* Cluster

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/quick-start.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/quick-start.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/quick-start.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/quick-start.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,187 @@

# Quick Start

---

## SINGA setup

Please refer to the [installation](installation.html) page
for guidance on installing SINGA.

### Starting Zookeeper

SINGA uses [zookeeper](https://zookeeper.apache.org/) to coordinate the
training. Please make sure the zookeeper service is started before running
SINGA.

If you installed zookeeper using our thirdparty script, you can
simply start it by:

    # goto top level folder
    cd SINGA_ROOT
    ./bin/zk-service.sh start

(`./bin/zk-service.sh stop` stops the zookeeper).

Otherwise, if you launched a zookeeper instance yourself and did not use the
default port, please edit `conf/singa.conf`:

    zookeeper_host: "localhost:YOUR_PORT"

## Running in standalone mode

Running SINGA in standalone mode means running it without cluster
managers like [Mesos](http://mesos.apache.org/) or [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).

### Training on a single node

For single-node training, one process will be launched to run SINGA on the
local host. We train the [CNN model](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks) over the
[CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html) dataset as an example.
The hyper-parameters are set following
[cuda-convnet](https://code.google.com/p/cuda-convnet/). More details are
available in the [CNN example](cnn.html).

#### Preparing data and job configuration

Download the dataset and create the data shards for training and testing.

    cd examples/cifar10/
    cp Makefile.example Makefile
    make download
    make create

A training dataset and a test dataset are created under the *cifar10-train-shard*
and *cifar10-test-shard* folders respectively. An *image_mean.bin* file is also
generated, which contains the feature mean of all images.

Since all code used for training this CNN model is provided by SINGA as
built-in implementations, there is no need to write any code. Instead, users just
execute the running script (*../../bin/singa-run.sh*) with the job
configuration file (*job.conf*). To code in SINGA, please refer to the
[programming guide](programming-guide.html).

#### Training without parallelism

By default, the cluster topology has a single worker and a single server.
In other words, neither the training data nor the neural net is partitioned.

The training is started by running:

    # goto top level folder
    cd ../../
    ./bin/singa-run.sh -conf examples/cifar10/job.conf

You can list the currently running jobs by,

    ./bin/singa-console.sh list

    JOB ID    |NUM PROCS
    ----------|-----------
    24        |1

Jobs can be killed by,

    ./bin/singa-console.sh kill JOB_ID

Logs and job information are available in the */tmp/singa-log* folder, which can be
changed to another folder by setting `log-dir` in *conf/singa.conf*.

#### Asynchronous parallel training

    # job.conf
    ...
    cluster {
      nworker_groups: 2
      nworkers_per_procs: 2
      workspace: "examples/cifar10/"
    }

In SINGA, [asynchronous training](architecture.html) is enabled by launching
multiple worker groups. For example, we can change the original *job.conf* to
have two worker groups as shown above. By default, each worker group has one
worker. Since one process is configured to contain two workers, the two worker
groups will run in the same process. Consequently, they run the in-memory
[Downpour](frameworks.html) training framework. Users do not need to split the
dataset explicitly for each worker (group); instead, they can assign each
worker (group) a random offset to the start of the dataset, so that the workers
effectively run on different data partitions.

    # job.conf
    ...
    neuralnet {
      layer {
        ...
        sharddata_conf {
          random_skip: 5000
        }
      }
      ...
    }

The running command is:

    ./bin/singa-run.sh -conf examples/cifar10/job.conf

#### Synchronous parallel training

    # job.conf
    ...
    cluster {
      nworkers_per_group: 2
      nworkers_per_procs: 2
      workspace: "examples/cifar10/"
    }

In SINGA, [synchronous training](architecture.html) is enabled
by launching multiple workers within one worker group. For instance, we can
change the original *job.conf* to have two workers in one worker group as shown
above. The workers will run synchronously as they are from the same worker group.
This is the in-memory [Sandblaster](frameworks.html) framework.
The model is partitioned among the two workers. Specifically, each layer is
sliced over the two workers. The sliced layer is the same as the original layer
except that it only has `B/g` feature instances, where `B` is the number of
instances in a mini-batch and `g` is the number of workers in a group. It is also
possible to partition the layer (or neural net) using
[other schemes](neural-net.html).
All other settings are the same as for running without partitioning:

    ./bin/singa-run.sh -conf examples/cifar10/job.conf

### Training in a cluster

We can extend the above two training frameworks to a cluster by updating the
cluster configuration with:

    nworkers_per_procs: 1

Every process would then create only one worker thread. Consequently, the workers
would be created in different processes (i.e., nodes). A *hostfile*
must be provided under *SINGA_ROOT/conf/* specifying the nodes in the cluster,
e.g.,

    logbase-a01
    logbase-a02

The zookeeper location must also be configured correctly, e.g.,

    # conf/singa.conf
    zookeeper_host: "logbase-a01"

The running command is the same as for single-node training:

    ./bin/singa-run.sh -conf examples/cifar10/job.conf

## Running with Mesos

*working*...

## Where to go next

The [programming guide](programming-guide.html) pages
describe how to submit a training job in SINGA.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/rbm.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/rbm.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/rbm.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/rbm.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,365 @@

# RBM Example

---

This example uses SINGA to train 4 RBM models and one auto-encoder model over the
[MNIST dataset](http://yann.lecun.com/exdb/mnist/). The auto-encoder model is trained
to reduce the dimensionality of the MNIST image features.
The RBM models are trained
to initialize the parameters of the auto-encoder model. This example application is
from [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf).

## Running instructions

Running scripts are provided in the *SINGA_ROOT/examples/rbm* folder.

The MNIST dataset has 70,000 handwritten digit images. The
[data preparation](data.html) page
has details on converting this dataset into a SINGA-recognizable format. Users can
simply run the following commands to download and convert the dataset.

    # at SINGA_ROOT/examples/mnist/
    $ cp Makefile.example Makefile
    $ make download
    $ make create

The training is separated into two phases, namely pre-training and fine-tuning.
The pre-training phase trains 4 RBMs in sequence,

    # at SINGA_ROOT/
    $ ./bin/singa-run.sh -conf examples/rbm/rbm1.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm2.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm3.conf
    $ ./bin/singa-run.sh -conf examples/rbm/rbm4.conf

The fine-tuning phase trains the auto-encoder by,

    $ ./bin/singa-run.sh -conf examples/rbm/autoencoder.conf

## Training details

### RBM1

<img src="../images/example-rbm1.png" align="center" width="200px"/>
<span><strong>Figure 1 - RBM1.</strong></span>

The neural net structure for training RBM1 is shown in Figure 1.
The data layer and parser layer provide the features for training RBM1.
The visible layer (connected with the parser layer) of RBM1 accepts the image feature
(784 dimensions). The hidden layer is set to have 1000 neurons (units).
These two layers are configured as,

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"mnist"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 1000
      }
      param{
        name: "w1"
        init{
          type: kGaussian
          mean: 0.0
          std: 0.1
        }
      }
      param{
        name: "b11"
        init{
          type: kConstant
          value: 0.0
        }
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 1000
      }
      param{
        name: "w1_"
        share_from: "w1"
      }
      param{
        name: "b12"
        init{
          type: kConstant
          value: 0.0
        }
      }
    }

For an RBM, the weight matrix is shared by the visible and hidden layers. For instance,
`w1` is shared by the `vis` and `hid` layers shown in Figure 1. In SINGA, we can configure
the `share_from` field to enable [parameter sharing](param.html)
as shown above for the params `w1` and `w1_`.

[Contrastive Divergence](train-one-batch.html#contrastive-divergence)
is configured as the algorithm for [TrainOneBatch](train-one-batch.html).
Following Hinton's paper, we configure the [updating protocol](updater.html)
as follows,

    # Updater Configuration
    updater{
      type: kSGD
      momentum: 0.2
      weight_decay: 0.0002
      learning_rate{
        base_lr: 0.1
        type: kFixed
      }
    }

Since the parameters of RBM1 will be used to initialize the auto-encoder, we should
configure the `workspace` field to specify a path for the checkpoint folder.
For example, if we configure it as,

    cluster {
      workspace: "examples/rbm/rbm1/"
    }

then SINGA will [checkpoint the parameters](checkpoint.html) into *examples/rbm/rbm1/*.

### RBM2

<img src="../images/example-rbm2.png" align="center" width="200px"/>
<span><strong>Figure 2 - RBM2.</strong></span>

Figure 2 shows the net structure for training RBM2.
The visible units of RBM2 accept the output from the Sigmoid1 layer. The Inner1 layer
is an `InnerProductLayer` whose parameters are set to the `w1` and `b12` learned
from RBM1.
The neural net configuration is (with the data layer and parser layer omitted),

    layer{
      name: "Inner1"
      type: kInnerProduct
      srclayers:"mnist"
      innerproduct_conf{
        num_output: 1000
      }
      param{ name: "w1" }
      param{ name: "b12" }
    }

    layer{
      name: "Sigmoid1"
      type: kSigmoid
      srclayers:"Inner1"
    }

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"Sigmoid1"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 500
      }
      param{
        name: "w2"
        ...
      }
      param{
        name: "b21"
        ...
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 500
      }
      param{
        name: "w2_"
        share_from: "w2"
      }
      param{
        name: "b22"
        ...
      }
    }

To load `w1` and `b12` from RBM1's checkpoint file, we configure the `checkpoint_path` as,

    checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
    cluster{
      workspace: "examples/rbm/rbm2"
    }

The workspace is changed so that `w2`, `b21` and `b22` are checkpointed into
*examples/rbm/rbm2/*.

### RBM3

<img src="../images/example-rbm3.png" align="center" width="200px"/>
<span><strong>Figure 3 - RBM3.</strong></span>

Figure 3 shows the net structure for training RBM3. In this model, a layer with
250 units is added as the hidden layer of RBM3. The visible units of RBM3
accept the output from the Sigmoid2 layer. The parameters of Inner1 and Inner2 are set to
`w1, b12, w2, b22`, which can be loaded from the checkpoint file of RBM2,
i.e., *examples/rbm/rbm2/*.

### RBM4

<img src="../images/example-rbm4.png" align="center" width="200px"/>
<span><strong>Figure 4 - RBM4.</strong></span>

Figure 4 shows the net structure for training RBM4. It is similar to Figure 3,
but according to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf), the hidden units of the
top RBM (RBM4) have stochastic real-valued states drawn from a unit-variance
Gaussian whose mean is determined by the input from the RBM's logistic visible
units. So we add a `gaussian` field in the RBMHid layer to control the
sampling distribution (Gaussian or Bernoulli). In addition, this
RBM has a much smaller learning rate (0.001). The neural net configuration for
RBM4 and the updating protocol are (with the data layer and parser
layer omitted),

    # Updater Configuration
    updater{
      type: kSGD
      momentum: 0.9
      weight_decay: 0.0002
      learning_rate{
        base_lr: 0.001
        type: kFixed
      }
    }

    layer{
      name: "RBMVis"
      type: kRBMVis
      srclayers:"Sigmoid3"
      srclayers:"RBMHid"
      rbm_conf{
        hdim: 30
      }
      param{
        name: "w4"
        ...
      }
      param{
        name: "b41"
        ...
      }
    }

    layer{
      name: "RBMHid"
      type: kRBMHid
      srclayers:"RBMVis"
      rbm_conf{
        hdim: 30
        gaussian: true
      }
      param{
        name: "w4_"
        share_from: "w4"
      }
      param{
        name: "b42"
        ...
      }
    }

### Auto-encoder

In the fine-tuning stage, the 4 RBMs are "unfolded" to form encoder and decoder
networks that are initialized using the parameters from the previous 4 RBMs.

<img src="../images/example-autoencoder.png" align="center" width="500px"/>
<span><strong>Figure 5 - Auto-Encoders.</strong></span>

Figure 5 shows the neural net structure for training the auto-encoder.
[Back propagation (kBP)](train-one-batch.html) is
configured as the algorithm for `TrainOneBatch`. We use the same cluster
configuration as for the RBM models. For the updater, we use the
[AdaGrad](updater.html#adagradupdater) algorithm with a fixed learning rate.

    # Updater Configuration
    updater{
      type: kAdaGrad
      learning_rate{
        base_lr: 0.01
        type: kFixed
      }
    }

According to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf),
we configure a EuclideanLoss layer to compute the reconstruction error. The neural net
configuration is (with some of the middle layers omitted),

    layer{ name: "data" }
    layer{ name: "mnist" }
    layer{
      name: "Inner1"
      param{ name: "w1" }
      param{ name: "b12" }
    }
    layer{ name: "Sigmoid1" }
    ...
    layer{
      name: "Inner8"
      innerproduct_conf{
        num_output: 784
        transpose: true
      }
      param{
        name: "w8"
        share_from: "w1"
      }
      param{ name: "b11" }
    }
    layer{ name: "Sigmoid8" }

    # Euclidean Loss Layer Configuration
    layer{
      name: "loss"
      type: kEuclideanLoss
      srclayers:"Sigmoid8"
      srclayers:"mnist"
    }

To load the pre-trained parameters from the 4 RBMs' checkpoint files, we configure
`checkpoint_path` as,

    # Checkpoint Configuration
    checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/rbm3/checkpoint/step6000-worker0"
    checkpoint_path: "examples/rbm/rbm4/checkpoint/step6000-worker0"

## Visualization Results

<div>
<img src="../images/rbm-weight.PNG" align="center" width="300px"/>
<img src="../images/rbm-feature.PNG" align="center" width="300px"/>
<br/>
<span><strong>Figure 6 - Bottom RBM weight matrix.</strong></span>
<span><strong>Figure 7 - Top layer features.</strong></span>
</div>

Figure 6 visualizes sample columns of the weight matrix of RBM1. We can see that
Gabor-like filters are learned. Figure 7 depicts the features extracted from
the top layer of the auto-encoder, where each point represents one image.
Different colors represent different digits. We can see that most images are
well clustered according to the ground truth.
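A figure like Figure 6 can be reproduced offline once the RBM1 weight matrix has been
exported from its checkpoint (the export step is not covered here). The sketch below is
illustrative only: it assumes the 784 x 1000 weight matrix is available as a NumPy array
and simply reshapes a few sample columns into 28 x 28 images.

```
import numpy as np
import matplotlib.pyplot as plt

def plot_rbm1_filters(w1, n=16):
    """Plot the first n columns of a 784 x hdim weight matrix as 28x28 filters."""
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / float(cols)))
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for i, ax in enumerate(np.ravel(axes)):
        ax.axis('off')
        if i < n:
            # each column is one filter over the 28x28 input image
            ax.imshow(w1[:, i].reshape(28, 28), cmap='gray')
    plt.show()

# Example call with a randomly initialized matrix as a stand-in;
# replace it with the real RBM1 weights loaded from the checkpoint.
plot_rbm1_filters(np.random.randn(784, 1000))
```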
