Added: incubator/singa/site/trunk/content/markdown/v0.2.0/kr/layer.md URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/kr/layer.md?rev=1738695&view=auto ============================================================================== --- incubator/singa/site/trunk/content/markdown/v0.2.0/kr/layer.md (added) +++ incubator/singa/site/trunk/content/markdown/v0.2.0/kr/layer.md Tue Apr 12 06:22:20 2016 @@ -0,0 +1,614 @@ +# Layers + +--- + +Layer is a core abstraction in SINGA. It performs a variety of feature +transformations for extracting high-level features, e.g., loading raw features, +parsing RGB values, doing convolution transformation, etc. + +The *Basic user guide* section introduces the configuration of a built-in +layer. *Advanced user guide* explains how to extend the base Layer class to +implement users' functions. + +## Basic user guide + +### Layer configuration + +Configuration of two example layers are shown below, + + layer { + name: "data" + type: kCSVRecord + store_conf { } + } + layer{ + name: "fc1" + type: kInnerProduct + srclayers: "data" + innerproduct_conf{ } + param{ } + } + +There are some common fields for all kinds of layers: + + * `name`: a string used to differentiate two layers in a neural net. + * `type`: an integer used for identifying a specific Layer subclass. The types of built-in + layers are listed in LayerType (defined in job.proto). + For user-defined layer subclasses, `user_type` should be used instead of `type`. + * `srclayers`: names of the source layers. + In SINGA, all connections are [converted](neural-net.html) to directed connections. + * `param`: configuration for a [Param](param.html) instance. + There can be multiple Param objects in one layer. + +Different layers may have different configurations. These configurations +are defined in `<type>_conf`. E.g., "fc1" layer has +`innerproduct_conf`. The subsequent sections +explain the functionality of each built-in layer and how to configure it. + +### Built-in Layer subclasses +SINGA has provided many built-in layers, which can be used directly to create neural nets. +These layers are categorized according to their functionalities, + + * Input layers for loading records (e.g., images) from disk files, HDFS or network into memory. + * Neuron layers for feature transformation, e.g., [convolution](../api/classsinga_1_1ConvolutionLayer.html), [pooling](../api/classsinga_1_1PoolingLayer.html), dropout, etc. + * Loss layers for measuring the training objective loss, e.g., Cross Entropy loss or Euclidean loss. + * Output layers for outputting the prediction results (e.g., probabilities of each category) or features into persistent storage, e.g., disk or HDFS. + * Connection layers for connecting layers when the neural net is partitioned. + +#### Input layers + +Input layers load training/test data from disk or other places (e.g., HDFS or network) +into memory. + +##### StoreInputLayer + +[StoreInputLayer](../api/classsinga_1_1StoreInputLayer.html) is a base layer for +loading data from data store. The data store can be a KVFile or TextFile (LMDB, +LevelDB, HDFS, etc., will be supported later). Its `ComputeFeature` function reads +batchsize (string:key, string:value) tuples. Each tuple is parsed by a `Parse` function +implemented by its subclasses. + +The configuration for this layer is in `store_conf`, + + store_conf { + backend: # "kvfile" or "textfile" + path: # path to the data store + batchsize : + ... 
+    }
+
+##### SingleLabelRecordLayer
+
+It is a subclass of StoreInputLayer. It assumes the (key, value) tuple loaded
+from a data store contains a feature vector (and a label) for one data instance.
+All feature vectors are of the same fixed length. The shape of one instance
+is configured through the `shape` field, e.g., the following configuration
+specifies the shape for the CIFAR10 images.
+
+    store_conf {
+      shape: 3   # channels
+      shape: 32  # height
+      shape: 32  # width
+    }
+
+It may do some preprocessing like [standardization](http://ufldl.stanford.edu/wiki/index.php/Data_Preprocessing).
+The data to be preprocessed is loaded and parsed in a virtual function, which is implemented by
+its subclasses.
+
+##### RecordInputLayer
+
+It is a subclass of SingleLabelRecordLayer. It parses the value field from one
+tuple into a RecordProto, which is generated by Google Protobuf according
+to common.proto. It can be used to store features for images (e.g., using the pixel field)
+or other objects (using the data field). The key field is not parsed.
+
+    type: kRecordInput
+    store_conf {
+      has_label:  # default is true
+      ...
+    }
+
+##### CSVInputLayer
+
+It is a subclass of SingleLabelRecordLayer. The value field from one tuple is parsed
+as a CSV line (separated by commas). The first number would be parsed as the label if
+`has_label` is configured in `store_conf`. Otherwise, all numbers would be parsed
+into one row of the `data_` Blob.
+
+    type: kCSVInput
+    store_conf {
+      has_label:  # default is true
+      ...
+    }
+
+##### ImagePreprocessLayer
+
+This layer does image preprocessing, e.g., cropping, mirroring and scaling, against
+the data Blob from its source layer. It deprecates the RGBImageLayer, which
+works on the Record from ShardDataLayer. It still uses the same configuration as
+RGBImageLayer,
+
+    type: kImagePreprocess
+    rgbimage_conf {
+      scale: float
+      cropsize: int  # crop each image to keep the central part with this size
+      mirror: bool   # mirror the image by setting image[i,j]=image[i,len-j]
+      meanfile: "Image_Mean_File_Path"
+    }
+
+##### ShardDataLayer (Deprecated)
+Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.
+
+[ShardDataLayer](../api/classsinga_1_1ShardDataLayer.html) is a subclass of DataLayer,
+which reads Records from a disk file. The file should be created using the
+[DataShard](../api/classsinga_1_1DataShard.html)
+class. With the data file prepared, users configure the layer as
+
+    type: kShardData
+    sharddata_conf {
+      path: "path to data shard folder"
+      batchsize: int
+      random_skip: int
+    }
+
+`batchsize` specifies the number of records to be trained for one mini-batch.
+The first `rand() % random_skip` `Record`s will be skipped at the first
+iteration. This is to enforce that different workers work on different Records.
+
+##### LMDBDataLayer (Deprecated)
+Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.
+
+[LMDBDataLayer] is similar to ShardDataLayer, except that the Records are
+loaded from LMDB.
+
+    type: kLMDBData
+    lmdbdata_conf {
+      path: "path to LMDB folder"
+      batchsize: int
+      random_skip: int
+    }
+
+##### ParserLayer (Deprecated)
+Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.
+
+It gets a vector of Records from the DataLayer and parses features into
+a Blob.
+
+    virtual void ParseRecords(Phase phase, const vector<Record>& records, Blob<float>* blob) = 0;
+
+
+##### LabelLayer (Deprecated)
+Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.
+
+[LabelLayer](../api/classsinga_1_1LabelLayer.html) is a subclass of ParserLayer.
+It parses a single label from each Record. Consequently, it
+will put $b$ (mini-batch size) values into the Blob. It has no specific configuration fields.
+
+
+##### MnistImageLayer (Deprecated)
+Deprecated! Please use ProtoRecordInputLayer or CSVRecordInputLayer.
+[MnistImageLayer] is a subclass of ParserLayer. It parses the pixel values of
+each image from the MNIST dataset. The pixel
+values may be normalized as `x/norm_a - norm_b`. For example, if `norm_a` is
+set to 255 and `norm_b` is set to 0, then every pixel will be normalized into
+[0, 1].
+
+    type: kMnistImage
+    mnistimage_conf {
+      norm_a: float
+      norm_b: float
+    }
+
+##### RGBImageLayer (Deprecated)
+Deprecated! Please use the ImagePreprocessLayer.
+[RGBImageLayer](../api/classsinga_1_1RGBImageLayer.html) is a subclass of ParserLayer.
+It parses the RGB values of one image from each Record. It may also
+apply some transformations, e.g., cropping and mirroring operations. If the
+`meanfile` is specified, it should point to a path that contains one Record for
+the mean of each pixel over all training images.
+
+    type: kRGBImage
+    rgbimage_conf {
+      scale: float
+      cropsize: int  # crop each image to keep the central part with this size
+      mirror: bool   # mirror the image by setting image[i,j]=image[i,len-j]
+      meanfile: "Image_Mean_File_Path"
+    }
+
+##### PrefetchLayer
+
+[PrefetchLayer](../api/classsinga_1_1PrefetchLayer.html) embeds other input layers
+to do data prefetching. It launches a thread that calls the embedded layers to load and extract features,
+ensuring that the I/O task and the computation task can run simultaneously.
+One example PrefetchLayer configuration is,
+
+    layer {
+      name: "prefetch"
+      type: kPrefetch
+      sublayers {
+        name: "data"
+        type: kShardData
+        sharddata_conf { }
+      }
+      sublayers {
+        name: "rgb"
+        type: kRGBImage
+        srclayers: "data"
+        rgbimage_conf { }
+      }
+      sublayers {
+        name: "label"
+        type: kLabel
+        srclayers: "data"
+      }
+      exclude: kTest
+    }
+
+The layers on top of the PrefetchLayer should use the names of the embedded
+layers as their source layers. For example, "rgb" and "label" should be
+configured in the `srclayers` of other layers.
+
+
+#### Output Layers
+
+Output layers get data from their source layers and write them to persistent storage,
+e.g., disk files or HDFS (to be supported).
+
+##### RecordOutputLayer
+
+This layer gets data (and the label if it is available) from its source layer and converts it into records of type
+RecordProto. Records are written as (key = instance No., value = serialized record) tuples into a Store, e.g., KVFile. The configuration of this layer
+should include the specifics of the Store backend via `store_conf`.
+
+    layer {
+      name: "output"
+      type: kRecordOutput
+      srclayers:
+      store_conf {
+        backend: "kvfile"
+        path:
+      }
+    }
+
+##### CSVOutputLayer
+This layer gets data (and the label if it is available) from its source layer and converts it into
+a string per instance with fields separated by commas (i.e., CSV format). The shape information
+is not kept in the string. All strings are written into a
+Store, e.g., a text file. The configuration of this layer should include the specifics of the Store backend via `store_conf`.
+
+    layer {
+      name: "output"
+      type: kCSVOutput
+      srclayers:
+      store_conf {
+        backend: "textfile"
+        path:
+      }
+    }
+
+#### Neuron Layers
+
+Neuron layers conduct feature transformations.
+ +##### ConvolutionLayer + +[ConvolutionLayer](../api/classsinga_1_1ConvolutionLayer.html) conducts convolution transformation. + + type: kConvolution + convolution_conf { + num_filters: int + kernel: int + stride: int + pad: int + } + param { } # weight/filter matrix + param { } # bias vector + +The int value `num_filters` stands for the count of the applied filters; the int +value `kernel` stands for the convolution kernel size (equal width and height); +the int value `stride` stands for the distance between the successive filters; +the int value `pad` pads each with a given int number of pixels border of +zeros. + +##### InnerProductLayer + +[InnerProductLayer](../api/classsinga_1_1InnerProductLayer.html) is fully connected with its (single) source layer. +Typically, it has two parameter fields, one for weight matrix, and the other +for bias vector. It rotates the feature of the source layer (by multiplying with weight matrix) and +shifts it (by adding the bias vector). + + type: kInnerProduct + innerproduct_conf { + num_output: int + } + param { } # weight matrix + param { } # bias vector + + +##### PoolingLayer + +[PoolingLayer](../api/classsinga_1_1PoolingLayer.html) is used to do a normalization (or averaging or sampling) of the +feature vectors from the source layer. + + type: kPooling + pooling_conf { + pool: AVE|MAX // Choose whether use the Average Pooling or Max Pooling + kernel: int // size of the kernel filter + pad: int // the padding size + stride: int // the step length of the filter + } + +The pooling layer has two methods: Average Pooling and Max Pooling. +Use the enum AVE and MAX to choose the method. + + * Max Pooling selects the max value for each filtering area as a point of the + result feature blob. + * Average Pooling averages all values for each filtering area at a point of the + result feature blob. + +##### ReLULayer + +[ReLuLayer](../api/classsinga_1_1ReLULayer.html) has rectified linear neurons, which conducts the following +transformation, `f(x) = Max(0, x)`. It has no specific configuration fields. + +##### STanhLayer + +[STanhLayer](../api/classsinga_1_1TanhLayer.html) uses the scaled tanh as activation function, i.e., `f(x)=1.7159047* tanh(0.6666667 * x)`. +It has no specific configuration fields. + +##### SigmoidLayer + +[SigmoidLayer] uses the sigmoid (or logistic) as activation function, i.e., +`f(x)=sigmoid(x)`. It has no specific configuration fields. + + +##### Dropout Layer +[DropoutLayer](../api/asssinga_1_1DropoutLayer.html) is a layer that randomly dropouts some inputs. +This scheme helps deep learning model away from over-fitting. + + type: kDropout + dropout_conf { + dropout_ratio: float # dropout probability + } + +##### LRNLayer +[LRNLayer](../api/classsinga_1_1LRNLayer.html), (Local Response Normalization), normalizes over the channels. + + type: kLRN + lrn_conf { + local_size: int + alpha: float // scaling parameter + beta: float // exponential number + } + +`local_size` specifies the quantity of the adjoining channels which will be summed up. + For `WITHIN_CHANNEL`, it means the side length of the space region which will be summed up. + + +#### Loss Layers + +Loss layers measures the objective training loss. + +##### SoftmaxLossLayer + +[SoftmaxLossLayer](../api/classsinga_1_1SoftmaxLossLayer.html) is a combination of the Softmax transformation and +Cross-Entropy loss. It applies Softmax firstly to get a prediction probability +for each output unit (neuron) and compute the cross-entropy against the ground truth. 
+It is generally used as the final layer to generate labels for classification tasks. + + type: kSoftmaxLoss + softmaxloss_conf { + topk: int + } + +The configuration field `topk` is for selecting the labels with `topk` +probabilities as the prediction results. It is tedious for users to view the +prediction probability of every label. + +#### ConnectionLayer + +Subclasses of ConnectionLayer are utility layers that connects other layers due +to neural net partitioning or other cases. + +##### ConcateLayer + +[ConcateLayer](../api/classsinga_1_1ConcateLayer.html) connects more than one source layers to concatenate their feature +blob along given dimension. + + type: kConcate + concate_conf { + concate_dim: int // define the dimension + } + +##### SliceLayer + +[SliceLayer](../api/classsinga_1_1SliceLayer.html) connects to more than one destination layers to slice its feature +blob along given dimension. + + type: kSlice + slice_conf { + slice_dim: int + } + +##### SplitLayer + +[SplitLayer](../api/classsinga_1_1SplitLayer.html) connects to more than one destination layers to replicate its +feature blob. + + type: kSplit + split_conf { + num_splits: int + } + +##### BridgeSrcLayer & BridgeDstLayer + +[BridgeSrcLayer](../api/classsinga_1_1BridgeSrcLayer.html) & +[BridgeDstLayer](../api/classsinga_1_1BridgeDstLayer.html) are utility layers assisting data (e.g., feature or +gradient) transferring due to neural net partitioning. These two layers are +added implicitly. Users typically do not need to configure them in their neural +net configuration. + +### OutputLayer + +It write the prediction results or the extracted features into file, HTTP stream +or other places. Currently SINGA has not implemented any specific output layer. + +## Advanced user guide + +The base Layer class is introduced in this section, followed by how to +implement a new Layer subclass. + +### Base Layer class + +#### Members + + LayerProto layer_conf_; + Blob<float> data_, grad_; + vector<AuxType> aux_data_; + +The base layer class keeps the user configuration in `layer_conf_`. +Almost all layers has $b$ (mini-batch size) feature vectors, which are stored +in the `data_` [Blob](../api/classsinga_1_1Blob.html) (A Blob is a chunk of memory space, proposed in +[Caffe](http://caffe.berkeleyvision.org/)). +There are layers without feature vectors; instead, they share the data from +source layers. +The `grad_` Blob is for storing the gradients of the +objective loss w.r.t. the `data_` Blob. It is necessary in [BP algorithm](../api/classsinga_1_1BPWorker.html), +hence we put it as a member of the base class. For [CD algorithm](../api/classsinga_1_1CDWorker.html), the `grad_` +field is not used; instead, the layers for the RBM model may have a Blob for the positive +phase feature and a Blob for the negative phase feature. For a recurrent layer +in RNN, one row of the feature blob corresponds to the feature of one internal layer. +The `aux_data_` stores the auxiliary data, e.g., image label (set `AuxType` to int). +If images have variant number of labels, the AuxType can be defined to `vector<int>`. +Currently, we hard code `AuxType` to int. It will be added as a template argument of Layer class later. + +If a layer has parameters, these parameters are declared using type +[Param](param.html). Since some layers do not have +parameters, we do not declare any `Param` in the base layer class. 
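+
+For intuition, the following sketch shows how a hypothetical subclass might size and fill
+these members. The layer name `ScaleLayer` is made up, and the Blob accessors used here
+(`data(this)`, `ReshapeLike`, `cpu_data`, `mutable_cpu_data`, `count`) are assumed to follow
+the Caffe-style Blob mentioned above; they may differ from the actual SINGA headers.
+
+    // A toy layer that multiplies the source feature by 2 (illustration only).
+    void ScaleLayer::Setup(const LayerProto& conf, const vector<Layer*>& srclayers) {
+      Layer::Setup(conf, srclayers);
+      // data_ holds b feature vectors, shaped like the (single) source layer's feature blob
+      data_.ReshapeLike(srclayers[0]->data(this));
+      // grad_ stores gradients of the loss w.r.t. data_, needed by the BP algorithm
+      grad_.ReshapeLike(data_);
+    }
+
+    void ScaleLayer::ComputeFeature(int flag, const vector<Layer*>& srclayers) {
+      const float* src = srclayers[0]->data(this).cpu_data();
+      float* dst = data_.mutable_cpu_data();
+      for (int i = 0; i < data_.count(); ++i)
+        dst[i] = 2.0f * src[i];  // element-wise transformation over the whole mini-batch
+    }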
+ +#### Functions + + virtual void Setup(const LayerProto& conf, const vector<Layer*>& srclayers); + virtual void ComputeFeature(int flag, const vector<Layer*>& srclayers) = 0; + virtual void ComputeGradient(int flag, const vector<Layer*>& srclayers) = 0; + +The `Setup` function reads user configuration, i.e. `conf`, and information +from source layers, e.g., mini-batch size, to set the +shape of the `data_` (and `grad_`) field as well +as some other layer specific fields. +<!--- +If `npartitions` is larger than 1, then +users need to reduce the sizes of `data_`, `grad_` Blobs or Param objects. For +example, if the `partition_dim=0` and there is no source layer, e.g., this +layer is a (bottom) data layer, then its `data_` and `grad_` Blob should have +`b/npartitions` feature vectors; If the source layer is also partitioned on +dimension 0, then this layer should have the same number of feature vectors as +the source layer. More complex partition cases are discussed in +[Neural net partitioning](neural-net.html#neural-net-partitioning). Typically, the +Setup function just set the shapes of `data_` Blobs and Param objects. +--> +Memory will not be allocated until computation over the data structure happens. + +The `ComputeFeature` function evaluates the feature blob by transforming (e.g. +convolution and pooling) features from the source layers. `ComputeGradient` +computes the gradients of parameters associated with this layer. These two +functions are invoked by the [TrainOneBatch](train-one-batch.html) +function during training. Hence, they should be consistent with the +`TrainOneBatch` function. Particularly, for feed-forward and RNN models, they are +trained using [BP algorithm](train-one-batch.html#back-propagation), +which requires each layer's `ComputeFeature` +function to compute `data_` based on source layers, and requires each layer's +`ComputeGradient` to compute gradients of parameters and source layers' +`grad_`. For energy models, e.g., RBM, they are trained by +[CD algorithm](train-one-batch.html#contrastive-divergence), which +requires each layer's `ComputeFeature` function to compute the feature vectors +for the positive phase or negative phase depending on the `phase` argument, and +requires the `ComputeGradient` function to only compute parameter gradients. +For some layers, e.g., loss layer or output layer, they can put the loss or +prediction result into the `metric` argument, which will be averaged and +displayed periodically. + +### Implementing a new Layer subclass + +Users can extend the Layer class or other subclasses to implement their own feature transformation +logics as long as the two virtual functions are overridden to be consistent with +the `TrainOneBatch` function. The `Setup` function may also be overridden to +read specific layer configuration. + +The [RNNLM](rnn.html) provides a couple of user-defined layers. You can refer to them as examples. + +#### Layer specific protocol message + +To implement a new layer, the first step is to define the layer specific +configuration. Suppose the new layer is `FooLayer`, the layer specific +google protocol message `FooLayerProto` should be defined as + + # in user.proto + package singa + import "job.proto" + message FooLayerProto { + optional int32 a = 1; // specific fields to the FooLayer + } + +In addition, users need to extend the original `LayerProto` (defined in job.proto of SINGA) +to include the `foo_conf` as follows. 
+
+    extend LayerProto {
+      optional FooLayerProto foo_conf = 101;  // unique field id, reserved for extensions
+    }
+
+If there are multiple new layers, then each layer that has specific
+configurations would have a `<type>_conf` field and take one unique extension number.
+SINGA has reserved enough extension numbers, e.g., from 101 to 1000.
+
+    # job.proto of SINGA
+    message LayerProto {
+      ...
+      extensions 101 to 1000;
+    }
+
+With user.proto defined, users can use
+[protoc](https://developers.google.com/protocol-buffers/) to generate the `user.pb.cc`
+and `user.pb.h` files. In users' code, the extension fields can be accessed via,
+
+    auto conf = layer_proto_.GetExtension(foo_conf);
+    int a = conf.a();
+
+When defining the configuration of the new layer (in job.conf), users should use
+`user_type` for its layer type instead of `type`. In addition, `foo_conf`
+should be enclosed in brackets.
+
+    layer {
+      name: "foo"
+      user_type: "kFooLayer"  # Note: the user_type of a user-defined layer is a string
+      [foo_conf] {            # Note: there is a pair of [] for extension fields
+        a: 10
+      }
+    }
+
+#### New Layer subclass declaration
+
+The new layer subclass can be implemented like the built-in layer subclasses.
+
+    class FooLayer : public singa::Layer {
+     public:
+      void Setup(const LayerProto& conf, const vector<Layer*>& srclayers) override;
+      void ComputeFeature(int flag, const vector<Layer*>& srclayers) override;
+      void ComputeGradient(int flag, const vector<Layer*>& srclayers) override;
+
+     private:
+      // members
+    };
+
+Users must override the two virtual functions, which are called by
+`TrainOneBatch` for either the BP or CD algorithm. Typically, the `Setup` function
+is also overridden to initialize some members. The user-configured fields
+can be accessed through `layer_conf_` as shown in the above paragraphs.
+
+#### New Layer subclass registration
+
+The newly defined layer should be registered in [main.cc](http://singa.incubator.apache.org/docs/programming-guide) by adding
+
+    driver.RegisterLayer<FooLayer, std::string>("kFooLayer");  // "kFooLayer" should match the layer configuration in job.conf.
+
+After that, the [NeuralNet](neural-net.html) can create instances of the new Layer subclass.
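+
+For completeness, a minimal `main.cc` could look like the sketch below. Only the
+`RegisterLayer` call is taken from the text above; the header path and the
+`Driver::Init`/`job_conf`/`Train` calls follow the [programming guide](programming-guide.html)
+and should be treated as assumptions rather than exact signatures.
+
+    #include "singa/driver.h"   // assumed header location
+    #include "foo_layer.h"      // declares FooLayer
+
+    int main(int argc, char** argv) {
+      singa::Driver driver;
+      driver.Init(argc, argv);   // parses -conf, -singa_conf, etc.
+      // register the user-defined layer under the string type used in job.conf
+      driver.RegisterLayer<FooLayer, std::string>("kFooLayer");
+      singa::JobProto job_conf = driver.job_conf();
+      driver.Train(false, job_conf);  // false: start a fresh training job (no resume)
+      return 0;
+    }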
Added: incubator/singa/site/trunk/content/markdown/v0.2.0/kr/mesos.md URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/kr/mesos.md?rev=1738695&view=auto ============================================================================== --- incubator/singa/site/trunk/content/markdown/v0.2.0/kr/mesos.md (added) +++ incubator/singa/site/trunk/content/markdown/v0.2.0/kr/mesos.md Tue Apr 12 06:22:20 2016 @@ -0,0 +1,84 @@ +#Distributed Training on Mesos + +This guide explains how to start SINGA distributed training on a Mesos cluster. It assumes that both Mesos and HDFS are already running, and every node has SINGA installed. +We assume the architecture depicted below, in which a cluster nodes are Docker container. Refer to [Docker guide](docker.html) for details of how to start individual nodes and set up network connection between them (make sure [weave](http://weave.works/guides/weave-docker-ubuntu-simple.html) is running at each node, and the cluster's headnode is running in container `node0`) + + + +--- + +## Start HDFS and Mesos +Go inside each container, using: +```` +docker exec -it nodeX /bin/bash +```` +and configure it as follows: + +* On container `node0` + + hadoop namenode -format + hadoop-daemon.sh start namenode + /opt/mesos-0.22.0/build/bin/mesos-master.sh --work_dir=/opt --log_dir=/opt --quiet > /dev/null & + zk-service.sh start + +* On container `node1, node2, ...` + + hadoop-daemon.sh start datanode + /opt/mesos-0.22.0/build/bin/mesos-slave.sh --master=node0:5050 --log_dir=/opt --quiet > /dev/null & + +To check if the setup has been successful, check that HDFS namenode has registered `N` datanodes, via: + +```` +hadoop dfsadmin -report +```` + +#### Mesos logs +Mesos logs are stored at `/opt/lt-mesos-master.INFO` on `node0` and `/opt/lt-mesos-slave.INFO` at other nodes. + +--- + +## Starting SINGA training on Mesos +Assumed that Mesos and HDFS are already started, SINGA job can be launched at **any** container. + +#### Launching job + +1. Log in to any container, then + cd incubator-singa/tool/mesos +<a name="job_start"></a> +2. Check that configuration files are correct: + + `scheduler.conf` contains information about the master nodes + + `singa.conf` contains information about Zookeeper node0 + + Job configuration file `job.conf` **contains full path to the examples directories (NO RELATIVE PATH!).** +3. Start the job: + + If starting for the first time: + + ./scheduler <job config file> -scheduler_conf <scheduler config file> -singa_conf <SINGA config file> + + If not the first time: + + ./scheduler <job config file> + +**Notes.** Each running job is given a `frameworkID`. Look for the log message of the form: + + Framework registered with XXX-XXX-XXX-XXX-XXX-XXX + +#### Monitoring and Debugging + +Each Mesos job is given a `frameworkID` and a *sandbox* directory is created for each job. +The directory is in the specified `work_dir` (or `/tmp/mesos`) by default. For example, the error +during SINGA execution can be found at: + + /tmp/mesos/slaves/xxxxx-Sx/frameworks/xxxxx/executors/SINGA_x/runs/latest/stderr + +Other artifacts, like files downloaded from HDFS (`job.conf`) and `stdout` can be found in the same +directory. + +#### Stopping + +There are two way to kill the running job: + +1. If the scheduler is running in the foreground, simply kill it (using `Ctrl-C`, for example). + +2. 
If the scheduler is running in the background, kill it using Mesos's REST API: + + curl -d "frameworkId=XXX-XXX-XXX-XXX-XXX-XXX" -X POST http://<master>/master/shutdown + Added: incubator/singa/site/trunk/content/markdown/v0.2.0/kr/mlp.md URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/kr/mlp.md?rev=1738695&view=auto ============================================================================== --- incubator/singa/site/trunk/content/markdown/v0.2.0/kr/mlp.md (added) +++ incubator/singa/site/trunk/content/markdown/v0.2.0/kr/mlp.md Tue Apr 12 06:22:20 2016 @@ -0,0 +1,195 @@ +# MLP Example + +--- + +Multilayer perceptron (MLP) is a subclass of feed-forward neural networks. +A MLP typically consists of multiple directly connected layers, with each layer fully +connected to the next one. In this example, we will use SINGA to train a +[simple MLP model proposed by Ciresan](http://arxiv.org/abs/1003.0358) +for classifying handwritten digits from the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). + +## Running instructions + +Please refer to the [installation](installation.html) page for +instructions on building SINGA, and the [quick start](quick-start.html) +for instructions on starting zookeeper. + +We have provided scripts for preparing the training and test dataset in *examples/cifar10/*. + + # in examples/mnist + $ cp Makefile.example Makefile + $ make download + $ make create + +After the datasets are prepared, we start the training by + + ./bin/singa-run.sh -conf examples/mnist/job.conf + +After it is started, you should see output like + + Record job information to /tmp/singa-log/job-info/job-1-20150817-055231 + Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1 + E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073) + E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start + E0817 07:15:14.972520 34115 worker.cc:134] Worker (group = 0, id = 0) start + E0817 07:15:24.462602 34073 trainer.cc:373] Test step-0, loss : 2.341021, accuracy : 0.109100 + E0817 07:15:47.341076 34073 trainer.cc:373] Train step-0, loss : 2.357269, accuracy : 0.099000 + E0817 07:16:07.173364 34073 trainer.cc:373] Train step-10, loss : 2.222740, accuracy : 0.201800 + E0817 07:16:26.714855 34073 trainer.cc:373] Train step-20, loss : 2.091030, accuracy : 0.327200 + E0817 07:16:46.590946 34073 trainer.cc:373] Train step-30, loss : 1.969412, accuracy : 0.442100 + E0817 07:17:06.207080 34073 trainer.cc:373] Train step-40, loss : 1.865466, accuracy : 0.514800 + E0817 07:17:25.890033 34073 trainer.cc:373] Train step-50, loss : 1.773849, accuracy : 0.569100 + E0817 07:17:51.208935 34073 trainer.cc:373] Test step-60, loss : 1.613709, accuracy : 0.662100 + E0817 07:17:53.176766 34073 trainer.cc:373] Train step-60, loss : 1.659150, accuracy : 0.652600 + E0817 07:18:12.783370 34073 trainer.cc:373] Train step-70, loss : 1.574024, accuracy : 0.666000 + E0817 07:18:32.904942 34073 trainer.cc:373] Train step-80, loss : 1.529380, accuracy : 0.670500 + E0817 07:18:52.608111 34073 trainer.cc:373] Train step-90, loss : 1.443911, accuracy : 0.703500 + E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000 + E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500 + E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900 + +After the training of 
some steps (depends on the setting) or the job is +finished, SINGA will [checkpoint](checkpoint.html) the model parameters. + +## Details + +To train a model in SINGA, you need to prepare the datasets, +and a job configuration which specifies the neural net structure, training +algorithm (BP or CD), SGD update algorithm (e.g. Adagrad), +number of training/test steps, etc. + +### Data preparation + +Before using SINGA, you need to write a program to pre-process the dataset you +use to a format that SINGA can read. Please refer to the +[Data Preparation](data.html) to get details about preparing +this MNIST dataset. + + +### Neural net + +<div style = "text-align: center"> +<img src = "../images/example-mlp.png" style = "width: 230px"> +<br/><strong>Figure 1 - Net structure of the MLP example. </strong></img> +</div> + + +Figure 1 shows the structure of the simple MLP model, which is constructed following +[Ciresan's paper](http://arxiv.org/abs/1003.0358). The dashed circle contains +two layers which represent one feature transformation stage. There are 6 such +stages in total. They sizes of the [InnerProductLayer](layer.html#innerproductlayer)s in these circles decrease from +2500->2000->1500->1000->500->10. + +Next we follow the guide in [neural net page](neural-net.html) +and [layer page](layer.html) to write the neural net configuration. + +* We configure an input layer to read the training/testing records from a disk file. + + layer { + name: "data" + type: kRecordInput + store_conf { + backend: "kvfile" + path: "examples/mnist/train_data.bin" + random_skip: 5000 + batchsize: 64 + shape: 784 + std_value: 127.5 + mean_value: 127.5 + } + exclude: kTest + } + + layer { + name: "data" + type: kRecordInput + store_conf { + backend: "kvfile" + path: "examples/mnist/test_data.bin" + batchsize: 100 + shape: 784 + std_value: 127.5 + mean_value: 127.5 + } + exclude: kTrain + } + + +* All [InnerProductLayer](layer.html#innerproductlayer)s are configured similarly as, + + layer{ + name: "fc1" + type: kInnerProduct + srclayers:"data" + innerproduct_conf{ + num_output: 2500 + } + param{ + name: "w1" + ... + } + param{ + name: "b1" + .. + } + } + + with the `num_output` decreasing from 2500 to 10. + +* A [STanhLayer](layer.html#stanhlayer) is connected to every InnerProductLayer +except the last one. It transforms the feature via scaled tanh function. + + layer{ + name: "tanh1" + type: kSTanh + srclayers:"fc1" + } + +* The final [Softmax loss layer](layer.html#softmaxloss) connects +to LabelLayer and the last STanhLayer. + + layer{ + name: "loss" + type:kSoftmaxLoss + softmaxloss_conf{ topk:1 } + srclayers:"fc6" + srclayers:"data" + } + +### Updater + +The [normal SGD updater](updater.html#updater) is selected. +The learning rate shrinks by 0.997 every 60 steps (i.e., one epoch). + + updater{ + type: kSGD + learning_rate{ + base_lr: 0.001 + type : kStep + step_conf{ + change_freq: 60 + gamma: 0.997 + } + } + } + +### TrainOneBatch algorithm + +The MLP model is a feed-forward model, hence +[Back-propagation algorithm](train-one-batch#back-propagation) +is selected. + + train_one_batch { + alg: kBP + } + +### Cluster setting + +The following configuration set a single worker and server for training. +[Training frameworks](frameworks.html) page introduces configurations of a couple of distributed +training frameworks. 
+ + cluster { + nworker_groups: 1 + nserver_groups: 1 + } Added: incubator/singa/site/trunk/content/markdown/v0.2.0/kr/model-config.md URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/kr/model-config.md?rev=1738695&view=auto ============================================================================== --- incubator/singa/site/trunk/content/markdown/v0.2.0/kr/model-config.md (added) +++ incubator/singa/site/trunk/content/markdown/v0.2.0/kr/model-config.md Tue Apr 12 06:22:20 2016 @@ -0,0 +1,294 @@ +# Model Configuration + +--- + +SINGA uses the stochastic gradient descent (SGD) algorithm to train parameters +of deep learning models. For each SGD iteration, there is a +[Worker](architecture.html) computing +gradients of parameters from the NeuralNet and a [Updater]() updating parameter +values based on gradients. Hence the model configuration mainly consists these +three parts. We will introduce the NeuralNet, Worker and Updater in the +following paragraphs and describe the configurations for them. All model +configuration is specified in the model.conf file in the user provided +workspace folder. E.g., the [cifar10 example folder](https://github.com/apache/incubator-singa/tree/master/examples/cifar10) +has a model.conf file. + + +## NeuralNet + +### Uniform model (neuralnet) representation + +<img src = "../images/model-categorization.png" style = "width: 400px"> Fig. 1: +Deep learning model categorization</img> + +Many deep learning models have being proposed. Fig. 1 is a categorization of +popular deep learning models based on the layer connections. The +[NeuralNet](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h) +abstraction of SINGA consists of multiple directly connected layers. This +abstraction is able to represent models from all the three categorizations. + + * For the feed-forward models, their connections are already directed. + + * For the RNN models, we unroll them into directed connections, as shown in + Fig. 2. + + * For the undirected connections in RBM, DBM, etc., we replace each undirected + connection with two directed connection, as shown in Fig. 3. + +<div style = "height: 200px"> +<div style = "float:left; text-align: center"> +<img src = "../images/unroll-rbm.png" style = "width: 280px"> <br/>Fig. 2: Unroll RBM </img> +</div> +<div style = "float:left; text-align: center; margin-left: 40px"> +<img src = "../images/unroll-rnn.png" style = "width: 550px"> <br/>Fig. 3: Unroll RNN </img> +</div> +</div> + +In specific, the NeuralNet class is defined in +[neuralnet.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h) : + + ... + vector<Layer*> layers_; + ... + +The Layer class is defined in +[base_layer.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/base_layer.h): + + vector<Layer*> srclayers_, dstlayers_; + LayerProto layer_proto_; // layer configuration, including meta info, e.g., name + ... + + +The connection with other layers are kept in the `srclayers_` and `dstlayers_`. +Since there are many different feature transformations, there are many +different Layer implementations correspondingly. 
For layers that have +parameters in their feature transformation functions, they would have Param +instances in the layer class, e.g., + + Param weight; + + +### Configure the structure of a NeuralNet instance + +To train a deep learning model, the first step is to write the configurations +for the model structure, i.e., the layers and connections for the NeuralNet. +Like [Caffe](http://caffe.berkeleyvision.org/), we use the [Google Protocol +Buffer](https://developers.google.com/protocol-buffers/) to define the +configuration protocol. The +[NetProto](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto) +specifies the configuration fields for a NeuralNet instance, + +message NetProto { + repeated LayerProto layer = 1; + ... +} + +The configuration is then + + layer { + // layer configuration + } + layer { + // layer configuration + } + ... + +To configure the model structure, we just configure each layer involved in the model. + + message LayerProto { + // the layer name used for identification + required string name = 1; + // source layer names + repeated string srclayers = 3; + // parameters, e.g., weight matrix or bias vector + repeated ParamProto param = 12; + // the layer type from the enum above + required LayerType type = 20; + // configuration for convolution layer + optional ConvolutionProto convolution_conf = 30; + // configuration for concatenation layer + optional ConcateProto concate_conf = 31; + // configuration for dropout layer + optional DropoutProto dropout_conf = 33; + ... + } + +A sample configuration for a feed-forward model is like + + layer { + name : "input" + type : kRecordInput + } + layer { + name : "conv" + type : kInnerProduct + srclayers : "input" + param { + // configuration for parameter + } + innerproduct_conf { + // configuration for this specific layer + } + ... + } + +The layer type list is defined in +[LayerType](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto). +One type (kFoo) corresponds to one child class of Layer (FooLayer) and one +configuration field (foo_conf). All built-in layers are introduced in the [layer page](layer.html). + +## Worker + +At the beginning, the Work will initialize the values of Param instances of +each layer either randomly (according to user configured distribution) or +loading from a [checkpoint file](). For each training iteration, the worker +visits layers of the neural network to compute gradients of Param instances of +each layer. Corresponding to the three categories of models, there are three +different algorithm to compute the gradients of a neural network. + + 1. Back-propagation (BP) for feed-forward models + 2. Back-propagation through time (BPTT) for recurrent neural networks + 3. Contrastive divergence (CD) for RBM, DBM, etc models. + +SINGA has provided these three algorithms as three Worker implementations. +Users only need to configure in the model.conf file to specify which algorithm +should be used. The configuration protocol is + + message ModelProto { + ... + enum GradCalcAlg { + // BP algorithm for feed-forward models, e.g., CNN, MLP, RNN + kBP = 1; + // BPTT for recurrent neural networks + kBPTT = 2; + // CD algorithm for RBM, DBM etc., models + kCd = 3; + } + // gradient calculation algorithm + required GradCalcAlg alg = 8 [default = kBackPropagation]; + ... + } + +These algorithms override the TrainOneBatch function of the Worker. 
E.g., the BPWorker implements it as
+
+    void BPWorker::TrainOneBatch(int step, Metric* perf) {
+      Forward(step, kTrain, train_net_, perf);
+      Backward(step, train_net_);
+    }
+
+The Forward function passes the raw input features of one mini-batch through
+all layers, and the Backward function visits the layers in reverse order to
+compute the gradients of the loss w.r.t. each layer's feature and each layer's
+Param objects. Different algorithms visit the layers in different orders.
+Some may traverse the neural network multiple times, e.g., the CDWorker's
+TrainOneBatch function is:
+
+    void CDWorker::TrainOneBatch(int step, Metric* perf) {
+      PositivePhase(step, kTrain, train_net_, perf);
+      NegativePhase(step, kTrain, train_net_, perf);
+      GradientPhase(step, train_net_);
+    }
+
+Each `*Phase` function would visit all layers one or multiple times.
+All algorithms will finally call two functions of the Layer class:
+
+    /**
+     * Transform features from connected layers into features of this layer.
+     *
+     * @param phase kTrain, kTest, kPositive, etc.
+     */
+    virtual void ComputeFeature(Phase phase, Metric* perf) = 0;
+    /**
+     * Compute gradients for parameters (and connected layers).
+     *
+     * @param phase kTrain, kTest, kPositive, etc.
+     */
+    virtual void ComputeGradient(Phase phase) = 0;
+
+All [Layer implementations]() must implement the above two functions.
+
+
+## Updater
+
+Once the gradients of parameters are computed, the Updater will update
+parameter values. There are many SGD variants for updating parameters, like
+[AdaDelta](http://arxiv.org/pdf/1212.5701v1.pdf),
+[AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf),
+[RMSProp](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf),
+[Nesterov](http://scholar.google.com/citations?view_op=view_citation&hl=en&user=DJ8Ep8YAAAAJ&citation_for_view=DJ8Ep8YAAAAJ:hkOj_22Ku90C)
+and SGD with momentum. The core functions of the Updater are
+
+    /**
+     * Update parameter values based on gradients
+     * @param step training step
+     * @param param pointer to the Param object
+     * @param grad_scale scaling factor for the gradients
+     */
+    void Update(int step, Param* param, float grad_scale = 1.0f);
+    /**
+     * @param step training step
+     * @return the learning rate for this step
+     */
+    float GetLearningRate(int step);
+
+SINGA provides several built-in updaters and learning rate change methods.
+Users can configure them according to the UpdaterProto
+
+    message UpdaterProto {
+      enum UpdaterType {
+        // normal SGD with momentum and weight decay
+        kSGD = 1;
+        // adaptive subgradient, http://www.magicbroom.info/Papers/DuchiHaSi10.pdf
+        kAdaGrad = 2;
+        // http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
+        kRMSProp = 3;
+        // Nesterov first optimal gradient method
+        kNesterov = 4;
+      }
+      // updater type
+      required UpdaterType type = 1 [default = kSGD];
+      // configuration for RMSProp algorithm
+      optional RMSPropProto rmsprop_conf = 50;
+
+      enum ChangeMethod {
+        kFixed = 0;
+        kInverseT = 1;
+        kInverse = 2;
+        kExponential = 3;
+        kLinear = 4;
+        kStep = 5;
+        kFixedStep = 6;
+      }
+      // change method for learning rate
+      required ChangeMethod lr_change = 2 [default = kFixed];
+
+      optional FixedStepProto fixedstep_conf = 40;
+      ...
+      optional float momentum = 31 [default = 0];
+      optional float weight_decay = 32 [default = 0];
+      // base learning rate
+      optional float base_lr = 34 [default = 0];
+    }
+
+
+## Other model configuration fields
+
+Some other important configuration fields for training a deep learning model are
+listed below:
+
+    // model name, e.g., "cifar10-dcnn", "mnist-mlp"
+    string name;
+    // display training info every this number of iterations, default is 0
+    int32 display_freq;
+    // total num of steps/iterations for training
+    int32 train_steps;
+    // do test every this number of training iterations, default is 0
+    int32 test_freq;
+    // run test for this number of steps/iterations, default is 0.
+    // The test dataset has test_steps * batchsize instances.
+    int32 test_steps;
+    // do checkpoint every this number of training steps, default is 0
+    int32 checkpoint_freq;
+
+The [checkpoint and restore](checkpoint.html) page has details on checkpoint-related fields.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/kr/neural-net.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/kr/neural-net.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/kr/neural-net.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/kr/neural-net.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,326 @@
+# Neural Net
+
+---
+
+`NeuralNet` in SINGA represents an instance of a user's neural net model. As the
+neural net typically consists of a set of layers, `NeuralNet` comprises
+a set of unidirectionally connected [Layer](layer.html)s.
+This page describes how to convert a user's neural net into
+the configuration of `NeuralNet`.
+
+<img src="../../images/model-category.png" align="center" width="200px"/>
+<span><strong>Figure 1 - Categorization of popular deep learning models.</strong></span>
+
+## Net structure configuration
+
+Users configure the `NeuralNet` by listing all layers of the neural net and
+specifying each layer's source layer names. Popular deep learning models can be
+categorized as in Figure 1. The subsequent sections give details for each
+category.
+
+### Feed-forward models
+
+<div align = "left">
+<img src="../../images/mlp-net.png" align="center" width="200px"/>
+<span><strong>Figure 2 - Net structure of an MLP model.</strong></span>
+</div>
+
+Feed-forward models, e.g., CNN and MLP, can easily get configured, as their layer
+connections are directed and contain no cycles. The
+configuration for the MLP model shown in Figure 2 is as follows,
+
+    net {
+      layer {
+        name : "data"
+        type : kData
+      }
+      layer {
+        name : "image"
+        type : kImage
+        srclayer: "data"
+      }
+      layer {
+        name : "label"
+        type : kLabel
+        srclayer: "data"
+      }
+      layer {
+        name : "hidden"
+        type : kHidden
+        srclayer: "image"
+      }
+      layer {
+        name : "softmax"
+        type : kSoftmaxLoss
+        srclayer: "hidden"
+        srclayer: "label"
+      }
+    }
+
+### Energy models
+
+<img src="../../images/rbm-rnn.png" align="center" width="500px"/>
+<span><strong>Figure 3 - Convert connections in RBM and RNN.</strong></span>
+
+
+For energy models including RBM, DBM,
+etc., their connections are undirected (i.e., Category B). To represent these models using
+`NeuralNet`, users can simply replace each connection with two directed
+connections, as shown in Figure 3a. In other words, for each pair of connected layers, their source
+layer fields should include each other's name.
+The full [RBM example](rbm.html) has +detailed neural net configuration for a RBM model, which looks like + + net { + layer { + name : "vis" + type : kVisLayer + param { + name : "w1" + } + srclayer: "hid" + } + layer { + name : "hid" + type : kHidLayer + param { + name : "w2" + share_from: "w1" + } + srclayer: "vis" + } + } + +### RNN models + +For recurrent neural networks (RNN), users can remove the recurrent connections +by unrolling the recurrent layer. For example, in Figure 3b, the original +layer is unrolled into a new layer with 4 internal layers. In this way, the +model is like a normal feed-forward model, thus can be configured similarly. +The [RNN example](rnn.html) has a full neural net +configuration for a RNN model. + + +## Configuration for multiple nets + +Typically, a training job includes three neural nets for +training, validation and test phase respectively. The three neural nets share most +layers except the data layer, loss layer or output layer, etc.. To avoid +redundant configurations for the shared layers, users can uses the `exclude` +filed to filter a layer in the neural net, e.g., the following layer will be +filtered when creating the testing `NeuralNet`. + + + layer { + ... + exclude : kTest # filter this layer for creating test net + } + + + +## Neural net partitioning + +A neural net can be partitioned in different ways to distribute the training +over multiple workers. + +### Batch and feature dimension + +<img src="../images/partition_fc.png" align="center" width="400px"/> +<span><strong>Figure 4 - Partitioning of a fully connected layer.</strong></span> + + +Every layer's feature blob is considered a matrix whose rows are feature +vectors. Thus, one layer can be split on two dimensions. Partitioning on +dimension 0 (also called batch dimension) slices the feature matrix by rows. +For instance, if the mini-batch size is 256 and the layer is partitioned into 2 +sub-layers, each sub-layer would have 128 feature vectors in its feature blob. +Partitioning on this dimension has no effect on the parameters, as every +[Param](param.html) object is replicated in the sub-layers. Partitioning on dimension +1 (also called feature dimension) slices the feature matrix by columns. For +example, suppose the original feature vector has 50 units, after partitioning +into 2 sub-layers, each sub-layer would have 25 units. This partitioning may +result in [Param](param.html) object being split, as shown in +Figure 4. Both the bias vector and weight matrix are +partitioned into two sub-layers. + + +### Partitioning configuration + +There are 4 partitioning schemes, whose configurations are give below, + + 1. Partitioning each singe layer into sub-layers on batch dimension (see + below). It is enabled by configuring the partition dimension of the layer to + 0, e.g., + + # with other fields omitted + layer { + partition_dim: 0 + } + + 2. Partitioning each singe layer into sub-layers on feature dimension (see + below). It is enabled by configuring the partition dimension of the layer to + 1, e.g., + + # with other fields omitted + layer { + partition_dim: 1 + } + + 3. Partitioning all layers into different subsets. It is enabled by + configuring the location ID of a layer, e.g., + + # with other fields omitted + layer { + location: 1 + } + layer { + location: 0 + } + + + 4. Hybrid partitioning of strategy 1, 2 and 3. The hybrid partitioning is + useful for large models. An example application is to implement the + [idea proposed by Alex](http://arxiv.org/abs/1404.5997). 
+ Hybrid partitioning is configured like, + + # with other fields omitted + layer { + location: 1 + } + layer { + location: 0 + } + layer { + partition_dim: 0 + location: 0 + } + layer { + partition_dim: 1 + location: 0 + } + +Currently SINGA supports strategy-2 well. Other partitioning strategies are +are under test and will be released in later version. + +## Parameter sharing + +Parameters can be shared in two cases, + + * sharing parameters among layers via user configuration. For example, the + visible layer and hidden layer of a RBM shares the weight matrix, which is configured through + the `share_from` field as shown in the above RBM configuration. The + configurations must be the same (except name) for shared parameters. + + * due to neural net partitioning, some `Param` objects are replicated into + different workers, e.g., partitioning one layer on batch dimension. These + workers share parameter values. SINGA controls this kind of parameter + sharing automatically, users do not need to do any configuration. + + * the `NeuralNet` for training and testing (and validation) share most layers + , thus share `Param` values. + +If the shared `Param` instances resident in the same process (may in different +threads), they use the same chunk of memory space for their values. But they +would have different memory spaces for their gradients. In fact, their +gradients will be averaged by the stub or server. + +## Advanced user guide + +### Creation + + static NeuralNet* NeuralNet::Create(const NetProto& np, Phase phase, int num); + +The above function creates a `NeuralNet` for a given phase, and returns a +pointer to the `NeuralNet` instance. The phase is in {kTrain, +kValidation, kTest}. `num` is used for net partitioning which indicates the +number of partitions. Typically, a training job includes three neural nets for +training, validation and test phase respectively. The three neural nets share most +layers except the data layer, loss layer or output layer, etc.. The `Create` +function takes in the full net configuration including layers for training, +validation and test. It removes layers for phases other than the specified +phase based on the `exclude` field in +[layer configuration](layer.html): + + layer { + ... + exclude : kTest # filter this layer for creating test net + } + +The filtered net configuration is passed to the constructor of `NeuralNet`: + + NeuralNet::NeuralNet(NetProto netproto, int npartitions); + +The constructor creates a graph representing the net structure firstly in + + Graph* NeuralNet::CreateGraph(const NetProto& netproto, int npartitions); + +Next, it creates a layer for each node and connects layers if their nodes are +connected. + + void NeuralNet::CreateNetFromGraph(Graph* graph, int npartitions); + +Since the `NeuralNet` instance may be shared among multiple workers, the +`Create` function returns a pointer to the `NeuralNet` instance . + +### Parameter sharing + + `Param` sharing +is enabled by first sharing the Param configuration (in `NeuralNet::Create`) +to create two similar (e.g., the same shape) Param objects, and then calling +(in `NeuralNet::CreateNetFromGraph`), + + void Param::ShareFrom(const Param& from); + +It is also possible to share `Param`s of two nets, e.g., sharing parameters of +the training net and the test net, + + void NeuralNet:ShareParamsFrom(NeuralNet* other); + +It will call `Param::ShareFrom` for each Param object. 
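+
+As a usage sketch (the `net_conf` variable is hypothetical; the function signatures are the
+ones declared above),
+
+    // create a training net and a test net from the same configuration
+    singa::NeuralNet* train_net = singa::NeuralNet::Create(net_conf, singa::kTrain, 1);
+    singa::NeuralNet* test_net  = singa::NeuralNet::Create(net_conf, singa::kTest, 1);
+    // let the test net reuse the training net's parameter values;
+    // internally this calls Param::ShareFrom for every Param object
+    test_net->ShareParamsFrom(train_net);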
+
+### Access functions
+`NeuralNet` provides a couple of access functions to get the layers and params
+of the net:
+
+    const std::vector<Layer*>& layers() const;
+    const std::vector<Param*>& params() const;
+    Layer* name2layer(string name) const;
+    Param* paramid2param(int id) const;
+
+
+### Partitioning
+
+
+#### Implementation
+
+SINGA partitions the neural net in the `CreateGraph` function, which creates one
+node for each (partitioned) layer. For example, if one layer's partition
+dimension is 0 or 1, then it creates `npartition` nodes for it; if the
+partition dimension is -1, a single node is created, i.e., no partitioning.
+Each node is assigned a partition (or location) ID. If the original layer is
+configured with a location ID, then that ID is assigned to each newly created node.
+These nodes are connected according to the connections of the original layers.
+Some connection layers are added automatically.
+For instance, if two connected sub-layers are located at two
+different workers, then a pair of bridge layers is inserted to transfer the
+feature (and gradient) blob between them. When two layers are partitioned on
+different dimensions, a concatenation layer which concatenates feature rows (or
+columns) and a slice layer which slices feature rows (or columns) would be
+inserted. These connection layers help make the network communication and
+synchronization transparent to the users.
+
+#### Dispatching partitions to workers
+
+Each (partitioned) layer is assigned a location ID, based on which it is dispatched to one
+worker. Particularly, the pointer to the `NeuralNet` instance is passed
+to every worker within the same group, but each worker only computes over the
+layers that have the same partition (or location) ID as the worker's ID. When
+every worker computes the gradients of the entire set of model parameters
+(strategy-2), we refer to this process as data parallelism. When different
+workers compute the gradients of different parameters (strategy-3 or
+strategy-1), we call this process model parallelism. The hybrid partitioning
+leads to hybrid parallelism, where some workers compute the gradients of the
+same subset of model parameters while other workers compute on different model
+parameters. For example, to implement the hybrid parallelism for the
+[DCNN model](http://arxiv.org/abs/1404.5997), we set `partition_dim = 0` for
+lower layers and `partition_dim = 1` for higher layers.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/kr/neuralnet-partition.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/kr/neuralnet-partition.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/kr/neuralnet-partition.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/kr/neuralnet-partition.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,54 @@
+# Neural Net Partition
+
+---
+
+The purpose of partitioning a neural network is to distribute the partitions onto
+different working units (e.g., threads or nodes, called workers in this article)
+and parallelize the processing.
+Another reason for partitioning is to handle a large neural network which cannot be
+held in a single node. For instance, to train models against images with high
+resolution, we need large neural networks (in terms of training parameters).
+
+Since *Layer* is the first-class citizen in SINGA, we do the partition against
+layers.
+Specifically, we support partitions at two levels. First, users can configure the location (i.e., worker ID) of each layer. In this way, users assign one worker to each layer. Second, for one layer, we can partition its neurons or partition the instances (e.g., images). These are called layer partition and data partition respectively. We illustrate the two types of partitions using a simple convolutional neural network.
+
+<img src="../images/conv-mnist.png" style="width: 220px"/>
+
+The above figure shows a convolutional neural network without any partition. It has 8 layers in total (one rectangle represents one layer). The first layer is a DataLayer (data) which reads data from local disk files/databases (or HDFS). The second layer is a MnistLayer which parses the records from the MNIST data to get the pixels of a batch of 8 images (each image is of size 28x28). The LabelLayer (label) parses the records to get the label of each image in the batch. The ConvolutionLayer (conv1) transforms the input image to the shape of 8x27x27. The ReLULayer (relu1) conducts elementwise transformations. The PoolingLayer (pool1) sub-samples the images. The fc1 layer is fully connected with the pool1 layer. It multiplies each image with a weight matrix to generate a 10-dimensional hidden feature, which is then normalized by a SoftmaxLossLayer to get the prediction.
+
+<img src="../images/conv-mnist-datap.png" style="width: 1000px"/>
+
+The above figure shows the convolutional neural network after partitioning all layers, except the DataLayer and ParserLayers, into 3 partitions using data partition. The red layers process 4 images of the batch, while the black and blue layers process 2 images each. Some helper layers, i.e., SliceLayer, ConcateLayer, BridgeSrcLayer, BridgeDstLayer and SplitLayer, are added automatically by our partition algorithm. Layers of the same color reside in the same worker. Data is transferred across workers at the boundary layers (i.e., BridgeSrcLayer and BridgeDstLayer), e.g., between s-slice-mnist-conv1 and d-slice-mnist-conv1.
+
+<img src="../images/conv-mnist-layerp.png" style="width: 1000px"/>
+
+The above figure shows the convolutional neural network after partitioning all layers, except the DataLayer and ParserLayers, into 2 partitions using layer partition. We can see that each layer processes all 8 images from the batch, but different partitions process different parts of each image. For instance, the layer conv1-00 processes only 4 channels. The other 4 channels are processed by conv1-01, which resides in another worker.
+
+Since the partition is done at the layer level, we can apply different partitions to different layers to get a hybrid partition for the whole neural network. Moreover, we can also specify layer locations to place different layers on different workers.

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/kr/overview.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/kr/overview.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/kr/overview.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/kr/overview.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,70 @@
+# Overview
+
+---
+
+SINGA is a distributed deep learning platform for training deep learning models over large-scale data. It is designed around the "Layer" abstraction of neural network models so that programming is intuitive.
+
+* It supports a variety of models: feed-forward networks such as Convolutional Neural Networks, energy models such as the Restricted Boltzmann Machine, and Recurrent Neural Network models.
+
+* Many "Layer"s with different functionalities are provided as built-in layers.
+
+* The SINGA architecture is designed to run synchronous, asynchronous and hybrid training.
+
+* It also supports different partitioning schemes (batch and feature partitioning) to parallelize model training.
+
+
+## Goals
+
+Scalability: as a distributed system, improve the training speed needed to reach a given accuracy by using more resources.
+
+Usability: simplify the programmer's work, such as the data and model partitioning and network communication required for efficient training of large distributed models, and make it easy to build complex models and algorithms.
+
+
+## Design principles
+
+Scalability is an important research challenge in distributed deep learning. SINGA is designed to maintain the scalability of different training frameworks.
+
+* Synchronous: improves the efficiency of one training iteration.
+
+* Asynchronous: improves the convergence rate of training.
+
+* Hybrid: balances efficiency and convergence rate according to the cost and resources (e.g., cluster size), improving scalability.
+
+SINGA is designed so that deep learning models can be programmed intuitively based on the network "layer" abstraction. A variety of models can be easily built and trained.
+
+## System overview
+
+<img src = "../../images/sgd.png" align="center" width="400px"/>
+<span><strong> Figure 1 - SGD Flow </strong></span>
+
+"Training a deep learning model" means finding the optimal parameters of the transformation functions that generate the features used to accomplish a specific task (classification, prediction, etc.). The goodness of those parameters is measured by a loss function such as the Cross-Entropy Loss (https://en.wikipedia.org/wiki/Cross_entropy). Since this function is generally non-linear and non-convex, it is hard to find a closed-form solution.
+
+Therefore, Stochastic Gradient Descent is used. As shown in Figure 1, the randomly initialized parameter values are updated iteratively so that the loss function decreases.
+
+<img src = "../../images/overview.png" align="center" width="400px"/>
+<span><strong> Figure 2 - SINGA Overview </strong></span>
+
+The workload required for training is distributed over workers and servers. As shown in Figure 2, in every iteration the workers call the *TrainOneBatch* function to compute parameter gradients. *TrainOneBatch* visits the "Layer"s in order, following the *NeuralNet* object that describes the neural network structure. The computed gradients are sent to the stub of the local node, aggregated there, and then transferred to the corresponding servers. The servers return the updated parameter values to the workers for the next iteration.
+
+
+## Job
+
+In SINGA, a "Job" refers to a "Job Configuration" that describes the neural network model, the data, the training method, the cluster topology, and so on. The job configuration specifies the following four components drawn in Figure 2:
+
+* [NeuralNet](neural-net.html): describes the neural network structure and the settings of each "layer".
+* [TrainOneBatch](train-one-batch.html): describes the algorithm suited to the model category.
+* [Updater](updater.html): describes how parameters are updated on the server.
+* [Cluster Topology](distributed-training.html): describes the distributed topology of workers and servers.
+
+The job is submitted through the SINGA driver in the [main function](programming-guide.html).
+
+This process is similar to job submission in Hadoop: users set up their jobs in the main function. Hadoop users configure their jobs with their own mapper and reducer, whereas SINGA users configure their jobs with their own layers, updaters, and so on.
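+
+As a rough sketch, a job configuration therefore groups these four parts together. The snippet below is only an illustrative outline; the exact field names and allowed values are defined in job.proto and on the pages linked above.
+
+    # illustrative outline of a job configuration (values are placeholders)
+    name: "example-job"
+    neuralnet {
+      layer { ... }                     # layers and their connections, see NeuralNet
+    }
+    train_one_batch { alg: kBP }        # TrainOneBatch algorithm
+    updater { type: kSGD }              # Updater used at the server side
+    cluster { nworkers_per_group: 1 }   # Cluster Topology of workers and servers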
Added: incubator/singa/site/trunk/content/markdown/v0.2.0/kr/param.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/kr/param.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/kr/param.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/kr/param.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,226 @@
+# Parameters
+
+---
+
+A `Param` object in SINGA represents a set of parameters, e.g., a weight matrix or a bias vector. The *Basic user guide* describes how to configure a `Param` object, and the *Advanced user guide* provides details on implementing custom parameter initialization methods.
+
+## Basic user guide
+
+The configuration of a Param object is inside a layer configuration, as `Param` objects are associated with layers. An example configuration looks like
+
+    layer {
+      ...
+      param {
+        name : "p1"
+        init {
+          type : kConstant
+          value: 1
+        }
+      }
+    }
+
+The [SGD algorithm](overview.html) starts by initializing all parameters according to the user-specified initialization method (the `init` field). For the above example, all parameters in `Param` "p1" will be initialized to the constant value 1. The configuration fields of a Param object are defined in [ParamProto](../api/classsinga_1_1ParamProto.html):
+
+ * name, an identifier string. It is an optional field. If not provided, SINGA will generate one based on the layer name and the Param's order in the layer.
+ * init, field for setting the initialization method.
+ * share_from, name of another `Param` object, from which this `Param` will share configurations and values.
+ * lr_scale, float value to be multiplied with the learning rate when [updating the parameters](updater.html).
+ * wd_scale, float value to be multiplied with the weight decay when [updating the parameters](updater.html).
+
+There are some other fields that are specific to initialization methods.
+
+### Initialization methods
+
+Users can set the `type` of `init` to use one of the following built-in initialization methods,
+
+ * `kConstant`, set all parameters of the Param object to a constant value
+
+        type: kConstant
+        value: float # default is 1
+
+ * `kGaussian`, initialize the parameters following a Gaussian distribution.
+
+        type: kGaussian
+        mean: float # mean of the Gaussian distribution, default is 0
+        std: float # standard deviation, default is 1
+        value: float # default 0
+
+ * `kUniform`, initialize the parameters following a uniform distribution
+
+        type: kUniform
+        low: float # lower boundary, default is -1
+        high: float # upper boundary, default is 1
+        value: float # default 0
+
+ * `kGaussianSqrtFanIn`, initialize two-dimensional `Param` objects (i.e., matrices) using `kGaussian` and then multiply each parameter by `1/sqrt(fan_in)`, where `fan_in` is the number of columns of the matrix.
+
+ * `kUniformSqrtFanIn`, the same as `kGaussianSqrtFanIn` except that the distribution is a uniform distribution.
+
+ * `kUniformFanInOut`, initialize matrix `Param` objects using `kUniform` and then multiply each parameter by `sqrt(6/(fan_in + fan_out))`, where `fan_in + fan_out` sums up the number of columns and rows of the matrix.
+
+For all the initialization methods above except `kConstant`, if their `value` field is not 1, every parameter will be multiplied by `value`. Users can also implement their own initialization method following the *Advanced user guide*.
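+
+Putting the fields above together, a weight matrix could be configured as in the following sketch (the field values are illustrative, not recommendations):
+
+    param {
+      name: "w1"        # optional identifier
+      lr_scale: 1.0     # multiplied with the learning rate
+      wd_scale: 1.0     # multiplied with the weight decay
+      init {
+        type: kGaussian
+        mean: 0
+        std: 0.01       # standard deviation of the Gaussian
+      }
+    }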
+
+## Advanced user guide
+
+This section describes the details of implementing new parameter initialization methods.
+
+### Base ParamGenerator
+All initialization methods are implemented as subclasses of the base `ParamGenerator` class.
+
+    class ParamGenerator {
+     public:
+      virtual void Init(const ParamGenProto&);
+      virtual void Fill(Param*);
+
+     protected:
+      ParamGenProto proto_;
+    };
+
+The configuration of the initialization method is in `ParamGenProto`. The `Fill` function fills the `Param` object (passed in as an argument).
+
+### New ParamGenerator subclass
+
+Similar to implementing a new Layer subclass, users can define a configuration protocol message,
+
+    # in user.proto
+    message FooParamProto {
+      optional int32 x = 1;
+    }
+    extend ParamGenProto {
+      optional FooParamProto fooparam_conf = 101;
+    }
+
+The configuration of `Param` would be
+
+    param {
+      ...
+      init {
+        user_type: "FooParam"  # must use user_type for user-defined methods
+        [fooparam_conf] {      # must use brackets for configuring user-defined messages
+          x: 10
+        }
+      }
+    }
+
+The subclass could be declared as,
+
+    class FooParamGen : public ParamGenerator {
+     public:
+      void Fill(Param*) override;
+    };
+
+Users can access the configuration fields in `Fill` by
+
+    int x = proto_.GetExtension(fooparam_conf).x();
+
+To use the new initialization method, users need to register it in the [main function](programming-guide.html).
+
+    // the string must be consistent with the user_type in the configuration
+    driver.RegisterParamGenerator<FooParamGen>("FooParam");
+
+{% comment %}
+### Base Param class
+
+### Members
+
+    int local_version_;
+    int slice_start_;
+    vector<int> slice_offset_, slice_size_;
+
+    shared_ptr<Blob<float>> data_;
+    Blob<float> grad_;
+    ParamProto proto_;
+
+Each Param object has a local version and a global version (inside the data Blob). These two versions are used for synchronization. If multiple Param objects share the same values, they have the same `data_` field and consequently the same global version. The global version is updated by [the stub thread](communication.html). The local version is updated in the `Worker::Update` function, which assigns the global version to the local version. The `Worker::Collect` function blocks until the global version is larger than the local version, i.e., until `data_` has been updated. In this way, we synchronize workers sharing parameters.
+
+In deep learning models, some Param objects are 100 times larger than others. To ensure load balance among servers, SINGA slices large Param objects. The slicing information is recorded in the `slice_*` fields. Each slice is assigned a unique ID starting from 0. `slice_start_` is the ID of the first slice of this Param object. `slice_offset_[i]` is the offset of the i-th slice in this Param object. `slice_size_[i]` is the size of the i-th slice. This slice information is used to create messages for transferring parameter values or gradients to different servers.
+
+Each Param object has a `grad_` field for gradients. Param objects do not share this Blob although they may share `data_`, because each layer containing a Param object contributes gradients. E.g., in RNNs, the recurrent layers share parameter values, and the gradients used for updating are averaged over these recurrent layers. In SINGA, the stub thread aggregates local gradients for the same Param object, and the server does a global aggregation of gradients for the same Param object.
+
+The `proto_` field has some meta information, e.g., the name and ID.
+It also has a field called `owner`, which is the ID of the Param object that shares its parameter values with others.
+
+### Functions
+The base Param class implements two sets of functions,
+
+    virtual void InitValues(int version = 0);  // initialize values according to `init_method`
+    void ShareFrom(const Param& other);  // share `data_` from the `other` Param
+    --------------
+    virtual Msg* GenGetMsg(bool copy, int slice_idx);
+    virtual Msg* GenPutMsg(bool copy, int slice_idx);
+    ...  // other message-related functions
+
+Besides the functions for processing the parameter values, there is a set of functions for generating and parsing messages. These messages are for transferring parameter values or gradients between workers and servers. Each message corresponds to one Param slice. If `copy` is false, the receiver of the message is in the same process as the sender. In this case, only pointers to the memory of the parameter values (or gradients) are wrapped in the message; otherwise, the parameter values (or gradients) are copied into the message.
+
+
+## Implementing a Param subclass
+Users can extend the base Param class to implement their own parameter initialization methods and message-transferring protocols. Similar to implementing a new Layer subclass, users can create Google protocol buffer messages for configuring the Param subclass. The subclass, denoted as FooParam, should be registered in main.cc,
+
+    driver.RegisterParam<FooParam>(kFooParam);  // kFooParam should be different from 0, which is reserved for the base Param type
+
+ * type, an integer representing the `Param` type. Currently SINGA provides one `Param` implementation with type 0 (the default type). If users want to use their own Param implementation, they should extend the base Param class and configure this field with `kUserParam`.
+
+{% endcomment %}

Added: incubator/singa/site/trunk/content/markdown/v0.2.0/kr/programmer-guide.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/v0.2.0/kr/programmer-guide.md?rev=1738695&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/v0.2.0/kr/programmer-guide.md (added)
+++ incubator/singa/site/trunk/content/markdown/v0.2.0/kr/programmer-guide.md Tue Apr 12 06:22:20 2016
@@ -0,0 +1,97 @@
+# Programmer Guide
+
+---
+
+To submit a training job, users must provide the configuration of the four components shown in Figure 1:
+
+ * a [NeuralNet](neural-net.html) describing the neural net structure with the detailed layer settings and their connections;
+ * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for different model categories;
+ * an [Updater](updater.html) defining the protocol for updating parameters at the server side;
+ * a [Cluster Topology](distributed-training.html) specifying the distributed architecture of workers and servers.
+
+The *Basic user guide* section describes how to submit a training job using built-in components, while the *Advanced user guide* section presents details on writing one's own main function to register user-implemented components. In addition, the training data must be prepared, which follows the same [process](data.html) for both basic and advanced users.
+
+<img src="../../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SINGA overview.</strong></span>
+
+## Basic user guide
+
+Users can use the default main function provided by SINGA to submit a training job. In this case, a job configuration file written as a Google protocol buffer message for the [JobProto](../api/classsinga_1_1JobProto.html) must be provided on the command line,
+
+    ./bin/singa-run.sh -conf <path to job conf> [-resume] [-test]
+
+* `-resume` is for continuing the training from the last [checkpoint](checkpoint.html).
+* `-test` is for testing the performance of a previously trained model and extracting features for new data; more details are available [here](test.html).
+
+The [MLP](mlp.html) and [CNN](cnn.html) examples use built-in components. Please read the corresponding pages for their job configuration files. The subsequent pages explain the details of each component of the configuration.
+
+## Advanced user guide
+
+If a user's model contains some user-defined components, e.g., an [Updater](updater.html), the user has to write a main function to register these components. It is similar to Hadoop's main function. Generally, the main function should
+
+ * initialize SINGA, e.g., set up the logging;
+
+ * register the user-defined components;
+
+ * create and pass the job configuration to the SINGA driver.
+
+An example main function looks like
+
+    #include <string>
+    #include "singa.h"
+    #include "user.h"  // header for user code
+
+    int main(int argc, char** argv) {
+      singa::Driver driver;
+      driver.Init(argc, argv);
+      bool resume;
+      // parse the resume option from argv.
+
+      // register user-defined layers
+      driver.RegisterLayer<FooLayer, std::string>("kFooLayer");
+      // register user-defined updater
+      driver.RegisterUpdater<FooUpdater, std::string>("kFooUpdater");
+      ...
+      auto jobConf = driver.job_conf();
+      // update jobConf
+
+      driver.Submit(resume, jobConf);
+      return 0;
+    }
+
+The Driver class' `Init` method loads a job configuration file provided by users as a command line argument (`-conf <job conf>`). It contains at least the cluster topology and returns the `jobConf` for users to update or fill in configurations of the neural net, updater, etc. If users define subclasses of Layer, Updater, Worker or Param, they should register them through the driver. Finally, the job configuration is submitted to the driver, which starts the training.
+
+We will provide helper functions to make the configuration easier in the future, like [keras](https://github.com/fchollet/keras).
+
+Users need to compile and link their code (e.g., layer implementations and the main file) with the SINGA library (*.libs/libsinga.so*) to generate an executable file, e.g., named *mysinga*. To launch the program, users just pass the path of *mysinga* and the base job configuration to *./bin/singa-run.sh*.
+
+    ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments]
+
+The [RNN application](rnn.html) provides a full example of implementing the main function for training a specific RNN model.
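+
+For instance, assuming a GNU toolchain and that the SINGA headers and *libsinga.so* are available under the source tree, the build-and-launch steps might look like the hedged sketch below; the compiler flags and paths are assumptions to adapt, not part of SINGA's documented interface.
+
+    # hypothetical build command; adjust include/library paths to your installation
+    g++ -std=c++11 main.cc foo_layer.cc foo_updater.cc \
+        -I$SINGA_HOME/include -L$SINGA_HOME/.libs -lsinga -o mysinga
+    # launch through the provided script, passing the executable via -exec
+    ./bin/singa-run.sh -conf <path to job conf> -exec ./mysinga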
