Added: incubator/singa/site/trunk/content/markdown/docs/kr/model-config.md URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/model-config.md?rev=1724348&view=auto ============================================================================== --- incubator/singa/site/trunk/content/markdown/docs/kr/model-config.md (added) +++ incubator/singa/site/trunk/content/markdown/docs/kr/model-config.md Wed Jan 13 03:46:19 2016 @@ -0,0 +1,294 @@ +# Model Configuration + +--- + +SINGA uses the stochastic gradient descent (SGD) algorithm to train parameters +of deep learning models. For each SGD iteration, there is a +[Worker](architecture.html) computing +gradients of parameters from the NeuralNet and a [Updater]() updating parameter +values based on gradients. Hence the model configuration mainly consists these +three parts. We will introduce the NeuralNet, Worker and Updater in the +following paragraphs and describe the configurations for them. All model +configuration is specified in the model.conf file in the user provided +workspace folder. E.g., the [cifar10 example folder](https://github.com/apache/incubator-singa/tree/master/examples/cifar10) +has a model.conf file. + + +## NeuralNet + +### Uniform model (neuralnet) representation + +<img src = "../images/model-categorization.png" style = "width: 400px"> Fig. 1: +Deep learning model categorization</img> + +Many deep learning models have being proposed. Fig. 1 is a categorization of +popular deep learning models based on the layer connections. The +[NeuralNet](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h) +abstraction of SINGA consists of multiple directly connected layers. This +abstraction is able to represent models from all the three categorizations. + + * For the feed-forward models, their connections are already directed. + + * For the RNN models, we unroll them into directed connections, as shown in + Fig. 2. + + * For the undirected connections in RBM, DBM, etc., we replace each undirected + connection with two directed connection, as shown in Fig. 3. + +<div style = "height: 200px"> +<div style = "float:left; text-align: center"> +<img src = "../images/unroll-rbm.png" style = "width: 280px"> <br/>Fig. 2: Unroll RBM </img> +</div> +<div style = "float:left; text-align: center; margin-left: 40px"> +<img src = "../images/unroll-rnn.png" style = "width: 550px"> <br/>Fig. 3: Unroll RNN </img> +</div> +</div> + +In specific, the NeuralNet class is defined in +[neuralnet.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/neuralnet.h) : + + ... + vector<Layer*> layers_; + ... + +The Layer class is defined in +[base_layer.h](https://github.com/apache/incubator-singa/blob/master/include/neuralnet/base_layer.h): + + vector<Layer*> srclayers_, dstlayers_; + LayerProto layer_proto_; // layer configuration, including meta info, e.g., name + ... + + +The connection with other layers are kept in the `srclayers_` and `dstlayers_`. +Since there are many different feature transformations, there are many +different Layer implementations correspondingly. For layers that have +parameters in their feature transformation functions, they would have Param +instances in the layer class, e.g., + + Param weight; + + +### Configure the structure of a NeuralNet instance + +To train a deep learning model, the first step is to write the configurations +for the model structure, i.e., the layers and connections for the NeuralNet. 
+Like [Caffe](http://caffe.berkeleyvision.org/), we use the [Google Protocol +Buffer](https://developers.google.com/protocol-buffers/) to define the +configuration protocol. The +[NetProto](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto) +specifies the configuration fields for a NeuralNet instance, + +message NetProto { + repeated LayerProto layer = 1; + ... +} + +The configuration is then + + layer { + // layer configuration + } + layer { + // layer configuration + } + ... + +To configure the model structure, we just configure each layer involved in the model. + + message LayerProto { + // the layer name used for identification + required string name = 1; + // source layer names + repeated string srclayers = 3; + // parameters, e.g., weight matrix or bias vector + repeated ParamProto param = 12; + // the layer type from the enum above + required LayerType type = 20; + // configuration for convolution layer + optional ConvolutionProto convolution_conf = 30; + // configuration for concatenation layer + optional ConcateProto concate_conf = 31; + // configuration for dropout layer + optional DropoutProto dropout_conf = 33; + ... + } + +A sample configuration for a feed-forward model is like + + layer { + name : "input" + type : kRecordInput + } + layer { + name : "conv" + type : kInnerProduct + srclayers : "input" + param { + // configuration for parameter + } + innerproduct_conf { + // configuration for this specific layer + } + ... + } + +The layer type list is defined in +[LayerType](https://github.com/apache/incubator-singa/blob/master/src/proto/model.proto). +One type (kFoo) corresponds to one child class of Layer (FooLayer) and one +configuration field (foo_conf). All built-in layers are introduced in the [layer page](layer.html). + +## Worker + +At the beginning, the Work will initialize the values of Param instances of +each layer either randomly (according to user configured distribution) or +loading from a [checkpoint file](). For each training iteration, the worker +visits layers of the neural network to compute gradients of Param instances of +each layer. Corresponding to the three categories of models, there are three +different algorithm to compute the gradients of a neural network. + + 1. Back-propagation (BP) for feed-forward models + 2. Back-propagation through time (BPTT) for recurrent neural networks + 3. Contrastive divergence (CD) for RBM, DBM, etc models. + +SINGA has provided these three algorithms as three Worker implementations. +Users only need to configure in the model.conf file to specify which algorithm +should be used. The configuration protocol is + + message ModelProto { + ... + enum GradCalcAlg { + // BP algorithm for feed-forward models, e.g., CNN, MLP, RNN + kBP = 1; + // BPTT for recurrent neural networks + kBPTT = 2; + // CD algorithm for RBM, DBM etc., models + kCd = 3; + } + // gradient calculation algorithm + required GradCalcAlg alg = 8 [default = kBackPropagation]; + ... + } + +These algorithms override the TrainOneBatch function of the Worker. E.g., the +BPWorker implements it as + + void BPWorker::TrainOneBatch(int step, Metric* perf) { + Forward(step, kTrain, train_net_, perf); + Backward(step, train_net_); + } + +The Forward function passes the raw input features of one mini-batch through +all layers, and the Backward function visits the layers in reverse order to +compute the gradients of the loss w.r.t each layer's feature and each layer's +Param objects. 
Different algorithms would visit the layers in different orders. +Some may traverses the neural network multiple times, e.g., the CDWorker's +TrainOneBatch function is: + + void CDWorker::TrainOneBatch(int step, Metric* perf) { + PostivePhase(step, kTrain, train_net_, perf); + NegativePhase(step, kTran, train_net_, perf); + GradientPhase(step, train_net_); + } + +Each `*Phase` function would visit all layers one or multiple times. +All algorithms will finally call two functions of the Layer class: + + /** + * Transform features from connected layers into features of this layer. + * + * @param phase kTrain, kTest, kPositive, etc. + */ + virtual void ComputeFeature(Phase phase, Metric* perf) = 0; + /** + * Compute gradients for parameters (and connected layers). + * + * @param phase kTrain, kTest, kPositive, etc. + */ + virtual void ComputeGradient(Phase phase) = 0; + +All [Layer implementations]() must implement the above two functions. + + +## Updater + +Once the gradients of parameters are computed, the Updater will update +parameter values. There are many SGD variants for updating parameters, like +[AdaDelta](http://arxiv.org/pdf/1212.5701v1.pdf), +[AdaGrad](http://www.magicbroom.info/Papers/DuchiHaSi10.pdf), +[RMSProp](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf), +[Nesterov](http://scholar.google.com/citations?view_op=view_citation&hl=en&user=DJ8Ep8YAAAAJ&citation_for_view=DJ8Ep8YAAAAJ:hkOj_22Ku90C) +and SGD with momentum. The core functions of the Updater is + + /** + * Update parameter values based on gradients + * @param step training step + * @param param pointer to the Param object + * @param grad_scale scaling factor for the gradients + */ + void Update(int step, Param* param, float grad_scale=1.0f); + /** + * @param step training step + * @return the learning rate for this step + */ + float GetLearningRate(int step); + +SINGA provides several built-in updaters and learning rate change methods. +Users can configure them according to the UpdaterProto + + message UpdaterProto { + enum UpdaterType{ + // noraml SGD with momentum and weight decay + kSGD = 1; + // adaptive subgradient, http://www.magicbroom.info/Papers/DuchiHaSi10.pdf + kAdaGrad = 2; + // http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf + kRMSProp = 3; + // Nesterov first optimal gradient method + kNesterov = 4; + } + // updater type + required UpdaterType type = 1 [default=kSGD]; + // configuration for RMSProp algorithm + optional RMSPropProto rmsprop_conf = 50; + + enum ChangeMethod { + kFixed = 0; + kInverseT = 1; + kInverse = 2; + kExponential = 3; + kLinear = 4; + kStep = 5; + kFixedStep = 6; + } + // change method for learning rate + required ChangeMethod lr_change= 2 [default = kFixed]; + + optional FixedStepProto fixedstep_conf=40; + ... + optional float momentum = 31 [default = 0]; + optional float weight_decay = 32 [default = 0]; + // base learning rate + optional float base_lr = 34 [default = 0]; + } + + +## Other model configuration fields + +Some other important configuration fields for training a deep learning model is +listed: + + // model name, e.g., "cifar10-dcnn", "mnist-mlp" + string name; + // displaying training info for every this number of iterations, default is 0 + int32 display_freq; + // total num of steps/iterations for training + int32 train_steps; + // do test for every this number of training iterations, default is 0 + int32 test_freq; + // run test for this number of steps/iterations, default is 0. 
+    // The test dataset has test_steps * batchsize instances.
+    int32 test_steps;
+    // do checkpoint for every this number of training steps, default is 0
+    int32 checkpoint_freq;
+
+The [checkpoint and restore](checkpoint.html) page has details on the checkpoint-related fields.
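+
+To tie the pieces on this page together, the sketch below shows how these fields
+could be combined in a model.conf file. It is only an illustration: the values are
+arbitrary, and the exact nesting of the updater and layer blocks follows the full
+ModelProto definition in model.proto, which is only partially shown above.
+
+    # a minimal, hypothetical model.conf sketch (values are for illustration only)
+    name: "mnist-mlp"
+    train_steps: 1000
+    display_freq: 50
+    test_freq: 500
+    test_steps: 100
+    checkpoint_freq: 500
+    # gradient calculation algorithm, see GradCalcAlg above
+    alg: kBP
+    # updater settings, see UpdaterProto above
+    updater {
+      type: kSGD
+      momentum: 0.9
+      weight_decay: 0.0005
+      base_lr: 0.01
+      lr_change: kFixed
+    }
+    # layers of the NeuralNet, each one a LayerProto
+    layer {
+      name: "input"
+      type: kRecordInput
+    }
+    layer {
+      name: "fc1"
+      type: kInnerProduct
+      srclayers: "input"
+      param {
+        name: "w1"
+      }
+    }
+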
Added: incubator/singa/site/trunk/content/markdown/docs/kr/neural-net.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/neural-net.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/neural-net.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/neural-net.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,327 @@
+# Neural Net
+
+---
+
+`NeuralNet` in SINGA represents an instance of the user's neural net model. As the
+neural net typically consists of a set of layers, `NeuralNet` comprises
+a set of unidirectionally connected [Layer](layer.html)s.
+This page describes how to convert a user's neural net into
+the configuration of `NeuralNet`.
+
+<img src="../images/model-category.png" align="center" width="200px"/>
+<span><strong>Figure 1 - Categorization of popular deep learning models.</strong></span>
+
+## Net structure configuration
+
+Users configure the `NeuralNet` by listing all layers of the neural net and
+specifying each layer's source layer names. Popular deep learning models can be
+categorized as in Figure 1. The subsequent sections give details for each
+category.
+
+### Feed-forward models
+
+<div align = "left">
+<img src="../images/mlp-net.png" align="center" width="200px"/>
+<span><strong>Figure 2 - Net structure of a MLP model.</strong></span>
+</div>
+
+Feed-forward models, e.g., CNN and MLP, are easy to configure, as their layer
+connections are directed and contain no cycles. The
+configuration for the MLP model shown in Figure 2 is as follows,
+
+    net {
+      layer {
+        name : "data"
+        type : kData
+      }
+      layer {
+        name : "image"
+        type : kImage
+        srclayer: "data"
+      }
+      layer {
+        name : "label"
+        type : kLabel
+        srclayer: "data"
+      }
+      layer {
+        name : "hidden"
+        type : kHidden
+        srclayer: "image"
+      }
+      layer {
+        name : "softmax"
+        type : kSoftmaxLoss
+        srclayer: "hidden"
+        srclayer: "label"
+      }
+    }
+
+### Energy models
+
+<img src="../images/rbm-rnn.png" align="center" width="500px"/>
+<span><strong>Figure 3 - Convert connections in RBM and RNN.</strong></span>
+
+
+For energy models including RBM, DBM,
+etc., their connections are undirected (i.e., Category B). To represent these models using
+`NeuralNet`, users can simply replace each connection with two directed
+connections, as shown in Figure 3a. In other words, for each pair of connected layers, their source
+layer fields should include each other's name.
+The full [RBM example](rbm.html) has
+a detailed neural net configuration for an RBM model, which looks like
+
+    net {
+      layer {
+        name : "vis"
+        type : kVisLayer
+        param {
+          name : "w1"
+        }
+        srclayer: "hid"
+      }
+      layer {
+        name : "hid"
+        type : kHidLayer
+        param {
+          name : "w2"
+          share_from: "w1"
+        }
+        srclayer: "vis"
+      }
+    }
+
+### RNN models
+
+For recurrent neural networks (RNN), users can remove the recurrent connections
+by unrolling the recurrent layer. For example, in Figure 3b, the original
+layer is unrolled into a new layer with 4 internal layers. In this way, the
+model is like a normal feed-forward model and thus can be configured similarly.
+The [RNN example](rnn.html) has a full neural net
+configuration for an RNN model.
+
+
+## Configuration for multiple nets
+
+Typically, a training job includes three neural nets for the
+training, validation and test phases respectively. The three neural nets share most
+layers except the data layer, loss layer or output layer, etc.
+To avoid
+redundant configurations for the shared layers, users can use the `exclude`
+field to filter a layer out of the neural net, e.g., the following layer will be
+filtered out when creating the testing `NeuralNet`.
+
+
+    layer {
+      ...
+      exclude : kTest # filter this layer for creating test net
+    }
+
+
+
+## Neural net partitioning
+
+A neural net can be partitioned in different ways to distribute the training
+over multiple workers.
+
+### Batch and feature dimension
+
+<img src="../images/partition_fc.png" align="center" width="400px"/>
+<span><strong>Figure 4 - Partitioning of a fully connected layer.</strong></span>
+
+
+Every layer's feature blob is considered a matrix whose rows are feature
+vectors. Thus, one layer can be split along two dimensions. Partitioning on
+dimension 0 (also called the batch dimension) slices the feature matrix by rows.
+For instance, if the mini-batch size is 256 and the layer is partitioned into 2
+sub-layers, each sub-layer would have 128 feature vectors in its feature blob.
+Partitioning on this dimension has no effect on the parameters, as every
+[Param](param.html) object is replicated in the sub-layers. Partitioning on dimension
+1 (also called the feature dimension) slices the feature matrix by columns. For
+example, suppose the original feature vector has 50 units; after partitioning
+into 2 sub-layers, each sub-layer would have 25 units. This partitioning may
+result in [Param](param.html) objects being split, as shown in
+Figure 4. Both the bias vector and the weight matrix are
+partitioned into two sub-layers.
+
+
+### Partitioning configuration
+
+There are 4 partitioning schemes, whose configurations are given below,
+
+ 1. Partitioning each single layer into sub-layers on the batch dimension (see
+ below). It is enabled by configuring the partition dimension of the layer to
+ 0, e.g.,
+
+          # with other fields omitted
+          layer {
+            partition_dim: 0
+          }
+
+ 2. Partitioning each single layer into sub-layers on the feature dimension (see
+ below). It is enabled by configuring the partition dimension of the layer to
+ 1, e.g.,
+
+          # with other fields omitted
+          layer {
+            partition_dim: 1
+          }
+
+ 3. Partitioning all layers into different subsets. It is enabled by
+ configuring the location ID of a layer, e.g.,
+
+          # with other fields omitted
+          layer {
+            location: 1
+          }
+          layer {
+            location: 0
+          }
+
+
+ 4. Hybrid partitioning of strategies 1, 2 and 3. The hybrid partitioning is
+ useful for large models. An example application is to implement the
+ [idea proposed by Alex](http://arxiv.org/abs/1404.5997).
+ Hybrid partitioning is configured like,
+
+          # with other fields omitted
+          layer {
+            location: 1
+          }
+          layer {
+            location: 0
+          }
+          layer {
+            partition_dim: 0
+            location: 0
+          }
+          layer {
+            partition_dim: 1
+            location: 0
+          }
+
+Currently SINGA supports strategy-2 well. The other partitioning strategies
+are under test and will be released in a later version.
+
+## Parameter sharing
+
+Parameters can be shared in the following cases,
+
+  * sharing parameters among layers via user configuration. For example, the
+  visible layer and hidden layer of an RBM share the weight matrix, which is configured through
+  the `share_from` field as shown in the above RBM configuration. The
+  configurations must be the same (except the name) for shared parameters.
+
+  * due to neural net partitioning, some `Param` objects are replicated into
+  different workers, e.g., when partitioning one layer on the batch dimension. These
+  workers share parameter values.
SINGA controls this kind of parameter + sharing automatically, users do not need to do any configuration. + + * the `NeuralNet` for training and testing (and validation) share most layers + , thus share `Param` values. + +If the shared `Param` instances resident in the same process (may in different +threads), they use the same chunk of memory space for their values. But they +would have different memory spaces for their gradients. In fact, their +gradients will be averaged by the stub or server. + +## Advanced user guide + +### Creation + + static NeuralNet* NeuralNet::Create(const NetProto& np, Phase phase, int num); + +The above function creates a `NeuralNet` for a given phase, and returns a +pointer to the `NeuralNet` instance. The phase is in {kTrain, +kValidation, kTest}. `num` is used for net partitioning which indicates the +number of partitions. Typically, a training job includes three neural nets for +training, validation and test phase respectively. The three neural nets share most +layers except the data layer, loss layer or output layer, etc.. The `Create` +function takes in the full net configuration including layers for training, +validation and test. It removes layers for phases other than the specified +phase based on the `exclude` field in +[layer configuration](layer.html): + + layer { + ... + exclude : kTest # filter this layer for creating test net + } + +The filtered net configuration is passed to the constructor of `NeuralNet`: + + NeuralNet::NeuralNet(NetProto netproto, int npartitions); + +The constructor creates a graph representing the net structure firstly in + + Graph* NeuralNet::CreateGraph(const NetProto& netproto, int npartitions); + +Next, it creates a layer for each node and connects layers if their nodes are +connected. + + void NeuralNet::CreateNetFromGraph(Graph* graph, int npartitions); + +Since the `NeuralNet` instance may be shared among multiple workers, the +`Create` function returns a pointer to the `NeuralNet` instance . + +### Parameter sharing + + `Param` sharing +is enabled by first sharing the Param configuration (in `NeuralNet::Create`) +to create two similar (e.g., the same shape) Param objects, and then calling +(in `NeuralNet::CreateNetFromGraph`), + + void Param::ShareFrom(const Param& from); + +It is also possible to share `Param`s of two nets, e.g., sharing parameters of +the training net and the test net, + + void NeuralNet:ShareParamsFrom(NeuralNet* other); + +It will call `Param::ShareFrom` for each Param object. + +### Access functions +`NeuralNet` provides a couple of access function to get the layers and params +of the net: + + const std::vector<Layer*>& layers() const; + const std::vector<Param*>& params() const ; + Layer* name2layer(string name) const; + Param* paramid2param(int id) const; + + +### Partitioning + + +#### Implementation + +SINGA partitions the neural net in `CreateGraph` function, which creates one +node for each (partitioned) layer. For example, if one layer's partition +dimension is 0 or 1, then it creates `npartition` nodes for it; if the +partition dimension is -1, a single node is created, i.e., no partitioning. +Each node is assigned a partition (or location) ID. If the original layer is +configured with a location ID, then the ID is assigned to each newly created node. +These nodes are connected according to the connections of the original layers. +Some connection layers will be added automatically. 
+For instance, if two connected sub-layers are located at two
+different workers, then a pair of bridge layers is inserted to transfer the
+feature (and gradient) blob between them. When two layers are partitioned on
+different dimensions, a concatenation layer which concatenates feature rows (or
+columns) and a slice layer which slices feature rows (or columns) would be
+inserted. These connection layers help make the network communication and
+synchronization transparent to the users.
+
+#### Dispatching partitions to workers
+
+Each (partitioned) layer is assigned a location ID, based on which it is dispatched to one
+worker. In particular, the pointer to the `NeuralNet` instance is passed
+to every worker within the same group, but each worker only computes over the
+layers that have the same partition (or location) ID as the worker's ID. When
+every worker computes the gradients of the entire model parameters
+(strategy-2), we refer to this process as data parallelism. When different
+workers compute the gradients of different parameters (strategy-3 or
+strategy-1), we call this process model parallelism. The hybrid partitioning
+leads to hybrid parallelism, where some workers compute the gradients of the
+same subset of model parameters while other workers compute on different model
+parameters. For example, to implement the hybrid parallelism for the
+[DCNN model](http://arxiv.org/abs/1404.5997), we set `partition_dim = 0` for
+lower layers and `partition_dim = 1` for higher layers.
+
Added: incubator/singa/site/trunk/content/markdown/docs/kr/neuralnet-partition.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/neuralnet-partition.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/neuralnet-partition.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/neuralnet-partition.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,54 @@
+# Neural Net Partition
+
+---
+
+The purpose of partitioning a neural network is to distribute the partitions onto
+different working units (e.g., threads or nodes, called workers in this article)
+and parallelize the processing.
+Another reason for partitioning is to handle a large neural network which cannot be
+held in a single node. For instance, to train models over high-resolution images we
+need large neural networks (in terms of training parameters).
+
+Since *Layer* is the first-class citizen in SINGA, we do the partitioning at the
+layer level. Specifically, we support partitioning at two levels. First, users can configure
+the location (i.e., worker ID) of each layer. In this way, users assign one worker
+to each layer. Secondly, for one layer, we can partition its neurons or partition
+the instances (e.g., images). They are called layer partition and data partition
+respectively. We illustrate the two types of partitions using a simple convolutional neural network.
+
+<img src="../images/conv-mnist.png" style="width: 220px"/>
+
+The above figure shows a convolutional neural network without any partition. It
+has 8 layers in total (one rectangle represents one layer). The first layer is
+a DataLayer (data) which reads data from local disk files/databases (or HDFS). The second layer
+is a MnistLayer which parses the records from MNIST data to get the pixels of a batch
+of 8 images (each image is of size 28x28). The LabelLayer (label) parses the records to get the label
+of each image in the batch.
+The ConvolutionalLayer (conv1) transforms the input image to the shape of 8x27x27.
+The ReLULayer (relu1) conducts elementwise transformations. The PoolingLayer (pool1)
+sub-samples the images. The fc1 layer is fully connected with the pool1 layer. It
+multiplies each image with a weight matrix to generate a 10-dimensional hidden feature which
+is then normalized by a SoftmaxLossLayer to get the prediction.
+
+<img src="../images/conv-mnist-datap.png" style="width: 1000px"/>
+
+The above figure shows the convolutional neural network after partitioning all layers
+except the DataLayer and ParserLayers into 3 partitions using data partition.
+The red layers process 4 images of the batch, while the black and blue layers process 2 images
+each. Some helper layers, i.e., SliceLayer, ConcateLayer, BridgeSrcLayer,
+BridgeDstLayer and SplitLayer, are added automatically by our partition algorithm.
+Layers of the same color reside in the same worker. Data is transferred
+across different workers at the boundary layers (i.e., BridgeSrcLayer and BridgeDstLayer),
+e.g., between s-slice-mnist-conv1 and d-slice-mnist-conv1.
+
+<img src="../images/conv-mnist-layerp.png" style="width: 1000px"/>
+
+The above figure shows the convolutional neural network after partitioning all layers
+except the DataLayer and ParserLayers into 2 partitions using layer partition. We can
+see that each layer processes all 8 images from the batch, but different partitions process
+different parts of each image. For instance, the layer conv1-00 processes only 4 channels. The other
+4 channels are processed by conv1-01, which resides in another worker.
+
+
+Since the partition is done at the layer level, we can apply different partitions to
+different layers to get a hybrid partition for the whole neural network. Moreover,
+we can also specify the layer locations to dispatch different layers to different workers.

Added: incubator/singa/site/trunk/content/markdown/docs/kr/overview.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/overview.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/overview.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/overview.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,67 @@
+# Overview
+
+---
+
+SINGA is a distributed deep learning platform for training deep learning models used in large-scale data analytics.
+It is designed so that models can be programmed intuitively, following the layer abstraction of the neural network.
+
+* It supports a variety of models: feed-forward networks such as the Convolutional Neural Network, energy models such as the Restricted Boltzmann Machine, and Recurrent Neural Network models.
+
+* Many layers are provided as built-in layers.
+
+* The SINGA architecture is designed to support synchronous, asynchronous and hybrid training.
+
+* It also supports different partitioning schemes (batch and feature partitioning) to parallelize the training of a model.
+
+
+## Goals
+
+Scalability: as a distributed system, use more resources to speed up training until a given accuracy is reached.
+
+Usability: simplify the programmer's work, e.g., partitioning data and models and handling network communication, needed to train large distributed models efficiently, so that complex models and algorithms are easy to build.
+
+
+## Design philosophy
+
+Scalability is an important research problem in distributed deep learning.
+SINGA is designed to keep a variety of training frameworks scalable.
+* Synchronous training: improves the efficiency of one training iteration.
+* Asynchronous training: improves the convergence rate of training.
+* Hybrid training: balances efficiency and convergence rate depending on cost and resources (e.g., cluster size), improving scalability.
+
+SINGA is designed so that deep learning models can be programmed intuitively, following the layer abstraction of the neural network, and a variety of models can be built and trained easily.
+
+## System overview
+
+<img src="../images/sgd.png" align="center" width="400px"/>
+<span><strong>Figure 1 - SGD flow.</strong></span>
+
+Training a deep learning model means finding the optimal parameters of the transformation functions that generate the features used for a specific task (classification, prediction, etc.).
+The quality of the parameters is measured by a loss function such as the Cross-Entropy Loss (https://en.wikipedia.org/wiki/Cross_entropy). Since this function is usually non-linear or non-convex, it is difficult to find a closed-form solution.
+
+Therefore Stochastic Gradient Descent is used.
+As shown in Figure 1, starting from randomly initialized parameters, the parameter values are updated iteratively so that the loss function decreases.
+
+<img src="../images/overview.png" align="center" width="400px"/>
+<span><strong>Figure 2 - SINGA overview.</strong></span>
+
+The training workload is distributed over workers and servers. As shown in Figure 2, in every iteration the workers call the *TrainOneBatch* function to compute parameter gradients.
+*TrainOneBatch* visits the layers in sequence, following the *NeuralNet* object that describes the neural net structure.
+The computed gradients are sent to the stub of the local node, aggregated, and then sent to the corresponding servers. The servers send the updated parameters back to the workers for the next iteration.
+
+
+## Job
+
+In SINGA, a "Job" refers to a "Job Configuration" that describes the neural net model, the data and training method, the cluster topology, and so on.
+The job configuration has the following four components drawn in Figure 2.
+
+  * [NeuralNet](neural-net.html): describes the structure of the neural net and the settings of each layer.
+  * [TrainOneBatch](train-one-batch.html): describes the algorithm suited to the model category.
+  * [Updater](updater.html): describes how parameters are updated on the servers.
+  * [Cluster Topology](distributed-training.html): describes the distributed topology of the workers and servers.
+
+The job is submitted to the SINGA driver in the [main function](programming-guide.html).
+
+This process is similar to job submission in Hadoop.
+Users configure the job in the main function.
+Hadoop users configure their own mappers and reducers, while SINGA users configure their own layers, updaters, and so on.

Added: incubator/singa/site/trunk/content/markdown/docs/kr/param.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/param.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/param.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/param.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,226 @@
+# Parameters
+
+---
+
+A `Param` object in SINGA represents a set of parameters, e.g., a weight matrix
+or a bias vector. The *Basic user guide* describes how to configure a `Param`
+object, and the *Advanced user guide* provides details on implementing users' own
+parameter initialization methods.
+
+## Basic user guide
+
+The configuration of a Param object is inside a layer configuration, as
+`Param` objects are associated with layers. An example configuration is like
+
+    layer {
+      ...
+      param {
+        name : "p1"
+        init {
+          type : kConstant
+          value: 1
+        }
+      }
+    }
+
+The [SGD algorithm](overview.html) starts with initializing all
+parameters according to the user-specified initialization method (the `init` field).
+For the above example,
+all parameters in `Param` "p1" will be initialized to the constant value 1.
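+
+A slightly fuller sketch is given below, assuming a Gaussian-initialized weight matrix
+named "w1"; the `lr_scale` and `wd_scale` fields used here are the scaling fields
+described in the field list that follows, and the values are only illustrative.
+
+    layer {
+      ...
+      param {
+        name : "w1"
+        lr_scale : 1.0   # multiplied with the learning rate when updating this Param
+        wd_scale : 1.0   # multiplied with the weight decay when updating this Param
+        init {
+          type : kGaussian
+          mean : 0.0
+          std : 0.01
+        }
+      }
+    }
+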
The +configuration fields of a Param object is defined in [ParamProto](../api/classsinga_1_1ParamProto.html): + + * name, an identifier string. It is an optional field. If not provided, SINGA + will generate one based on layer name and its order in the layer. + * init, field for setting initialization methods. + * share_from, name of another `Param` object, from which this `Param` will share + configurations and values. + * lr_scale, float value to be multiplied with the learning rate when + [updating the parameters](updater.html) + * wd_scale, float value to be multiplied with the weight decay when + [updating the parameters](updater.html) + +There are some other fields that are specific to initialization methods. + +### Initialization methods + +Users can set the `type` of `init` use the following built-in initialization +methods, + + * `kConst`, set all parameters of the Param object to a constant value + + type: kConst + value: float # default is 1 + + * `kGaussian`, initialize the parameters following a Gaussian distribution. + + type: kGaussian + mean: float # mean of the Gaussian distribution, default is 0 + std: float # standard variance, default is 1 + value: float # default 0 + + * `kUniform`, initialize the parameters following an uniform distribution + + type: kUniform + low: float # lower boundary, default is -1 + high: float # upper boundary, default is 1 + value: float # default 0 + + * `kGaussianSqrtFanIn`, initialize `Param` objects with two dimensions (i.e., + matrix) using `kGaussian` and then + multiple each parameter with `1/sqrt(fan_in)`, where`fan_in` is the number of + columns of the matrix. + + * `kUniformSqrtFanIn`, the same as `kGaussianSqrtFanIn` except that the + distribution is an uniform distribution. + + * `kUniformFanInOut`, initialize matrix `Param` objects using `kUniform` and then + multiple each parameter with `sqrt(6/(fan_in + fan_out))`, where`fan_in + + fan_out` sums up the number of columns and rows of the matrix. + +For all above initialization methods except `kConst`, if their `value` is not +1, every parameter will be multiplied with `value`. Users can also implement +their own initialization method following the *Advanced user guide*. + + +## Advanced user guide + +This sections describes the details on implementing new parameter +initialization methods. + +### Base ParamGenerator +All initialization methods are implemented as +subclasses of the base `ParamGenerator` class. + + class ParamGenerator { + public: + virtual void Init(const ParamGenProto&); + void Fill(Param*); + + protected: + ParamGenProto proto_; + }; + +Configurations of the initialization method is in `ParamGenProto`. The `Fill` +function fills the `Param` object (passed in as an argument). + +### New ParamGenerator subclass + +Similar to implement a new Layer subclass, users can define a configuration +protocol message, + + # in user.proto + message FooParamProto { + optional int32 x = 1; + } + extend ParamGenProto { + optional FooParamProto fooparam_conf =101; + } + +The configuration of `Param` would be + + param { + ... 
+ init { + user_type: 'FooParam" # must use user_type for user defined methods + [fooparam_conf] { # must use brackets for configuring user defined messages + x: 10 + } + } + } + +The subclass could be declared as, + + class FooParamGen : public ParamGenerator { + public: + void Fill(Param*) override; + }; + +Users can access the configuration fields in `Fill` by + + int x = proto_.GetExtension(fooparam_conf).x(); + +To use the new initialization method, users need to register it in the +[main function](programming-guide.html). + + driver.RegisterParamGenerator<FooParamGen>("FooParam") # must be consistent with the user_type in configuration + +{% comment %} +### Base Param class + +### Members + + int local_version_; + int slice_start_; + vector<int> slice_offset_, slice_size_; + + shared_ptr<Blob<float>> data_; + Blob<float> grad_; + ParamProto proto_; + +Each Param object has a local version and a global version (inside the data +Blob). These two versions are used for synchronization. If multiple Param +objects share the same values, they would have the same `data_` field. +Consequently, their global version is the same. The global version is updated +by [the stub thread](communication.html). The local version is +updated in `Worker::Update` function which assigns the global version to the +local version. The `Worker::Collect` function is blocked until the global +version is larger than the local version, i.e., when `data_` is updated. In +this way, we synchronize workers sharing parameters. + +In Deep learning models, some Param objects are 100 times larger than others. +To ensure the load-balance among servers, SINGA slices large Param objects. The +slicing information is recorded by `slice_*`. Each slice is assigned a unique +ID starting from 0. `slice_start_` is the ID of the first slice of this Param +object. `slice_offset_[i]` is the offset of the i-th slice in this Param +object. `slice_size_[i]` is the size of the i-th slice. These slice information +is used to create messages for transferring parameter values or gradients to +different servers. + +Each Param object has a `grad_` field for gradients. Param objects do not share +this Blob although they may share `data_`. Because each layer containing a +Param object would contribute gradients. E.g., in RNN, the recurrent layers +share parameters values, and the gradients used for updating are averaged from all recurrent +these recurrent layers. In SINGA, the stub thread will aggregate local +gradients for the same Param object. The server will do a global aggregation +of gradients for the same Param object. + +The `proto_` field has some meta information, e.g., name and ID. It also has a +field called `owner` which is the ID of the Param object that shares parameter +values with others. + +### Functions +The base Param class implements two sets of functions, + + virtual void InitValues(int version = 0); // initialize values according to `init_method` + void ShareFrom(const Param& other); // share `data_` from `other` Param + -------------- + virtual Msg* GenGetMsg(bool copy, int slice_idx); + virtual Msg* GenPutMsg(bool copy, int slice_idx); + ... // other message related functions. + +Besides the functions for processing the parameter values, there is a set of +functions for generating and parsing messages. These messages are for +transferring parameter values or gradients between workers and servers. Each +message corresponds to one Param slice. 
If `copy` is false, it means the +receiver of this message is in the same process as the sender. In such case, +only pointers to the memory of parameter value (or gradient) are wrapped in +the message; otherwise, the parameter values (or gradients) should be copied +into the message. + + +## Implementing Param subclass +Users can extend the base Param class to implement their own parameter +initialization methods and message transferring protocols. Similar to +implementing a new Layer subclasses, users can create google protocol buffer +messages for configuring the Param subclass. The subclass, denoted as FooParam +should be registered in main.cc, + + dirver.RegisterParam<FooParam>(kFooParam); // kFooParam should be different to 0, which is for the base Param type + + + * type, an integer representing the `Param` type. Currently SINGA provides one + `Param` implementation with type 0 (the default type). If users want + to use their own Param implementation, they should extend the base Param + class and configure this field with `kUserParam` + +{% endcomment %} Added: incubator/singa/site/trunk/content/markdown/docs/kr/programmer-guide.md URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/programmer-guide.md?rev=1724348&view=auto ============================================================================== --- incubator/singa/site/trunk/content/markdown/docs/kr/programmer-guide.md (added) +++ incubator/singa/site/trunk/content/markdown/docs/kr/programmer-guide.md Wed Jan 13 03:46:19 2016 @@ -0,0 +1,98 @@ +# Programmer Guide + +--- + +To submit a training job, users must provide the configuration of the +four components shown in Figure 1: + + * a [NeuralNet](neural-net.html) describing the neural net structure with the detailed layer setting and their connections; + * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for different model categories; + * an [Updater](updater.html) defining the protocol for updating parameters at the server side; + * a [Cluster Topology](distributed-training.html) specifying the distributed architecture of workers and servers. + +The *Basic user guide* section describes how to submit a training job using +built-in components; while the *Advanced user guide* section presents details +on writing user's own main function to register components implemented by +themselves. In addition, the training data must be prepared, which has the same +[process](data.html) for both advanced users and basic users. + +<img src="../images/overview.png" align="center" width="400px"/> +<span><strong>Figure 1 - SINGA overview.</strong></span> + + + +## Basic user guide + +Users can use the default main function provided by SINGA to submit the training +job. For this case, a job configuration file written as a google protocol +buffer message for the [JobProto](../api/classsinga_1_1JobProto.html) must be provided in the command line, + + ./bin/singa-run.sh -conf <path to job conf> [-resume] [-test] + +* `-resume` is for continuing the training from last [checkpoint](checkpoint.html). +* `-test` is for testing the performance of previously trained model and extracting features for new data, +more details are available [here](test.html). + +The [MLP](mlp.html) and [CNN](cnn.html) +examples use built-in components. Please read the corresponding pages for their +job configuration files. The subsequent pages will illustrate the details on +each component of the configuration. 
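+
+As a rough sketch of how the four components fit into a single job configuration, a
+skeleton is given below. The field names follow the example job configurations
+referenced elsewhere in these docs (e.g., the cifar10 and RBM examples); the
+TrainOneBatch and cluster-size settings are omitted, so treat this as an outline
+rather than a complete, runnable job.conf.
+
+    # hypothetical job.conf skeleton (details omitted)
+    neuralnet {        # NeuralNet: layers and their connections
+      layer {
+        name: "data"
+        ...
+      }
+      layer {
+        name: "fc1"
+        srclayers: "data"
+        ...
+      }
+    }
+    updater {          # Updater: how servers update parameters
+      type: kSGD
+      learning_rate {
+        type: kFixed
+        base_lr: 0.01
+      }
+    }
+    cluster {          # Cluster Topology: layout of workers and servers
+      workspace: "examples/cifar10/"
+    }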
+ +## Advanced user guide + +If a user's model contains some user-defined components, e.g., +[Updater](updater.html), he has to write a main function to +register these components. It is similar to Hadoop's main function. Generally, +the main function should + + * initialize SINGA, e.g., setup logging. + + * register user-defined components. + + * create and pass the job configuration to SINGA driver + +An example main function is like + + #include <string> + #include "singa.h" + #include "user.h" // header for user code + + int main(int argc, char** argv) { + singa::Driver driver; + driver.Init(argc, argv); + bool resume; + // parse resume option from argv. + + // register user defined layers + driver.RegisterLayer<FooLayer, std::string>("kFooLayer"); + // register user defined updater + driver.RegisterUpdater<FooUpdater, std::string>("kFooUpdater"); + ... + auto jobConf = driver.job_conf(); + // update jobConf + + driver.Submit(resume, jobConf); + return 0; + } + +The Driver class' `Init` method will load a job configuration file provided by +users as a command line argument (`-conf <job conf>`). It contains at least the +cluster topology and returns the `jobConf` for users to update or fill in +configurations of neural net, updater, etc. If users define subclasses of +Layer, Updater, Worker and Param, they should register them through the driver. +Finally, the job configuration is submitted to the driver which starts the +training. + +We will provide helper functions to make the configuration easier in the +future, like [keras](https://github.com/fchollet/keras). + +Users need to compile and link their code (e.g., layer implementations and the main +file) with SINGA library (*.libs/libsinga.so*) to generate an +executable file, e.g., with name *mysinga*. To launch the program, users just pass the +path of the *mysinga* and base job configuration to *./bin/singa-run.sh*. + + ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments] + +The [RNN application](rnn.html) provides a full example of +implementing the main function for training a specific RNN model. + Added: incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md?rev=1724348&view=auto ============================================================================== --- incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md (added) +++ incubator/singa/site/trunk/content/markdown/docs/kr/programming-guide.md Wed Jan 13 03:46:19 2016 @@ -0,0 +1,95 @@ +# Programming Guide + +--- + +To submit a training job, users must provide the configuration of the +four components shown in Figure 1: + + * a [NeuralNet](neural-net.html) describing the neural net structure with the detailed layer setting and their connections; + * a [TrainOneBatch](train-one-batch.html) algorithm which is tailored for different model categories; + * an [Updater](updater.html) defining the protocol for updating parameters at the server side; + * a [Cluster Topology](distributed-training.html) specifying the distributed architecture of workers and servers. + +The *Basic user guide* section describes how to submit a training job using +built-in components; while the *Advanced user guide* section presents details +on writing user's own main function to register components implemented by +themselves. 
In addition, the training data must be prepared, which has the same +[process](data.html) for both advanced users and basic users. + +<img src="../images/overview.png" align="center" width="400px"/> +<span><strong>Figure 1 - SINGA overview.</strong></span> + + + +## Basic user guide + +Users can use the default main function provided SINGA to submit the training +job. For this case, a job configuration file written as a google protocol +buffer message for the [JobProto](../api/classsinga_1_1JobProto.html) must be provided in the command line, + + ./bin/singa-run.sh -conf <path to job conf> [-resume] + +`-resume` is for continuing the training from last +[checkpoint](checkpoint.html). +The [MLP](mlp.html) and [CNN](cnn.html) +examples use built-in components. Please read the corresponding pages for their +job configuration files. The subsequent pages will illustrate the details on +each component of the configuration. + +## Advanced user guide + +If a user's model contains some user-defined components, e.g., +[Updater](updater.html), he has to write a main function to +register these components. It is similar to Hadoop's main function. Generally, +the main function should + + * initialize SINGA, e.g., setup logging. + + * register user-defined components. + + * create and pass the job configuration to SINGA driver + + +An example main function is like + + #include "singa.h" + #include "user.h" // header for user code + + int main(int argc, char** argv) { + singa::Driver driver; + driver.Init(argc, argv); + bool resume; + // parse resume option from argv. + + // register user defined layers + driver.RegisterLayer<FooLayer>(kFooLayer); + // register user defined updater + driver.RegisterUpdater<FooUpdater>(kFooUpdater); + ... + auto jobConf = driver.job_conf(); + // update jobConf + + driver.Train(resume, jobConf); + return 0; + } + +The Driver class' `Init` method will load a job configuration file provided by +users as a command line argument (`-conf <job conf>`). It contains at least the +cluster topology and returns the `jobConf` for users to update or fill in +configurations of neural net, updater, etc. If users define subclasses of +Layer, Updater, Worker and Param, they should register them through the driver. +Finally, the job configuration is submitted to the driver which starts the +training. + +We will provide helper functions to make the configuration easier in the +future, like [keras](https://github.com/fchollet/keras). + +Users need to compile and link their code (e.g., layer implementations and the main +file) with SINGA library (*.libs/libsinga.so*) to generate an +executable file, e.g., with name *mysinga*. To launch the program, users just pass the +path of the *mysinga* and base job configuration to *./bin/singa-run.sh*. + + ./bin/singa-run.sh -conf <path to job conf> -exec <path to mysinga> [other arguments] + +The [RNN application](rnn.html) provides a full example of +implementing the main function for training a specific RNN model. 
Added: incubator/singa/site/trunk/content/markdown/docs/kr/quick-start.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/quick-start.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/quick-start.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/quick-start.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,177 @@
+# Quick Start
+
+---
+
+## Installing SINGA
+
+For SINGA installation, please refer to the [installation](installation.html) page.
+
+### Running Zookeeper
+
+SINGA training uses [zookeeper](https://zookeeper.apache.org/). First make sure the zookeeper service has been started.
+
+If zookeeper was installed using the provided thirdparty script, run the following script.
+
+    #goto top level folder
+    cd SINGA_ROOT
+    ./bin/zk-service.sh start
+
+(`./bin/zk-service.sh stop` stops zookeeper.)
+
+If zookeeper is started on a port other than the default one, edit `conf/singa.conf`.
+
+    zookeeper_host: "localhost:YOUR_PORT"
+
+## Running in standalone mode
+
+Running SINGA in standalone mode means running it without a cluster manager such as [Mesos](http://mesos.apache.org/) or [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html).
+
+### Training on a single node
+
+A single process is launched.
+As an example, we train the
+[CNN model](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks)
+over the [CIFAR-10](http://www.cs.toronto.edu/~kriz/cifar.html) dataset.
+The hyper-parameters are set following [cuda-convnet](https://code.google.com/p/cuda-convnet/).
+For details, see the [CNN example](cnn.html) page.
+
+
+#### Data and job configuration
+
+Download the dataset and create the data shards for training and testing as follows.
+
+    cd examples/cifar10/
+    cp Makefile.example Makefile
+    make download
+    make create
+
+The training and test datasets are created in the *cifar10-train-shard*
+and *cifar10-test-shard* folders respectively. An *image_mean.bin* file, which stores the
+feature mean of all images, is also generated.
+
+All source code needed to train the CNN model is included in SINGA; there is no need to add any code.
+Just specify the job configuration file (*job.conf*) and run the script (*../../bin/singa-run.sh*).
+To change or add SINGA code, please refer to the [programming guide](programming-guide.html).
+
+#### Training without parallelism
+
+By default, the cluster topology has a single worker and a single server.
+The data and the neural net are not parallelized.
+
+To start training, run the following script.
+
+    # goto top level folder
+    cd ../../
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+
+To list the currently running jobs,
+
+    ./bin/singa-console.sh list
+
+    JOB ID | NUM PROCS
+    ---------- | -----------
+    24 | 1
+
+To kill a job,
+
+    ./bin/singa-console.sh kill JOB_ID
+
+
+Logs and job information are stored in the */tmp/singa-log* folder.
+This can be changed via `log-dir` in the *conf/singa.conf* file.
+
+
+#### Asynchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworker_groups: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+In SINGA, [asynchronous training](architecture.html) is enabled by launching multiple worker groups.
+For example, change *job.conf* as shown above.
+By default, one worker group is configured to have one worker.
+Since the above configuration sets two workers per process, the two worker groups run in the same process.
+As a result, the training runs as the in-memory [Downpour](frameworks.html) training framework.
+
+Users do not need to take care of distributing the data.
+Data is dispatched to each worker group according to a random skip,
+and each worker handles a different partition of the data.
+
+    # job.conf
+    ...
+    neuralnet {
+      layer {
+        ...
+        sharddata_conf {
+          random_skip: 5000
+        }
+      }
+      ...
+    }
+
+Run the script:
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+#### Synchronous parallel training
+
+    # job.conf
+    ...
+    cluster {
+      nworkers_per_group: 2
+      nworkers_per_procs: 2
+      workspace: "examples/cifar10/"
+    }
+
+In SINGA, [synchronous training](architecture.html) is enabled by launching multiple workers within one worker group.
+For example, change the *job.conf* file as shown above.
+The above configuration sets two workers in one worker group.
+The workers synchronize with each other within the group.
+They run as the in-memory [Sandblaster](frameworks.html) framework.
+The model is partitioned over the two workers: each layer is distributed over the two workers.
+Each distributed sub-layer functions in the same way as the original layer, but its number
+of feature instances becomes `B/g`, where `B` is the number of instances in a mini-batch
+and `g` is the number of workers in the group.
+There are also layer (neural net) partitioning schemes based on [other criteria](neural-net.html).
+
+All other settings are the same as in the "without parallelism" case.
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+### Training in a cluster
+
+The above training frameworks are extended to a cluster by changing the cluster configuration.
+
+    nworkers_per_procs: 1
+
+Every process then creates only one worker thread.
+As a result, the workers are created in different processes (nodes).
+To specify the nodes in the cluster, the *hostfile* in *SINGA_ROOT/conf/* needs to be configured,
+
+e.g.,
+
+    logbase-a01
+    logbase-a02
+
+The zookeeper location must also be configured,
+
+e.g.,
+
+    # conf/singa.conf
+    zookeeper_host: "logbase-a01"
+
+The script is run in the same way as for single-node training.
+
+    ./bin/singa-run.sh -conf examples/cifar10/job.conf
+
+## Running with Mesos
+
+*working*...
+
+## Next steps
+
+For details on changing or adding SINGA code, please refer to the [programming guide](programming-guide.html).

Added: incubator/singa/site/trunk/content/markdown/docs/kr/rbm.md
URL: http://svn.apache.org/viewvc/incubator/singa/site/trunk/content/markdown/docs/kr/rbm.md?rev=1724348&view=auto
==============================================================================
--- incubator/singa/site/trunk/content/markdown/docs/kr/rbm.md (added)
+++ incubator/singa/site/trunk/content/markdown/docs/kr/rbm.md Wed Jan 13 03:46:19 2016
@@ -0,0 +1,365 @@
+# RBM Example
+
+---
+
+This example uses SINGA to train 4 RBM models and one auto-encoder model over the
+[MNIST dataset](http://yann.lecun.com/exdb/mnist/). The auto-encoder model is trained
+to reduce the dimensionality of the MNIST image feature. The RBM models are trained
+to initialize parameters of the auto-encoder model. This example application is
+from [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf).
+
+## Running instructions
+
+Running scripts are provided in the *SINGA_ROOT/examples/rbm* folder.
+
+The MNIST dataset has 70,000 handwritten digit images. The
+[data preparation](data.html) page
+has details on converting this dataset into a format that SINGA recognizes. Users can
+simply run the following commands to download and convert the dataset.
+
+    # at SINGA_ROOT/examples/mnist/
+    $ cp Makefile.example Makefile
+    $ make download
+    $ make create
+
+The training is separated into two phases, namely pre-training and fine-tuning.
+The pre-training phase trains 4 RBMs in sequence, + + # at SINGA_ROOT/ + $ ./bin/singa-run.sh -conf examples/rbm/rbm1.conf + $ ./bin/singa-run.sh -conf examples/rbm/rbm2.conf + $ ./bin/singa-run.sh -conf examples/rbm/rbm3.conf + $ ./bin/singa-run.sh -conf examples/rbm/rbm4.conf + +The fine-tuning phase trains the auto-encoder by, + + $ ./bin/singa-run.sh -conf examples/rbm/autoencoder.conf + + +## Training details + +### RBM1 + +<img src="../images/example-rbm1.png" align="center" width="200px"/> +<span><strong>Figure 1 - RBM1.</strong></span> + +The neural net structure for training RBM1 is shown in Figure 1. +The data layer and parser layer provides features for training RBM1. +The visible layer (connected with parser layer) of RBM1 accepts the image feature +(784 dimension). The hidden layer is set to have 1000 neurons (units). +These two layers are configured as, + + layer{ + name: "RBMVis" + type: kRBMVis + srclayers:"mnist" + srclayers:"RBMHid" + rbm_conf{ + hdim: 1000 + } + param{ + name: "w1" + init{ + type: kGaussian + mean: 0.0 + std: 0.1 + } + } + param{ + name: "b11" + init{ + type: kConstant + value: 0.0 + } + } + } + + layer{ + name: "RBMHid" + type: kRBMHid + srclayers:"RBMVis" + rbm_conf{ + hdim: 1000 + } + param{ + name: "w1_" + share_from: "w1" + } + param{ + name: "b12" + init{ + type: kConstant + value: 0.0 + } + } + } + + + +For RBM, the weight matrix is shared by the visible and hidden layers. For instance, +`w1` is shared by `vis` and `hid` layers shown in Figure 1. In SINGA, we can configure +the `share_from` field to enable [parameter sharing](param.html) +as shown above for the param `w1` and `w1_`. + +[Contrastive Divergence](train-one-batch.html#contrastive-divergence) +is configured as the algorithm for [TrainOneBatch](train-one-batch.html). +Following Hinton's paper, we configure the [updating protocol](updater.html) +as follows, + + # Updater Configuration + updater{ + type: kSGD + momentum: 0.2 + weight_decay: 0.0002 + learning_rate{ + base_lr: 0.1 + type: kFixed + } + } + +Since the parameters of RBM0 will be used to initialize the auto-encoder, we should +configure the `workspace` field to specify a path for the checkpoint folder. +For example, if we configure it as, + + cluster { + workspace: "examples/rbm/rbm1/" + } + +Then SINGA will [checkpoint the parameters](checkpoint.html) into *examples/rbm/rbm1/*. + +### RBM1 +<img src="../images/example-rbm2.png" align="center" width="200px"/> +<span><strong>Figure 2 - RBM2.</strong></span> + +Figure 2 shows the net structure of training RBM2. +The visible units of RBM2 accept the output from the Sigmoid1 layer. The Inner1 layer +is a `InnerProductLayer` whose parameters are set to the `w1` and `b12` learned +from RBM1. +The neural net configuration is (with layers for data layer and parser layer omitted). + + layer{ + name: "Inner1" + type: kInnerProduct + srclayers:"mnist" + innerproduct_conf{ + num_output: 1000 + } + param{ name: "w1" } + param{ name: "b12"} + } + + layer{ + name: "Sigmoid1" + type: kSigmoid + srclayers:"Inner1" + } + + layer{ + name: "RBMVis" + type: kRBMVis + srclayers:"Sigmoid1" + srclayers:"RBMHid" + rbm_conf{ + hdim: 500 + } + param{ + name: "w2" + ... + } + param{ + name: "b21" + ... + } + } + + layer{ + name: "RBMHid" + type: kRBMHid + srclayers:"RBMVis" + rbm_conf{ + hdim: 500 + } + param{ + name: "w2_" + share_from: "w2" + } + param{ + name: "b22" + ... 
+ } + } + +To load w0 and b02 from RBM0's checkpoint file, we configure the `checkpoint_path` as, + + checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0" + cluster{ + workspace: "examples/rbm/rbm2" + } + +The workspace is changed for checkpointing `w2`, `b21` and `b22` into +*examples/rbm/rbm2/*. + +### RBM3 + +<img src="../images/example-rbm3.png" align="center" width="200px"/> +<span><strong>Figure 3 - RBM3.</strong></span> + +Figure 3 shows the net structure of training RBM3. In this model, a layer with +250 units is added as the hidden layer of RBM3. The visible units of RBM3 +accepts output from Sigmoid2 layer. Parameters of Inner1 and Innner2 are set to +`w1,b12,w2,b22` which can be load from the checkpoint file of RBM2, +i.e., "examples/rbm/rbm2/". + +### RBM4 + + +<img src="../images/example-rbm4.png" align="center" width="200px"/> +<span><strong>Figure 4 - RBM4.</strong></span> + +Figure 4 shows the net structure of training RBM4. It is similar to Figure 3, +but according to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf), the hidden units of the +top RBM (RBM4) have stochastic real-valued states drawn from a unit variance +Gaussian whose mean is determined by the input from the RBM's logistic visible +units. So we add a `gaussian` field in the RBMHid layer to control the +sampling distribution (Gaussian or Bernoulli). In addition, this +RBM has a much smaller learning rate (0.001). The neural net configuration for +the RBM4 and the updating protocol is (with layers for data layer and parser +layer omitted), + + # Updater Configuration + updater{ + type: kSGD + momentum: 0.9 + weight_decay: 0.0002 + learning_rate{ + base_lr: 0.001 + type: kFixed + } + } + + layer{ + name: "RBMVis" + type: kRBMVis + srclayers:"Sigmoid3" + srclayers:"RBMHid" + rbm_conf{ + hdim: 30 + } + param{ + name: "w4" + ... + } + param{ + name: "b41" + ... + } + } + + layer{ + name: "RBMHid" + type: kRBMHid + srclayers:"RBMVis" + rbm_conf{ + hdim: 30 + gaussian: true + } + param{ + name: "w4_" + share_from: "w4" + } + param{ + name: "b42" + ... + } + } + +### Auto-encoder +In the fine-tuning stage, the 4 RBMs are "unfolded" to form encoder and decoder +networks that are initialized using the parameters from the previous 4 RBMs. + +<img src="../images/example-autoencoder.png" align="center" width="500px"/> +<span><strong>Figure 5 - Auto-Encoders.</strong></span> + + +Figure 5 shows the neural net structure for training the auto-encoder. +[Back propagation (kBP)] (train-one-batch.html) is +configured as the algorithm for `TrainOneBatch`. We use the same cluster +configuration as RBM models. For updater, we use [AdaGrad](updater.html#adagradupdater) algorithm with +fixed learning rate. + + ### Updater Configuration + updater{ + type: kAdaGrad + learning_rate{ + base_lr: 0.01 + type: kFixed + } + } + + + +According to [Hinton's science paper](http://www.cs.toronto.edu/~hinton/science.pdf), +we configure a EuclideanLoss layer to compute the reconstruction error. The neural net +configuration is (with some of the middle layers omitted), + + layer{ name: "data" } + layer{ name:"mnist" } + layer{ + name: "Inner1" + param{ name: "w1" } + param{ name: "b12" } + } + layer{ name: "Sigmoid1" } + ... 
+ layer{ + name: "Inner8" + innerproduct_conf{ + num_output: 784 + transpose: true + } + param{ + name: "w8" + share_from: "w1" + } + param{ name: "b11" } + } + layer{ name: "Sigmoid8" } + + # Euclidean Loss Layer Configuration + layer{ + name: "loss" + type:kEuclideanLoss + srclayers:"Sigmoid8" + srclayers:"mnist" + } + +To load pre-trained parameters from the 4 RBMs' checkpoint file we configure `checkpoint_path` as + + ### Checkpoint Configuration + checkpoint_path: "examples/rbm/checkpoint/rbm1/checkpoint/step6000-worker0" + checkpoint_path: "examples/rbm/checkpoint/rbm2/checkpoint/step6000-worker0" + checkpoint_path: "examples/rbm/checkpoint/rbm3/checkpoint/step6000-worker0" + checkpoint_path: "examples/rbm/checkpoint/rbm4/checkpoint/step6000-worker0" + + +## Visualization Results + +<div> +<img src="../images/rbm-weight.PNG" align="center" width="300px"/> + +<img src="../images/rbm-feature.PNG" align="center" width="300px"/> +<br/> +<span><strong>Figure 6 - Bottom RBM weight matrix.</strong></span> + + + + + +<span><strong>Figure 7 - Top layer features.</strong></span> +</div> + +Figure 6 visualizes sample columns of the weight matrix of RBM1, We can see the +Gabor-like filters are learned. Figure 7 depicts the features extracted from +the top-layer of the auto-encoder, wherein one point represents one image. +Different colors represent different digits. We can see that most images are +well clustered according to the ground truth.
