Modified: websites/staging/singa/trunk/content/docs/layer.html ============================================================================== --- websites/staging/singa/trunk/content/docs/layer.html (original) +++ websites/staging/singa/trunk/content/docs/layer.html Wed Sep 2 10:31:57 2015 @@ -1,15 +1,15 @@ <!DOCTYPE html> <!-- - | Generated by Apache Maven Doxia at 2015-08-17 + | Generated by Apache Maven Doxia at 2015-09-02 | Rendered using Apache Maven Fluido Skin 1.4 --> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> - <meta name="Date-Revision-yyyymmdd" content="20150817" /> + <meta name="Date-Revision-yyyymmdd" content="20150902" /> <meta http-equiv="Content-Language" content="en" /> - <title>Apache SINGA – Layers Instruction</title> + <title>Apache SINGA – Layers</title> <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" /> <link rel="stylesheet" href="../css/site.css" /> <link rel="stylesheet" href="../css/print.css" media="print" /> @@ -189,7 +189,7 @@ Apache SINGA</a> <span class="divider">/</span> </li> - <li class="active ">Layers Instruction</li> + <li class="active ">Layers</li> @@ -423,331 +423,395 @@ <div id="bodyColumn" class="span10" > - <h1>Layers Instruction</h1> + <h1>Layers</h1> +<p>Layer is a core abstraction in SINGA. It performs a variety of feature transformations for extracting high-level features, e.g., loading raw features, parsing RGB values, doing convolution transformation, etc.</p> +<p>The <i>Basic user guide</i> section introduces the configuration of a built-in layer. 
<i>Advanced user guide</i> explains how to extend the base Layer class to implement users’ functions.</p> +<div class="section"> +<h2><a name="Basic_user_guide"></a>Basic user guide</h2> +<div class="section"> +<h3><a name="Layer_configuration"></a>Layer configuration</h3> +<p>The configurations of three layers from the <a class="externalLink" href="http://singa.incubator.apache.org/docs/mlp">MLP example</a> are shown below,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">layer { + name: "data" + type: kShardData + sharddata_conf { } + exclude: kTest + partition_dim : 0 +} +layer{ + name: "mnist" + type: kMnist + srclayers: "data" + mnist_conf { } +} +layer{ + name: "fc1" + type: kInnerProduct + srclayers: "mnist" + innerproduct_conf{ } + param{ } + param{ } +} +</pre></div></div> +<p>There are some common fields for all kinds of layers:</p> + +<ul> + +<li><tt>name</tt>: a string used to differentiate two layers.</li> + +<li><tt>type</tt>: an integer used for identifying a Layer subclass. The types of built-in layers are listed in LayerType (defined in job.proto). For user-defined layer subclasses, <tt>user_type</tt> of string should be used instead of <tt>type</tt>. The details are explained in the <a href="#newlayer">last section</a> of this page.</li> + +<li><tt>srclayers</tt>: one or more layer names, for identifying the source layers. In SINGA, all connections are <a class="externalLink" href="http://singa.incubator.apache.org/docs/neural-net">converted</a> to directed connections.</li> + +<li><tt>exclude</tt>: an enumerated value of type <a href="">Phase</a>, can be {kTest, kValidation, kTrain}. It is used to filter this layer when creating the <a class="externalLink" href="http://singa.incubator.apache.org/docs/neural-net">NeuralNet</a> for the excluded phase.
E.g., the “data” layer would be filtered when creating the NeuralNet instance for the test phase.</li> + +<li><tt>param</tt>: configuration for a <a class="externalLink" href="http://singa.incubator.apache.org/docs/param">Param</a> instance. There can be multiple Param objects in one layer.</li> + +<li><tt>partition_dim</tt>: integer value indicating the partition dimension of this layer. -1 (the default value) for no partitioning, 0 for partitioning on batch dimension, 1 for partitioning on feature dimension. It is used by <a class="externalLink" href="http://singa.incubator.apache.org/docs/neural-net">CreateGraph</a> for partitioning the neural net.</li> +</ul> +<p>Different layers may have different configurations. These configurations are defined in <tt><type>_conf</tt>. E.g., the “data” layer has <tt>sharddata_conf</tt> and the “fc1” layer has <tt>innerproduct_conf</tt>. The subsequent sections explain the functionality of each built-in layer and how to configure it,</p></div> <div class="section"> +<h3><a name="Built-in_Layer_subclasses"></a>Built-in Layer subclasses</h3> +<p>SINGA provides many built-in layers, which can be used directly to create neural nets. These layers are categorized according to their functionalities,</p> + +<ul> + +<li>Data layers for loading records (e.g., images) from disk, HDFS or network into memory.</li> + +<li>Parser layers for parsing features, labels, etc.
from records, into <a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1Blob.html">Blob</a>.</li> + +<li>Neuron layers for feature transformation, e.g., <a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1ConvolutionLayer.html">convolution</a>, <a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1PoolingLayer.html">pooling</a>, dropout, etc.</li> + +<li>Loss layers for measuring the training objective loss, e.g., [cross entropy-loss] or [Euclidean loss].</li> + +<li>Output layers for outputting the prediction results (e.g., probabilities of each category) onto disk or network.</li> + +<li>Connection layers for connecting layers when the neural net is partitioned.</li> +</ul> +<div class="section"> +<h4><a name="Data_Layers"></a>Data Layers</h4> +<p>Data layers load training/testing data and convert them into <a class="externalLink" href="http://singa.incubator.apache.org/docs/data">Record</a>s, which are parsed by parser layers. The data source can be disk file, HDFS, database or network.</p> <div class="section"> -<h3><a name="ShardData_Layer"></a>ShardData Layer</h3> -<p>ShardData layer is used to read data from disk etc.</p> +<h5><a name="ShardDataLayer"></a>ShardDataLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1ShardDataLayer.html">ShardDataLayer</a> is used to read data from disk file. The file should be created using <a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1DataShard.html">DataShard</a> class. 
With the data file prepared, users configure the layer as</p> <div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"data" - type:"kShardData" - data_param - { - path:"Shard_File_Path" - batchsize:int - } - exclude:kTrain|kValidation|kTest|kPositive|kNegative +<div class="source"><pre class="prettyprint">type: kShardData +sharddata_conf { + path: "path to data shard folder" + batchsize: int + random_skip: int } -</pre></div></div></div> +</pre></div></div> +<p><tt>batchsize</tt> specifies the number of records in one mini-batch. The first <tt>rand() % random_skip</tt> <tt>Record</tt>s will be skipped at the first iteration. This enforces that different workers work on different Records.</p></div> <div class="section"> -<h3><a name="Label_Layer"></a>Label Layer</h3> -<p>Label layer is used to extract the label information from training data. The label information will be used in the loss layer to calculate the gradient.</p> +<h5><a name="LMDBDataLayer"></a>LMDBDataLayer</h5> +<p>LMDBDataLayer is similar to ShardDataLayer, except that the Records are loaded from LMDB.</p> <div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"label" - type:"kLabel" - srclayers:"data" +<div class="source"><pre class="prettyprint">type: kLMDBData +lmdbdata_conf { + path: "path to LMDB folder" + batchsize: int + random_skip: int +} +</pre></div></div></div></div> +<div class="section"> +<h4><a name="Parser_Layers"></a>Parser Layers</h4> +<p>Parser layers get a vector of Records from data layers and parse features into a Blob.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">virtual void ParseRecords(Phase phase, const vector<Record>& records, Blob<float>* blob) = 0; +</pre></div></div> +<div class="section"> +<h5><a name="LabelLayer"></a>LabelLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1LabelLayer.html">LabelLayer</a> is used to
parse a single label from each Record. Consequently, it will put $b$ (mini-batch size) values into the Blob. It has no specific configuration fields.</p></div> +<div class="section"> +<h5><a name="MnistImageLayer"></a>MnistImageLayer</h5> +<p>MnistImageLayer parses the pixel values of each image from the MNIST dataset. The pixel values may be normalized as <tt>x/norm_a - norm_b</tt>. For example, if <tt>norm_a</tt> is set to 255 and <tt>norm_b</tt> is set to 0, then every pixel will be normalized into [0, 1].</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">type: kMnistImage +mnistimage_conf { + norm_a: float + norm_b: float } </pre></div></div></div> <div class="section"> -<h3><a name="Convolution_Layer"></a>Convolution Layer</h3> -<p>Convolution layer is a basic layer used in constitutional neural net. It is used to extract local feature following some local patterns from slide windows in the image.</p> +<h5><a name="RGBImageLayer"></a>RGBImageLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1RGBImageLayer.html">RGBImageLayer</a> parses the RGB values of one image from each Record. It may also apply transformations, e.g., cropping and mirroring.
If the <tt>meanfile</tt> is specified, it should point to a path that contains one Record for the mean of each pixel over all training images.</p> <div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"Conv_Number" - type:"kConvolution" - srclayers:"Src_Layer_Name" - convolution_param - { - num_filters:int - //the count of the applied filters - kernel:int - //convolution kernel size - stride:int - //the distance between the successive filters - pad:int - //pad the images with a given int number of pixels border of zeros - } - param - { - name:"weight" - init_method:kGaussian|kConstant:kUniform|kPretrained|kGaussianSqrtFanIn|kUniformSqrtFanIn|kUniformSqrtFanInOut - /*use specific param of each init methods*/ - learning_rate_multiplier:float - } - param - { - name:"bias" - init_method:kConstant|kGaussian|kUniform|kPretrained|kGaussianSqrtFanIn|kUniformSqrtFanIn|kUniformSqrtFanInOut - /**use specific param of each init methods**/ - learning_rate_multiplier:float - } - //kGaussian: sample Gaussian with std and mean - //kUniform: uniform sampling between low and high - //kPretrained: from Toronto Convnet, let a=1/sqrt(fan_in),w*=a after generating from Gaussian distribution - //kGaussianSqrtFanIn: from Toronto Convnet, rectified linear activation, - //let a=sqrt(3)/sqrt(fan_in),range is [-a,+a]. - //no need to set value=sqrt(3),the program will multiply it - //kUniformSqrtFanIn: from Theano MLP tutorial, let a=1/sqrt(fan_in+fan_out). - //for tanh activation, range is [-6a,+6a], for sigmoid activation. 
- // range is [-24a,+24a],put the scale factor to value field - //For Constant Init, use value:float - //For Gaussian Init, use mean:float, std:float - //For Uniform Init, use low:float, high:float -} -</pre></div></div> -<p>Input:n * c_i * h_i * w_i</p> -<p>Output:n * c_o * h_o * w_o,h_o = (h_i + 2 * pad_h - kernel_h) /stride_h + 1</p></div> -<div class="section"> -<h3><a name="Dropout_Layer"></a>Dropout Layer</h3> -<p>Dropout Layer is a layer that randomly dropout some inputs. This scheme helps deep learning model away from over-fitting.</p></div> -<div class="section"> -<h3><a name="InnerProduct_Layer"></a>InnerProduct Layer</h3> -<p>InnerProduct Layer is a fully connected layer which is the basic element in feed forward neural network. It will use the lower layer as a input vector V and output a vector H by doing the following matrix-vector multiplication:</p> -<p>H = W*V + B // W and B are its weight and bias parameter</p> - -<div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"IP_Number" - type:"kInnerProduct" - srclayers:"Src_Layer_Name" - inner_product_param - { - num_output:int - //The number of the filters - } - param - { - name:"weight" - init_method:kGaussian|kConstant:kUniform|kPretrained|kGaussianSqrtFanIn|kUniformSqrtFanIn|kUniformSqrtFanInOut - std:float - // - learning_rate_multiplier:float - // - weight_decay_multiplier:int - // - /*low:float,high:float*/ - // - } - param - { - name:"bias" - init_method:kConstant|kGaussian|kUniform|kPretrained|kGaussianSqrtFanIn|kUniformSqrtFanIn|kUniformSqrtFanInOut - learning_rate_mulitiplier:float - // - weight_decay_multiplier:int - // - value:int - // - /*low:float,high:float*/ - // - } +<div class="source"><pre class="prettyprint">type: kRGBImage +rgbimage_conf { + scale: float + cropsize: int # cropping each image to keep the central part with this size + mirror: bool # mirror the image by set image[i,j]=image[i,len-j] + meanfile: "Image_Mean_File_Path" } +</pre></div></div> 
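The cropping, mirroring and mean-subtraction steps described by <tt>rgbimage_conf</tt> can be sketched in standalone C++. This is a hypothetical single-channel illustration, not SINGA's actual implementation; the function name and layout are assumptions for the sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of the per-image transform that rgbimage_conf describes:
// keep the central cropsize x cropsize part, optionally mirror it
// (image[i,j] = image[i, len-j]), subtract the per-pixel mean, and scale.
// A single channel is used for brevity; SINGA's real layer handles 3 channels.
std::vector<float> TransformImage(const std::vector<float>& img, int side,
                                  const std::vector<float>& mean, int cropsize,
                                  bool mirror, float scale) {
  int offset = (side - cropsize) / 2;  // keep the central part
  std::vector<float> out(cropsize * cropsize);
  for (int i = 0; i < cropsize; ++i) {
    for (int j = 0; j < cropsize; ++j) {
      int src_j = mirror ? (cropsize - 1 - j) : j;       // horizontal mirror
      float v = img[(i + offset) * side + (src_j + offset)];
      v -= mean[(i + offset) * side + (src_j + offset)]; // subtract pixel mean
      out[i * cropsize + j] = v * scale;
    }
  }
  return out;
}
```

For a 3x3 image with a zero mean file, `cropsize: 1` keeps only the central pixel, and `scale` multiplies the mean-subtracted value.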
+<p>{% comment %}</p></div></div> +<div class="section"> +<h4><a name="PrefetchLayer"></a>PrefetchLayer</h4> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1PrefetchLayer.html">PrefetchLayer</a> embeds data layers and parser layers to do data prefetching. It will launch a thread to call the data layers and parser layers to load and extract features. It ensures that the I/O task and computation task can work simultaneously. One example PrefetchLayer configuration is,</p> -Input:n * c_i * h_i * w_i -Output:n * c_o * 1 *1 -</pre></div></div></div> +<div class="source"> +<div class="source"><pre class="prettyprint">layer { + name: "prefetch" + type: kPrefetch + sublayers { + name: "data" + type: kShardData + sharddata_conf { } + } + sublayers { + name: "rgb" + type: kRGBImage + srclayers:"data" + rgbimage_conf { } + } + sublayers { + name: "label" + type: kLabel + srclayers: "data" + } + exclude:kTest +} +</pre></div></div> +<p>The layers on top of the PrefetchLayer should use the names of the embedded layers as their source layers.
For example, the “rgb” and “label” should be configured to the <tt>srclayers</tt> of other layers.</p> +<p>{% endcomment %}</p></div> +<div class="section"> +<h4><a name="Neuron_Layers"></a>Neuron Layers</h4> +<p>Neuron layers conduct feature transformations.</p> <div class="section"> -<h3><a name="LMDBData_Layer"></a>LMDBData Layer</h3> -<p>This is a data input layer, the data will be provided by the LMDB.</p> +<h5><a name="ConvolutionLayer"></a>ConvolutionLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1ConvolutionLayer.html">ConvolutionLayer</a> conducts convolution transformation.</p> <div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"data" - type:"kLMDBDate" - data_param - { - path:"LMDB_FILE_PATH" - batchsize:int - //batchsize means the quantity of the input disposable - } - exclude:kTrain|kValidation|kTest|kPositive|kNegative +<div class="source"><pre class="prettyprint">type: kConvolution +convolution_conf { + num_filters: int + kernel: int + stride: int + pad: int } -</pre></div></div></div> +param { } # weight/filter matrix +param { } # bias vector +</pre></div></div> +<p>The int value <tt>num_filters</tt> stands for the count of the applied filters; the int value <tt>kernel</tt> stands for the convolution kernel size (equal width and height); the int value <tt>stride</tt> stands for the distance between the successive filters; the int value <tt>pad</tt> pads each with a given int number of pixels border of zeros.</p></div> <div class="section"> -<h3><a name="LRN_Layer"></a>LRN Layer</h3> -<p>Local Response Normalization normalizes over the local input areas. It provides two modes: WITHIN_CHANNEL and ACROSS_CHANNELS. The local response normalization layer performs a kind of “lateral inhibition” by normalizing over local input regions. 
In ACROSS_CHANNELS mode, the local regions extend across nearby channels, but have no spatial extent (i.e., they have shape local_size x 1 x 1). In WITHIN_CHANNEL mode, the local regions extend spatially, but are in separate channels (i.e., they have shape 1 x local_size x local_size). Each input value is divided by ">http://i.imgur.com/GgTjjtR.png)</a>, where n is the size of each local region, and the sum is taken over the region centered at that value (zero padding is added where necessary).</p> +<h5><a name="InnerProductLayer"></a>InnerProductLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1InnerProductLayer.html">InnerProductLayer</a> is fully connected with its (single) source layer. Typically, it has two parameter fields, one for weight matrix, and the other for bias vector. It rotates the feature of the source layer (by multiplying with weight matrix) and shifts it (by adding the bias vector).</p> <div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"Norm_Number" - type:"kLRN" - lrn_param - { - norm_region:WITHIN_CHANNEL|ACROSS_CHANNELS - local_size:int - //for WITHIN_CHANNEL, it means the side length of the space region which will be summed up - //for ACROSS_CHANNELS, it means the quantity of the adjoining channels which will be summed up - alpha:5e-05 - beta:float - } - srclayers:"Src_Layer_Name" +<div class="source"><pre class="prettyprint">type: kInnerProduct +innerproduct_conf { + num_output: int } +param { } # weight matrix +param { } # bias vector </pre></div></div></div> <div class="section"> -<h3><a name="MnistImage_Layer"></a>MnistImage Layer</h3> -<p>MnistImage is a pre-processing layer for MNIST dataset.</p> +<h5><a name="PoolingLayer"></a>PoolingLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1PoolingLayer.html">PoolingLayer</a> is used to do a normalization (or averaging or sampling) of the feature vectors from the 
source layer.</p> <div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"mnist" - type:"kMnistImage" - srclayers:"data" - mnist_param - { - sigma:int - alpha:int - gamma:int - kernel:int - elastic_freq:int - beta:int - resize:int - norm_a:int - } +<div class="source"><pre class="prettyprint">type: kPooling +pooling_conf { + pool: AVE|MAX // choose Average Pooling or Max Pooling + kernel: int // size of the kernel filter + pad: int // the padding size + stride: int // the step length of the filter } -</pre></div></div></div> +</pre></div></div> +<p>The pooling layer has two methods: Average Pooling and Max Pooling. Use the enum values AVE and MAX to choose the method.</p> + +<ul> + +<li>Max Pooling selects the max value of each filtering area as a point of the result feature blob.</li> + +<li>Average Pooling averages all values of each filtering area as a point of the result feature blob.</li> +</ul></div> +<div class="section"> +<h5><a name="ReLULayer"></a>ReLULayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1ReLULayer.html">ReLULayer</a> has rectified linear neurons, which conduct the following transformation, <tt>f(x) = Max(0, x)</tt>. It has no specific configuration fields.</p></div> <div class="section"> -<h3><a name="Pooling_Layer"></a>Pooling Layer</h3> -<p>Max Pooling uses a specific scanning window to find the max value.<br />Average Pooling scans all the values in the window to calculate the average value.</p> +<h5><a name="TanhLayer"></a>TanhLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1TanhLayer.html">TanhLayer</a> uses tanh as its activation function, i.e., <tt>f(x)=tanh(x)</tt>. It has no specific configuration fields.</p></div> +<div class="section"> +<h5><a name="SigmoidLayer"></a>SigmoidLayer</h5> +<p>SigmoidLayer uses the sigmoid (or logistic) function as its activation function, i.e., <tt>f(x)=sigmoid(x)</tt>.
It has no specific configuration fields.</p></div> +<div class="section"> +<h5><a name="Dropout_Layer"></a>Dropout Layer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1DropoutLayer.html">DropoutLayer</a> is a layer that randomly drops some of its inputs. This scheme helps prevent deep learning models from over-fitting.</p> <div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"Pool_Number" - type:"kPooling" - srclayers:"Src_Layer_Name" - pooling_param - { - pool:AVE|MAX - //Choose whether use the Average Pooling or Max Pooling - kernel:int - //size of the kernel filter - stride:int - //the step length of the filter - } +<div class="source"><pre class="prettyprint">type: kDropout +dropout_conf { + dropout_ratio: float # dropout probability } </pre></div></div></div> <div class="section"> -<h3><a name="ReLU_Layer"></a>ReLU Layer</h3> -<p>The rectifier function is an activation function f(x) = Max(0, x) which can be used by neurons just like any other activation function, a node using the rectifier activation function is called a ReLu node. The main reason that it is used is because of how efficiently it can be computed compared to more conventional activation functions like the sigmoid and hyperbolic tangent, without making a significant difference to generalization accuracy.
The rectifier activation function is used instead of a linear activation function to add non linearity to the network, otherwise the network would only ever be able to compute a linear function.</p> +<h5><a name="LRNLayer"></a>LRNLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1LRNLayer.html">LRNLayer</a> (Local Response Normalization) normalizes over the channels.</p> <div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"Relu_Number" - type:"kReLU" - srclayers:"Src_Layer_Name" +<div class="source"><pre class="prettyprint">type: kLRN +lrn_conf { + local_size: int + alpha: float // scaling parameter + beta: float // exponential number } -</pre></div></div></div> +</pre></div></div> +<p><tt>local_size</tt> specifies the number of adjoining channels that will be summed up. {% comment %} For <tt>WITHIN_CHANNEL</tt>, it means the side length of the space region which will be summed up. {% endcomment %}</p></div></div> +<div class="section"> +<h4><a name="Loss_Layers"></a>Loss Layers</h4> +<p>Loss layers measure the objective training loss.</p> <div class="section"> -<h3><a name="RGBImage_Layer"></a>RGBImage Layer</h3> -<p>RGBImage layer is a pre-processing layer for RGB format images. </p> +<h5><a name="SoftmaxLossLayer"></a>SoftmaxLossLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1SoftmaxLossLayer.html">SoftmaxLossLayer</a> is a combination of the Softmax transformation and the Cross-Entropy loss. It first applies Softmax to get a prediction probability for each output unit (neuron) and then computes the cross-entropy against the ground truth.
It is generally used as the final layer to generate labels for classification tasks.</p> <div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"rgb" - type:"kRGBImage" - srclayers:"data" - rgbimage_param - { - meanfile:"Image_Mean_File_Path" - } +<div class="source"><pre class="prettyprint">type: kSoftmaxLoss +softmaxloss_conf { + topk: int +} +</pre></div></div> +<p>The configuration field <tt>topk</tt> selects the labels with the <tt>topk</tt> largest probabilities as the prediction results, since it is tedious for users to view the prediction probability of every label.</p></div></div> +<div class="section"> +<h4><a name="Other_Layers"></a>Other Layers</h4> +<div class="section"> +<h5><a name="ConcateLayer"></a>ConcateLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1ConcateLayer.html">ConcateLayer</a> connects more than one source layer to concatenate their feature blobs along a given dimension.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">type: kConcate +concate_conf { + concate_dim: int // define the dimension } </pre></div></div></div> <div class="section"> -<h3><a name="Tanh_Layer"></a>Tanh Layer</h3> -<p>Tanh uses the tanh as activation function. It transforms the input into range [-1, 1] using Tanh function.
</p> +<h5><a name="SliceLayer"></a>SliceLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1SliceLayer.html">SliceLayer</a> connects to more than one destination layer to slice its feature blob along a given dimension.</p> <div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"Tanh_Number" - type:"kTanh" - srclayer:"Src_Layer_Name" +<div class="source"><pre class="prettyprint">type: kSlice +slice_conf { + slice_dim: int } </pre></div></div></div> <div class="section"> -<h3><a name="SoftmaxLoss_Layer"></a>SoftmaxLoss Layer</h3> -<p>Softmax Loss Layer is the implementation of multi-class softmax loss function. It is generally used as the final layer to generate labels for classification tasks.</p> +<h5><a name="SplitLayer"></a>SplitLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1SplitLayer.html">SplitLayer</a> connects to more than one destination layer to replicate its feature blob.</p> <div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"loss" - type:"kSoftmaxLoss" - softmaxloss_param - { - topk:int - } - srclayers:"Src_Layer_Name" - srclayers:"Src_Layer_Name" +<div class="source"><pre class="prettyprint">type: kSplit +split_conf { + num_splits: int } </pre></div></div></div> <div class="section"> -<h3><a name="BridgeSrc__BridgeDst_Layer"></a>BridgeSrc & BridgeDst Layer</h3> -<p>BridgeSrc & BridgeDst Layer are utility layers implementing logics of model partition.
It can be used as a lock for synchronization, a transformation storage of different type of model partition and etc.</p></div> +<h5><a name="BridgeSrcLayer__BridgeDstLayer"></a>BridgeSrcLayer & BridgeDstLayer</h5> +<p><a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1BridgeSrcLayer.html">BridgeSrcLayer</a> & <a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1BridgeDstLayer.html">BridgeDstLayer</a> are utility layers assisting data (e.g., feature or gradient) transferring due to neural net partitioning. These two layers are added implicitly. Users typically do not need to configure them in their neural net configuration.</p></div></div></div></div> <div class="section"> -<h3><a name="Concate_Layer"></a>Concate Layer</h3> -<p>Concat Layer is used to concatenate the last dimension (namely, num_feature) of the output of two nodes. It is usually used along with fully connected layer.</p></div> +<h2><a name="Advanced_user_guide"></a>Advanced user guide</h2> +<p>The base Layer class is introduced in this section, followed by how to implement a new Layer subclass.</p> <div class="section"> -<h3><a name="Parser_Layer"></a>Parser Layer</h3> -<p>Parser Layer will parse the input records into Blobs. </p></div> +<h3><a name="Base_Layer_class"></a>Base Layer class</h3> <div class="section"> -<h3><a name="Prefetch_Layer"></a>Prefetch Layer</h3> -<p>Prefetch Layer is used to pre-fetch data from disk. It ensures that the I/O task and computation/communication task can work simultaneously. 
</p> - -<div class="source"> -<div class="source"><pre class="prettyprint">layer -{ - name:"prefetch" - type:"kPrefetch" - sublayers - { - name:"data" - type:"kShardData" - data_param - { - path:"Shard_File_Path" - batchsize:int - } - } - sublayers - { - name:"rgb" - type:"kRGBImage" - srclayers:"data" - rgbimage_param - { - meanfile:"Image_Mean_File_Path" - } - } - sublayers - { - name:"label" - type:"kLabel" - srclayers:"data" - } - exclude:kTrain|kValidation|kTest|kPositive|kNegative +<h4><a name="Members"></a>Members</h4> + +<div class="source"> +<div class="source"><pre class="prettyprint">LayerProto layer_proto_; +Blob<float> data_, grad_; +vector<Layer*> srclayers_, dstlayers_; +</pre></div></div> +<p>The base layer class keeps the user configuration in <tt>layer_proto_</tt>. Source layers and destination layers are stored in <tt>srclayers_</tt> and <tt>dstlayers_</tt>, respectively. Almost all layers have $b$ (mini-batch size) feature vectors, which are stored in the <tt>data_</tt> <a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1Blob.html">Blob</a> (a Blob is a chunk of memory space, proposed in <a class="externalLink" href="http://caffe.berkeleyvision.org/">Caffe</a>). There are layers without feature vectors; instead, they use other layers’ feature vectors. In this case, the <tt>data_</tt> field is not used. The <tt>grad_</tt> Blob is for storing the gradients of the objective loss w.r.t. the <tt>data_</tt> Blob. It is necessary in the <a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1BPWorker.html">BP algorithm</a>, hence we put it as a member of the base class. For the <a class="externalLink" href="http://singa.incubator.apache.org/api/classsinga_1_1CDWorker.html">CD algorithm</a>, the <tt>grad_</tt> field is not used; instead, the layer from RBM may have a Blob for the positive phase feature and a Blob for the negative phase feature.
For a recurrent layer in RNN, the feature blob contains one vector per internal layer.</p> +<p>If a layer has parameters, these parameters are declared using type <a class="externalLink" href="http://singa.incubator.apache.org/docs/param">Param</a>. Since some layers do not have parameters, we do not declare any <tt>Param</tt> in the base layer class.</p></div> +<div class="section"> +<h4><a name="Functions"></a>Functions</h4> + +<div class="source"> +<div class="source"><pre class="prettyprint">virtual void Setup(const LayerProto& proto, int npartitions = 1); +virtual void ComputeFeature(Phase phase, Metric* perf) = 0; +virtual void ComputeGradient(Phase phase) = 0; +</pre></div></div> +<p>The <tt>Setup</tt> function reads user configuration, i.e. <tt>proto</tt>, and information from source layers, e.g., mini-batch size, to set the shape of the <tt>data_</tt> (and <tt>grad_</tt>) field as well as some other layer specific fields. If <tt>npartitions</tt> is larger than 1, then users need to reduce the sizes of <tt>data_</tt>, <tt>grad_</tt> Blobs or Param objects. For example, if the <tt>partition_dim=0</tt> and there is no source layer, e.g., this layer is a (bottom) data layer, then its <tt>data_</tt> and <tt>grad_</tt> Blob should have <tt>b/npartitions</tt> feature vectors; If the source layer is also partitioned on dimension 0, then this layer should have the same number of feature vectors as the source layer. More complex partition cases are discussed in <a class="externalLink" href="http://singa.incubator.apache.org/docs/neural-net/#neural-net-partitioning">Neural net partitioning</a>. Typically, the Setup function just set the shapes of <tt>data_</tt> Blobs and Param objects. Memory will not be allocated until computation over the data structure happens.</p> +<p>The <tt>ComputeFeature</tt> function evaluates the feature blob by transforming (e.g. convolution and pooling) features from the source layers. 
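The division of labor between these two functions under BP can be sketched with a toy scalar fully connected layer. Everything below is standalone illustrative code under assumed names; SINGA's real Layer API (Blob members, Param objects, function signatures) differs.

```cpp
#include <cassert>

// Toy standalone layer illustrating the ComputeFeature / ComputeGradient
// contract under the BP algorithm, with scalars standing in for Blobs.
struct ToyInnerProduct {
  float w = 2.0f;          // parameter (cf. a Param object)
  float src_data = 0.0f;   // the source layer's data_
  float data = 0.0f;       // this layer's data_ (feature)
  float grad = 0.0f;       // gradient of the loss w.r.t. data_
  float w_grad = 0.0f;     // parameter gradient
  float src_grad = 0.0f;   // gradient passed back to the source layer's grad_

  // Forward pass: compute data_ by transforming the source layer's feature.
  void ComputeFeature() { data = w * src_data; }

  // Backward pass: compute the parameter gradient and the source layer's
  // grad_, both from the gradient w.r.t. this layer's feature.
  void ComputeGradient() {
    w_grad = grad * src_data;  // dL/dw = dL/dy * x
    src_grad = grad * w;       // dL/dx = dL/dy * w
  }
};
```

With `w = 2` and a source feature of `3`, the forward pass yields `data = 6`; if the loss gradient w.r.t. `data` is `1`, the backward pass yields `w_grad = 3` and `src_grad = 2`.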
<tt>ComputeGradient</tt> computes the gradients of parameters associated with this layer. These two functions are invoked by the <a class="externalLink" href="http://singa.incubator.apache.org/docs/train-one-batch">TrainOneBatch</a> function during training. Hence, they should be consistent with the <tt>TrainOneBatch</tt> function. Particularly, for feed-forward and RNN models, they are trained using <a class="externalLink" href="http://singa.incubator.apache.org/docs/train-one-batch/#back-propagation">BP algorithm</a>, which requires each layer’s <tt>ComputeFeature</tt> function to compute <tt>data_</tt> based on source layers, and requires each layer’s <tt>ComputeGradient</tt> to compute gradients of parameters and source layers’ <tt>grad_</tt>. For energy models, e.g., RBM, they are trained by the <a class="externalLink" href="http://singa.incubator.apache.org/docs/train-one-batch/#contrastive-divergence">CD algorithm</a>, which requires each layer’s <tt>ComputeFeature</tt> function to compute the feature vectors for the positive phase or negative phase depending on the <tt>phase</tt> argument, and requires the <tt>ComputeGradient</tt> function to only compute parameter gradients. Some layers, e.g., loss layers or output layers, can put the loss or prediction result into the <tt>metric</tt> argument, which will be averaged and displayed periodically.</p></div></div> +<div class="section"> +<h3><a name="Implementing_a_new_Layer_subclass"></a>Implementing a new Layer subclass</h3> +<p>Users can extend the base layer class to implement their own feature transformation logics as long as the two virtual functions are overridden to be consistent with the <tt>TrainOneBatch</tt> function.
The <tt>Setup</tt> function may also be overridden to read layer-specific configuration.</p> +<div class="section"> +<h4><a name="Layer_specific_protocol_message"></a>Layer specific protocol message</h4> +<p>To implement a new layer, the first step is to define the layer-specific configuration. Suppose the new layer is <tt>FooLayer</tt>, the layer-specific Google protocol buffer message <tt>FooLayerProto</tt> should be defined as</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">// in user.proto +package singa; +import "job.proto"; +message FooLayerProto { + optional int32 a = 1; // fields specific to the FooLayer +} +</pre></div></div> +<p>In addition, users need to extend the original <tt>LayerProto</tt> (defined in job.proto of SINGA) to include the <tt>foo_conf</tt> as follows.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">extend LayerProto { + optional FooLayerProto foo_conf = 101; // unique field id, reserved for extensions +} +</pre></div></div> +<p>If there are multiple new layers, then each layer that has specific configurations should have its own <tt>&lt;type&gt;_conf</tt> field and take one unique extension number. SINGA has reserved enough extension numbers, from 101 to 1000.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">// job.proto of SINGA +message LayerProto { + ... + extensions 101 to 1000; +} +</pre></div></div> +<p>With user.proto defined, users can use <a class="externalLink" href="https://developers.google.com/protocol-buffers/">protoc</a> to generate the <tt>user.pb.cc</tt> and <tt>user.pb.h</tt> files. In users’ code, the extension fields can be accessed via,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">auto conf = layer_proto_.GetExtension(foo_conf); +int a = conf.a(); +</pre></div></div> +<p>When defining the configuration of the new layer (in job.conf), users should use <tt>user_type</tt> for its layer type instead of <tt>type</tt>.
In addition, <tt>foo_conf</tt> should be enclosed in brackets.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">layer { + name: "foo" + user_type: "kFooLayer" # Note user_type of user-defined layers is string + [singa.foo_conf] { # Note there is a pair of [] for extension fields + a: 10 + } } </pre></div></div></div> <div class="section"> -<h3><a name="Slice_Layer"></a>Slice Layer</h3> -<p>The Slice layer is a utility layer that slices an input layer to multiple output layers along a given dimension (currently num or channel only) with given slice indices.</p></div> +<h4><a name="New_Layer_subclass_declaration"></a>New Layer subclass declaration</h4> +<p>The new layer subclass can be implemented like the built-in layer subclasses.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">class FooLayer : public Layer { + public: + void Setup(const LayerProto& proto, int npartitions = 1) override; + void ComputeFeature(Phase phase, Metric* perf) override; + void ComputeGradient(Phase phase) override; + + private: + // members +}; +</pre></div></div> +<p>Users must override the two virtual functions to be called by the <tt>TrainOneBatch</tt> for either BP or CD algorithm. Typically, the <tt>Setup</tt> function will also be overridden to initialize some members. The user configured fields can be accessed through <tt>layer_proto_</tt> as shown in the above paragraphs.</p></div> <div class="section"> -<h3><a name="Split_Layer"></a>Split Layer</h3> -<p>The Split Layer can seperate the input blob into several output blobs. 
It is used to the situation which one input blob should be input to several other output blobs.</p></div></div> +<h4><a name="New_Layer_subclass_registration"></a>New Layer subclass registration</h4> +<p>The newly defined layer should be registered in <a class="externalLink" href="http://singa.incubator.apache.org/docs/programming-guide">main.cc</a> by adding</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">driver.RegisterLayer<FooLayer>("kFooLayer"); // "kFooLayer" should be matched to layer configurations in job.conf. +</pre></div></div> +<p>After that, the <a class="externalLink" href="http://singa.incubator.apache.org/docs/neural-net">NeuralNet</a> can create instances of the new Layer subclass.</p></div></div></div> </div> </div> </div>
Modified: websites/staging/singa/trunk/content/docs/lmdb.html ============================================================================== --- websites/staging/singa/trunk/content/docs/lmdb.html (original) +++ websites/staging/singa/trunk/content/docs/lmdb.html Wed Sep 2 10:31:57 2015 @@ -1,13 +1,13 @@ <!DOCTYPE html> <!-- - | Generated by Apache Maven Doxia at 2015-08-17 + | Generated by Apache Maven Doxia at 2015-09-02 | Rendered using Apache Maven Fluido Skin 1.4 --> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> - <meta name="Date-Revision-yyyymmdd" content="20150817" /> + <meta name="Date-Revision-yyyymmdd" content="20150902" /> <meta http-equiv="Content-Language" content="en" /> <title>Apache SINGA – </title> <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" /> Modified: websites/staging/singa/trunk/content/docs/mlp.html ============================================================================== --- websites/staging/singa/trunk/content/docs/mlp.html (original) +++ websites/staging/singa/trunk/content/docs/mlp.html Wed Sep 2 10:31:57 2015 @@ -1,13 +1,13 @@ <!DOCTYPE html> <!-- - | Generated by Apache Maven Doxia at 2015-08-17 + | Generated by Apache Maven Doxia at 2015-09-02 | Rendered using Apache Maven Fluido Skin 1.4 --> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> - <meta name="Date-Revision-yyyymmdd" content="20150817" /> + <meta name="Date-Revision-yyyymmdd" content="20150902" /> <meta http-equiv="Content-Language" content="en" /> <title>Apache SINGA – </title> <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" /> @@ -21,7 +21,7 @@ <script type="text/javascript" src="../js/apache-maven-fluido-1.4.min.js"></script> - <meta name="Notice" content="Licensed to the 
Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at . http://www.apache.org/licenses/LICENSE-2.0 . Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." /> </head> + </head> <body class="topBarEnabled"> @@ -425,39 +425,28 @@ <div id="bodyColumn" class="span10" > - -<p>This example will show you how to use SINGA to train a MLP model using mnist dataset.</p> -<div class="section"> -<div class="section"> -<h3><a name="Prepare_for_the_data"></a>Prepare for the data</h3> - -<ul> - -<li>First go to the <tt>example/mnist/</tt> folder for preparing the dataset. There should be a makefile example called Makefile.example in the folder. Run the command <tt>cp Makefile.example Makefile</tt> to generate the makefile. Then run the command <tt>make download</tt> and <tt>make create</tt> in the current folder to download mnist dataset and prepare for the training and testing datashard.</li> -</ul></div> + <h1>MLP Example</h1> +<p>Multilayer perceptron (MLP) is a feed-forward artificial neural network model. An MLP typically consists of multiple directly connected layers, with each layer fully connected to the next one.
In this example, we will use SINGA to train a <a class="externalLink" href="http://arxiv.org/abs/1003.0358">simple MLP model proposed by Ciresan</a> for classifying handwritten digits from the <a class="externalLink" href="http://yann.lecun.com/exdb/mnist/">MNIST dataset</a>.</p> <div class="section"> -<h3><a name="Set_model_and_cluster_configuration."></a>Set model and cluster configuration.</h3> +<h2><a name="Running_instructions"></a>Running instructions</h2> +<p>Please refer to the <a class="externalLink" href="http://singa.incubator.apache.org/docs/installation">installation</a> page for instructions on building SINGA, and the <a class="externalLink" href="http://singa.incubator.apache.org/docs/quick-start">quick start</a> for instructions on starting zookeeper.</p> +<p>We have provided scripts for preparing the training and test dataset in <i>examples/mnist/</i>.</p> -<ul> - -<li>If you just want to use the training model provided in this example, you can just use job.conf file in current directory. Fig. 1 gives an example of MLP struture. In this example, we define a neurualnet that contains 5 hidden layer. fc+tanh is the hidden layer(fc is for the inner product part, and tanh is for the non-linear activation function), and the final softmax layer is represented as fc+loss (inner product and softmax). For each layer, we define its name, input layer(s), basic configurations (e.g. number of nodes, parameter initialization settings).
If you want to learn more about how it is configured, you can go to <a class="externalLink" href="http://singa.incubator.apache.org/docs/model-config.html">Model Configuration</a> to get details.</li> -</ul> +<div class="source"> +<div class="source"><pre class="prettyprint"># in examples/mnist +$ cp Makefile.example Makefile +$ make download +$ make create +</pre></div></div> +<p>After the datasets are prepared, we start the training by</p> -<div style="text-align: center"> -<img src="../images/mlp_example.png" style="width: 280px" alt="" /> <br />Fig. 1: MLP example </img> -</div></div> -<div class="section"> -<h3><a name="Run_SINGA"></a>Run SINGA</h3> +<div class="source"> +<div class="source"><pre class="prettyprint">./bin/singa-run.sh -conf examples/mnist/job.conf +</pre></div></div> +<p>After it is started, you should see output like</p> -<ul> - -<li> -<p>All script of SINGA should be run in the root folder of SINGA. First you need to start the zookeeper service if zookeeper is not started. The command is <tt>./bin/zk-service start</tt>. Then you can run the command <tt>./bin/singa-run.sh -conf examples/mnist/job.conf</tt> to start a SINGA job using examples/mnist/job.conf as the job configuration. 
After it is started, you should get a screenshots like the following:</p> - <div class="source"> -<div class="source"><pre class="prettyprint">xxx@yyy:zzz/incubator-singa$ ./bin/singa-run.sh -conf examples/mnist/job.conf -Unique JOB_ID is 1 -Record job information to /tmp/singa-log/job-info/job-1-20150817-055231 +<div class="source"><pre class="prettyprint">Record job information to /tmp/singa-log/job-info/job-1-20150817-055231 Executing : ./singa -conf /xxx/incubator-singa/examples/mnist/job.conf -singa_conf /xxx/incubator-singa/conf/singa.conf -singa_job 1 E0817 07:15:09.211885 34073 cluster.cc:51] proc #0 -> 192.168.5.128:49152 (pid = 34073) E0817 07:15:14.972231 34114 server.cc:36] Server (group = 0, id = 0) start @@ -477,16 +466,171 @@ E0817 07:18:52.608111 34073 trainer.cc:3 E0817 07:19:12.168465 34073 trainer.cc:373] Train step-100, loss : 1.387759, accuracy : 0.721000 E0817 07:19:31.855865 34073 trainer.cc:373] Train step-110, loss : 1.335246, accuracy : 0.736500 E0817 07:19:57.327133 34073 trainer.cc:373] Test step-120, loss : 1.216652, accuracy : 0.769900 +</pre></div></div> +<p>After training for a number of steps (depending on the configuration) or when the job finishes, SINGA will <a class="externalLink" href="http://singa.incubator.apache.org/docs/checkpoint">checkpoint</a> the model parameters.</p></div> +<div class="section"> +<h2><a name="Details"></a>Details</h2> +<p>To train a model in SINGA, you need to prepare the datasets and a job configuration that specifies the neural net structure, training algorithm (BP or CD), SGD update algorithm (e.g. Adagrad), number of training/test steps, etc.</p> +<div class="section"> +<h3><a name="Data_preparation"></a>Data preparation</h3> +<p>Before using SINGA, you need to write a program to pre-process the dataset into a format that SINGA can read.
Please refer to the <a class="externalLink" href="http://singa.incubator.apache.org/docs/data#example---mnist-dataset">Data Preparation</a> page for details about preparing the MNIST dataset.</p></div> +<div class="section"> +<h3><a name="Neural_net"></a>Neural net</h3> + +<div style="text-align: center"> +<img src="http://singa.incubator.apache.org/assets/image/mlp-example.png" style="width: 230px" alt="" /> +<br /><b>Figure 1 - Net structure of the MLP example. </b></img> +</div> +<p>Figure 1 shows the structure of the simple MLP model, which is constructed following <a class="externalLink" href="http://arxiv.org/abs/1003.0358">Ciresan’s paper</a>. The dashed circle contains two layers which represent one feature transformation stage. There are 6 such stages in total. The sizes of the <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#innerproductlayer">InnerProductLayer</a>s in these circles decrease from 2500->2000->1500->1000->500->10.</p> +<p>Next we follow the guides in the <a class="externalLink" href="http://singa.incubator.apache.org/docs/neural-net">neural net page</a> and <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer">layer page</a> to write the neural net configuration.</p> + +<ul> + +<li> +<p>We configure a <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#data-layers">data layer</a> to read the training/testing <tt>Records</tt> from <tt>DataShard</tt>.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">layer { + name: "data" + type: kShardData + sharddata_conf { + path: "examples/mnist/mnist_train_shard" + batchsize: 1000 + } + exclude: kTest + } + +layer { + name: "data" + type: kShardData + sharddata_conf { + path: "examples/mnist/mnist_test_shard" + batchsize: 1000 + } + exclude: kTrain + } +</pre></div></div></li> + +<li> +<p>We configure two <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#parser-layers">parser layers</a> to
extract the image feature and label from <tt>Record</tt>s loaded by the <i>data</i> layer. The <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#mnistlayer">MnistLayer</a> will normalize the pixel values into [-1,1].</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">layer{ + name:"mnist" + type: kMnist + srclayers: "data" + mnist_conf { + norm_a: 127.5 + norm_b: 1 + } + } + +layer{ + name: "label" + type: kLabel + srclayers: "data" + } +</pre></div></div></li> + +<li> +<p>All <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#innerproductlayer">InnerProductLayer</a>s are configured similarly as,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">layer{ + name: "fc1" + type: kInnerProduct + srclayers:"mnist" + innerproduct_conf{ + num_output: 2500 + } + param{ + name: "w1" + init { + type: kUniform + low:-0.05 + high:0.05 + } + } + param{ + name: "b1" + init { + type : kUniform + low: -0.05 + high:0.05 + } + } +} +</pre></div></div> +<p>with the <tt>num_output</tt> decreasing from 2500 to 10.</p></li> + +<li> +<p>All <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#tanhlayer">TanhLayer</a>s are configured similarly as,</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">layer{ + name: "tanh1" + type: kTanh + tanh_conf { + outer_scale: 1.7159047 + inner_scale: 0.6666667 + } + srclayers:"fc1" +} </pre></div></div></li> </ul> -<p>After the training of some steps (depends on the setting) or the job is finished, SINGA will checkpoint the current parameter. In the next time, you can train (or use for your application) by loading the checkpoint.
Please refer to <a class="externalLink" href="http://singa.incubator.apache.org/docs/checkpoint.html">Checkpoint</a> for the use of checkpoint.</p></div> -<div class="section"> -<h3><a name="Build_your_own_model"></a>Build your own model</h3> +<p>Every neuron from the source layer is transformed as <tt>outer_scale*tanh(inner_scale*x)</tt>.</p> <ul> -<li>If you want to specify you own model, then you need to decribe it in the job.conf file. It should contain the neurualnet structure, training algorithm(backforward or contrastive divergence etc.), SGD update algorithm(e.g. Adagrad), number of training/test steps and training/test frequency, and display features and etc. SINGA will read job.conf as a Google protobuf class <a href="../src/proto/job.proto">JobProto</a>. You can also refer to the <a class="externalLink" href="http://singa.incubator.apache.org/docs/programmer-guide.html">Programmer Guide</a> to get details.</li> -</ul></div></div> +<li> +<p>The final <a class="externalLink" href="http://singa.incubator.apache.org/docs/layer#softmaxloss">Softmax loss layer</a> connects to the label layer and the last InnerProductLayer (<tt>fc6</tt>).</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">layer{ + name: "loss" + type:kSoftmaxLoss + softmaxloss_conf{ + topk:1 + } + srclayers:"fc6" + srclayers:"label" +} +</pre></div></div></li> +</ul></div> +<div class="section"> +<h3><a name="Updater"></a>Updater</h3> +<p>The <a class="externalLink" href="http://singa.incubator.apache.org/docs/updater#updater">normal SGD updater</a> is selected.
The learning rate shrinks by a factor of 0.997 every 60 steps (i.e., one epoch).</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">updater{ + type: kSGD + learning_rate{ + base_lr: 0.001 + type : kStep + step_conf{ + change_freq: 60 + gamma: 0.997 + } + } +} +</pre></div></div></div> +<div class="section"> +<h3><a name="TrainOneBatch_algorithm"></a>TrainOneBatch algorithm</h3> +<p>The MLP model is a feed-forward model, hence the <a class="externalLink" href="http://singa.incubator.apache.org/docs/train-one-batch#back-propagation">Back-propagation algorithm</a> is selected.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint"> alg: kBP +</pre></div></div></div> +<div class="section"> +<h3><a name="Cluster_setting"></a>Cluster setting</h3> +<p>The following configuration sets a single worker and server for training. The <a class="externalLink" href="http://singa.incubator.apache.org/docs/frameworks">Training frameworks</a> page introduces configurations of a couple of distributed training frameworks.</p> + +<div class="source"> +<div class="source"><pre class="prettyprint">cluster { + nworker_groups: 1 + nserver_groups: 1 +} +</pre></div></div></div></div> </div> </div> </div> Modified: websites/staging/singa/trunk/content/docs/model-config.html ============================================================================== --- websites/staging/singa/trunk/content/docs/model-config.html (original) +++ websites/staging/singa/trunk/content/docs/model-config.html Wed Sep 2 10:31:57 2015 @@ -1,13 +1,13 @@ <!DOCTYPE html> <!-- - | Generated by Apache Maven Doxia at 2015-08-17 + | Generated by Apache Maven Doxia at 2015-09-02 | Rendered using Apache Maven Fluido Skin 1.4 --> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <meta charset="UTF-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> - <meta name="Date-Revision-yyyymmdd" content="20150817" /> + <meta name="Date-Revision-yyyymmdd" content="20150902" /> <meta http-equiv="Content-Language"
content="en" /> <title>Apache SINGA – Model Configuration</title> <link rel="stylesheet" href="../css/apache-maven-fluido-1.4.min.css" /> @@ -423,10 +423,10 @@ <div id="bodyColumn" class="span10" > - <div class="section"> -<h2><a name="Model_Configuration"></a>Model Configuration</h2> + <h1>Model Configuration</h1> <p>SINGA uses the stochastic gradient descent (SGD) algorithm to train parameters of deep learning models. For each SGD iteration, there is a <a href="docs/architecture.html">Worker</a> computing gradients of parameters from the NeuralNet and an <a href="">Updater</a> updating parameter values based on gradients. Hence the model configuration mainly consists of these three parts. We will introduce the NeuralNet, Worker and Updater in the following paragraphs and describe the configurations for them. All model configuration is specified in the model.conf file in the user-provided workspace folder. E.g., the <a class="externalLink" href="https://github.com/apache/incubator-singa/tree/master/examples/cifar10">cifar10 example folder</a> has a model.conf file.</p> <div class="section"> +<div class="section"> <h3><a name="NeuralNet"></a>NeuralNet</h3> <div class="section"> <h4><a name="Uniform_model_neuralnet_representation"></a>Uniform model (neuralnet) representation</h4>
