Repository: incubator-singa
Updated Branches:
  refs/heads/master 99bae0209 -> 3d688be4e


SINGA-395 Add documentation for autograd APIs

updated the doc page for autograd
1. fix some typos
2. change the xception net example to two simple examples


Project: http://git-wip-us.apache.org/repos/asf/incubator-singa/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-singa/commit/8143cf40
Tree: http://git-wip-us.apache.org/repos/asf/incubator-singa/tree/8143cf40
Diff: http://git-wip-us.apache.org/repos/asf/incubator-singa/diff/8143cf40

Branch: refs/heads/master
Commit: 8143cf4051ee5f141d875cba5a974ca417bc5848
Parents: 4a1b1e2
Author: zmeihui <[email protected]>
Authored: Sun Nov 18 14:56:35 2018 +0800
Committer: zmeihui <[email protected]>
Committed: Sun Nov 18 14:56:35 2018 +0800

----------------------------------------------------------------------
 doc/en/docs/autograd.md     | 146 ++++++++++++++++++++++++
 doc/en/docs/autograd_doc.md | 241 ---------------------------------------
 2 files changed, 146 insertions(+), 241 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-singa/blob/8143cf40/doc/en/docs/autograd.md
----------------------------------------------------------------------
diff --git a/doc/en/docs/autograd.md b/doc/en/docs/autograd.md
new file mode 100644
index 0000000..6070629
--- /dev/null
+++ b/doc/en/docs/autograd.md
@@ -0,0 +1,146 @@
+# Autograd in Singa
+
+There are two typical ways to implement autograd: symbolic differentiation 
like [Theano](http://deeplearning.net/software/theano/index.html) or reverse 
differentiation like 
[PyTorch](https://pytorch.org/docs/stable/notes/autograd.html). Singa follows 
the PyTorch way, which records the computation graph and applies backward 
propagation automatically after forward propagation. The autograd algorithm is 
explained in detail 
[here](https://pytorch.org/docs/stable/notes/autograd.html). We explain the 
relevant modules in Singa and give examples to illustrate the usage.
+
+## Relevant Modules
+
+There are three classes involved in autograd, namely `singa.tensor.Tensor`, 
`singa.autograd.Operation`, and `singa.autograd.Layer`. In the rest of this 
article, we use tensor, operation and layer to refer to an instance of the 
respective class.
+
+### Tensor
+
+Three attributes of Tensor are used by autograd:
+-  `.creator` is an `Operation` instance. It records the operation that 
generates the Tensor instance.
+-  `.requires_grad` is a boolean variable. It indicates that the autograd 
algorithm needs to compute the gradient of the tensor (i.e., its owner). For 
example, during backpropagation, the gradients of the tensors for the weight 
matrix of a linear layer and for the feature maps of a convolution layer (not 
the bottom layer) should be computed.
+-  `.stores_grad` is a boolean variable. It indicates that the gradient of the 
owner tensor should be stored and output by the backward function. For example, 
the gradient of the feature maps is computed during backpropagation, but is not 
included in the output of the backward function.
+
+Programmers can change `requires_grad` and `stores_grad` of a Tensor instance. 
For example, if the latter is set to True, the corresponding gradient is 
included in the output of the backward function. Note that if `stores_grad` is 
True, then `requires_grad` must be True as well, but not vice versa.
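+
+The following is a minimal sketch of how these flags are typically set; the constructor arguments mirror the MLP training example later on this page, and the shapes are chosen only for illustration.
+
+```
+from singa.tensor import Tensor
+
+# a parameter tensor whose gradient should be computed and also included
+# in the output of the backward function
+w = Tensor(shape=(2, 3), requires_grad=True, stores_grad=True)
+w.gaussian(0.0, 0.1)  # random initialization, as in the MLP example below
+
+# an input tensor (e.g., a batch of data) usually needs neither flag
+x = Tensor(shape=(4, 2), requires_grad=False, stores_grad=False)
+```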
+
+
+### Operation
+
+An `Operation` takes one or more `Tensor` instances as input and outputs one 
or more `Tensor` instances. For example, ReLU can be implemented as a specific 
Operation subclass. When an `Operation` instance is called (after 
instantiation), the following two steps are executed:
+
+1. record the source operations, i.e., the `creator`s of the input tensors.
+2. do the calculation by calling the member function `.forward()`.
+
+There are two member functions for the forward and backward passes, i.e., 
`.forward()` and `.backward()`. They take `Tensor.data` as inputs (the type is 
`CTensor`), and output `CTensor`s. To add a specific operation, a subclass of 
`Operation` should implement its own `.forward()` and `.backward()`. The 
`backward()` function is called automatically by the `backward()` function of 
autograd during backward propagation to compute the gradients of the inputs 
(according to the `requires_grad` field).
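+
+As a minimal sketch (not an operation shipped with Singa), an identity operation could be implemented as follows, assuming the `forward()`/`backward()` contract described above:
+
+```
+from singa import autograd
+
+class Identity(autograd.Operation):
+    # forward() and backward() receive and return CTensors, i.e., Tensor.data
+    def forward(self, x):
+        return x   # pass the input through unchanged
+
+    def backward(self, dy):
+        return dy  # the gradient of the identity is the upstream gradient
+```
+
+Calling `Identity()(x)` on a tensor `x` would then record this operation as the `creator` of the output tensor, as described above.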
+
+### Layer
+
+For operations that require parameters, we package them into a new class, 
`Layer`. For example, the convolution operation is wrapped into a convolution 
layer. A `Layer` manages (stores) the parameters and calls the corresponding 
`Operation`s to implement the transformation.
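+
+For instance, a convolution layer can be created and applied as sketched below; the constructor arguments mirror those used in the CNN example later on this page, and `x` is assumed to be an input tensor of a compatible shape.
+
+```
+from singa import autograd
+
+conv = autograd.Conv2d(1, 32, 3, padding=1)  # the layer creates and stores its parameters
+# y = conv(x)  # calling the layer runs the underlying convolution operation on x
+```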
+
+
+
+## Examples
+
+Multiple examples are provided in the [example 
folder](https://github.com/apache/incubator-singa/tree/master/examples/autograd).
 We explain two representative examples here.
+
+### Operation only
+
+The following code implements an MLP model using only Operation instances (no 
Layer instances).
+
+#### Import packages
+
+```
+from singa.tensor import Tensor
+from singa import autograd
+from singa import opt
+```
+
+#### Create weight matrix and bias vector
+
+The parameter tensors are created with both `requires_grad` and `stores_grad` 
set to True.
+
+```
+w0 = Tensor(shape=(2, 3), requires_grad=True, stores_grad=True)
+w0.gaussian(0.0, 0.1)
+b0 = Tensor(shape=(1, 3), requires_grad=True, stores_grad=True)
+b0.set_value(0.0)
+
+w1 = Tensor(shape=(3, 2), requires_grad=True, stores_grad=True)
+w1.gaussian(0.0, 0.1)
+b1 = Tensor(shape=(1, 2), requires_grad=True, stores_grad=True)
+b1.set_value(0.0)
+```
+
+#### Training
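+
+The loop below assumes `data` and `label` are numpy arrays holding the training inputs and targets. As an illustration only (this snippet is not part of the original example, and a one-hot label format is assumed), toy arrays matching the parameter shapes above could be generated as follows:
+
+```
+import numpy as np
+
+# 32 random samples with 2 features, matching the (2, 3) weight matrix above
+data = np.random.randn(32, 2).astype(np.float32)
+# one-hot targets for the 2 output classes (an assumed label format)
+label = np.zeros((32, 2), dtype=np.float32)
+label[np.arange(32), np.random.randint(0, 2, 32)] = 1.0
+```
+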
+```
+inputs = Tensor(data=data)  # data matrix
+target = Tensor(data=label) # label vector
+autograd.training = True    # for training
+sgd = opt.SGD(0.05)   # optimizer
+
+for i in range(10):
+    x = autograd.matmul(inputs, w0) # matrix multiplication
+    x = autograd.add_bias(x, b0)    # add the bias vector
+    x = autograd.relu(x)            # ReLU activation operation
+
+    x = autograd.matmul(x, w1)
+    x = autograd.add_bias(x, b1)
+    
+    loss = autograd.softmax_cross_entropy(x, target)
+    
+    for p, g in autograd.backward(loss):        
+        sgd.update(p, g)
+```
+
+
+### Operation + Layer
+
+The following 
[example](https://github.com/apache/incubator-singa/blob/master/examples/autograd/mnist_cnn.py)
 implements a CNN model using layers provided by the autograd module.
+
+#### Create the layers
+
+```
+conv1 = autograd.Conv2d(1, 32, 3, padding=1, bias=False)
+bn1 = autograd.BatchNorm2d(32)
+pooling1 = autograd.MaxPool2d(3, 1, padding=1)
+conv21 = autograd.Conv2d(32, 16, 3, padding=1)
+conv22 = autograd.Conv2d(32, 16, 3, padding=1)
+bn2 = autograd.BatchNorm2d(32)
+linear = autograd.Linear(32 * 28 * 28, 10)    
+pooling2 = autograd.AvgPool2d(3, 1, padding=1)
+```
+
+#### Define the forward function
+
+The operations in the forward pass will be recorded automatically for backward 
propagation.
+
+```
+def forward(x, t):
+    # x is the input data (a batch of images)
+    # t is the label vector (a batch of integers)
+    y = conv1(x)           # Conv layer  
+    y = autograd.relu(y)   # ReLU operation
+    y = bn1(y)             # BN layer
+    y = pooling1(y)        # Pooling Layer
+    
+    # two parallel convolution layers
+    y1 = conv21(y)
+    y2 = conv22(y)
+    y = autograd.cat((y1, y2), 1)  # cat operation
+    y = autograd.relu(y)           # ReLU operation
+    y = bn2(y)
+    y = pooling2(y)
+
+    y = autograd.flatten(y)        # flatten operation
+    y = linear(y)                  # Linear layer
+    loss = autograd.softmax_cross_entropy(y, t)  # operation 
+    return loss, y
+```
+
+#### Training
+
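+The snippet below is excerpted from the MNIST example linked above, so it relies on variables defined elsewhere in that script: the device `dev`, the training arrays `x_train`/`y_train`, the constants `epochs`, `batch_number` and `batch_sz`, and the optimizer `sgd`. A rough sketch of how the optimizer could be created (the hyper-parameter values here are illustrative, not those of the example) is:
+
+```
+from singa import opt
+
+sgd = opt.SGD(lr=0.01, momentum=0.9)
+```
+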
+```
+autograd.training = True
+for epoch in range(epochs):
+    for i in range(batch_number):
+        inputs = tensor.Tensor(device=dev, data=x_train[
+                               i * batch_sz:(1 + i) * batch_sz], stores_grad=False)
+        targets = tensor.Tensor(device=dev, data=y_train[
+                                i * batch_sz:(1 + i) * batch_sz], requires_grad=False, stores_grad=False)
+
+        loss, y = forward(inputs, targets) # forward the net
+    
+        for p, gp in autograd.backward(loss):  # auto backward
+            sgd.update(p, gp)
+```
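+
+At inference time, one would presumably switch the module-level training flag off before running the forward pass; the following is a sketch based on that assumption rather than on the original example:
+
+```
+autograd.training = False   # assumption: disable training-only behaviour (e.g., in batch-norm layers)
+loss, y = forward(inputs, targets)
+```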

http://git-wip-us.apache.org/repos/asf/incubator-singa/blob/8143cf40/doc/en/docs/autograd_doc.md
----------------------------------------------------------------------
diff --git a/doc/en/docs/autograd_doc.md b/doc/en/docs/autograd_doc.md
deleted file mode 100644
index 1ae7833..0000000
--- a/doc/en/docs/autograd_doc.md
+++ /dev/null
@@ -1,241 +0,0 @@
-# singa.autograd
-
-This part presents an overview of how autograd works and gives a simple example of a neural network implemented using the autograd API.
-## Autograd Mechanics
-To understand how the autograd system works, we should understand three important abstractions in this system: `singa.tensor.Tensor`, `singa.autograd.Operation`, and `singa.autograd.Layer`. For brevity, these three classes will be denoted as `tensor`, `operation`, and `layer`.
-### Tensor
-The class `tensor` has three attributes that are important in the autograd system: `.creator`, `.requires_grad`, and `.stores_grad`.
--  `tensor.creator` is an `operation` object. It records the particular `operation` that generates the `tensor` itself.
--  `.requires_grad` and `.stores_grad` are both boolean indicators. These two attributes record whether a `tensor` needs gradients and whether the gradients of a `tensor` need to be stored during backpropagation. For example, the output `tensor` of `Conv2d` needs gradients but does not need to store them. In contrast, the parameter `tensor`s of `Conv2d` not only require gradients but also need to store them. For the input `tensor` of a network, e.g., a batch of images, which neither requires nor stores gradients, both indicators, `.requires_grad` and `.stores_grad`, should be set to False.
-It should be noted that if `.stores_grad` is true, then `.requires_grad` must be true, but not vice versa.
-### Operation
-An `operation` takes one or more `tensor`s as input, and then outputs one or more `tensor`s. When an `operation` is called, mainly two processes happen:
-   1. record the sources of the `operation`. The input `tensor`s carry their `creator` information, i.e., the source `operation`s of the current operation. The current `operation` keeps this information in the attribute `.src`. The autograd engine controls the backward flow according to `operation.src`.
-   2. do the calculation by calling the member function `.forward()`
-
-The class `operation` has two important member functions, `.forward()` and `.backward()`. These two functions take `tensor.data` as inputs, and output `CTensor`s, which are the same type as `tensor.data`. To add a specific operation, a subclass of `operation` should implement its own `.forward()` and `.backward()`.
-### Layer
-For those operations containing parameters, e.g., the weight or bias tensors, we package them into a new class, `layer`. Users should initialize a `layer` before invoking it.
-When a `layer` is called, it sends the input `tensor`s together with the parameter `tensor`s to the corresponding operations to construct the computation graph. One layer may call multiple operations.
-## Python API
-## Example
-The following code implements an Xception net using the autograd API. It can be found in the SINGA source code at
-  `incubator-singa/examples/autograd/xceptionnet.py`
-### 1.  Import packages
-```
-from singa import autograd
-from singa import tensor
-from singa import device
-from singa import opt
-
-import numpy as np
-from tqdm import trange
-```
-### 2. Create model
-First, we create the basic module, named `Block`, which occurs repeatedly in the Xception architecture. The `Block` class consists of `SeparableConv2d`, `ReLU`, `BatchNorm2d` and `MaxPool2d`. It also has linear residual connections.
-```
-class Block(autograd.Layer):
-
-    def __init__(self, in_filters, out_filters, reps, strides=1, padding=0, 
start_with_relu=True, grow_first=True):
-        super(Block, self).__init__()
-
-        if out_filters != in_filters or strides != 1:
-            self.skip = autograd.Conv2d(in_filters, out_filters,
-                                        1, stride=strides, padding=padding, 
bias=False)
-            self.skipbn = autograd.BatchNorm2d(out_filters)
-        else:
-            self.skip = None
-
-        self.layers = []
-
-        filters = in_filters
-        if grow_first:
-            self.layers.append(autograd.ReLU())
-            self.layers.append(autograd.SeparableConv2d(in_filters, 
out_filters,
-                                                        3, stride=1, 
padding=1, bias=False))
-            self.layers.append(autograd.BatchNorm2d(out_filters))
-            filters = out_filters
-
-        for i in range(reps - 1):
-            self.layers.append(autograd.ReLU())
-            self.layers.append(autograd.SeparableConv2d(filters, filters,
-                                                        3, stride=1, 
padding=1, bias=False))
-            self.layers.append(autograd.BatchNorm2d(filters))
-
-        if not grow_first:
-            self.layers.append(autograd.ReLU())
-            self.layers.append(autograd.SeparableConv2d(in_filters, 
out_filters,
-                                                        3, stride=1, 
padding=1, bias=False))
-            self.layers.append(autograd.BatchNorm2d(out_filters))
-
-        if not start_with_relu:
-            self.layers = self.layers[1:]
-        else:
-            self.layers[0] = autograd.ReLU()
-
-        if strides != 1:
-            self.layers.append(autograd.MaxPool2d(3, strides, padding + 1))
-
-    def __call__(self, x):
-        y = self.layers[0](x)
-        for layer in self.layers[1:]:
-            if isinstance(y, tuple):
-                y = y[0]
-            y = layer(y)
-
-        if self.skip is not None:
-            skip = self.skip(x)
-            skip = self.skipbn(skip)
-        else:
-            skip = x
-        y = autograd.add(y, skip)
-        return y
-```
-The second step is to build an `Xception` class.
-During initialization, we create all sublayers that contain parameters.
-The member function `features()` takes a `tensor` holding the training data (images) as input and outputs their representations. The extracted features are then sent to the `logits` function for classification.
-```
-class Xception(autograd.Layer):
-    """
-    Xception optimized for the ImageNet dataset, as specified in
-    https://arxiv.org/pdf/1610.02357.pdf
-    """
-
-    def __init__(self, num_classes=1000):
-        """ Constructor
-        Args:
-            num_classes: number of classes
-        """
-        super(Xception, self).__init__()
-        self.num_classes = num_classes
-
-        self.conv1 = autograd.Conv2d(3, 32, 3, 2, 0, bias=False)
-        self.bn1 = autograd.BatchNorm2d(32)
-
-        self.conv2 = autograd.Conv2d(32, 64, 3, 1, 1, bias=False)
-        self.bn2 = autograd.BatchNorm2d(64)
-
-        self.block1 = Block(
-            64, 128, 2, 2, padding=0, start_with_relu=False, grow_first=True)
-        self.block2 = Block(
-            128, 256, 2, 2, padding=0, start_with_relu=True, grow_first=True)
-        self.block3 = Block(
-            256, 728, 2, 2, padding=0, start_with_relu=True, grow_first=True)
-
-        self.block4 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block5 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block6 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block7 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-
-        self.block8 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block9 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block10 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-        self.block11 = Block(
-            728, 728, 3, 1, start_with_relu=True, grow_first=True)
-
-        self.block12 = Block(
-            728, 1024, 2, 2, start_with_relu=True, grow_first=False)
-
-        self.conv3 = autograd.SeparableConv2d(1024, 1536, 3, 1, 1)
-        self.bn3 = autograd.BatchNorm2d(1536)
-
-        # do relu here
-        self.conv4 = autograd.SeparableConv2d(1536, 2048, 3, 1, 1)
-        self.bn4 = autograd.BatchNorm2d(2048)
-
-        self.globalpooling = autograd.MaxPool2d(10, 1)
-        self.fc = autograd.Linear(2048, num_classes)
-
-    def features(self, input):
-        x = self.conv1(input)
-        x = self.bn1(x)
-        x = autograd.relu(x)
-
-        x = self.conv2(x)
-        x = self.bn2(x)
-        x = autograd.relu(x)
-
-        x = self.block1(x)
-        x = self.block2(x)
-        x = self.block3(x)
-        x = self.block4(x)
-        x = self.block5(x)
-        x = self.block6(x)
-        x = self.block7(x)
-        x = self.block8(x)
-        x = self.block9(x)
-        x = self.block10(x)
-        x = self.block11(x)
-        x = self.block12(x)
-
-        x = self.conv3(x)
-        x = self.bn3(x)
-        x = autograd.relu(x)
-
-        x = self.conv4(x)
-        x = self.bn4(x)
-        return x
-
-    def logits(self, features):
-        x = autograd.relu(features)
-        x = self.globalpooling(x)
-        x = autograd.flatten(x)
-        x = self.fc(x)
-        return x
-
-    def __call__(self, input):
-        x = self.features(input)
-        x = self.logits(x)
-        return x
-```
-
-We can create an Xception net with the following command:
-
-`model = Xception(num_classes=1000)`
-
-### 3. Sample data
-We sample virtual images and labels using numpy.random.
-The virtual images have shape (3, 299, 299).
-The training batch size is set to 16.
-To transfer data from a numpy array to a SINGA `tensor`, we should first create the SINGA `tensor`s, e.g., tx and ty, and then call their member function `copy_from_numpy`.
-```
-IMG_SIZE = 299
-batch_size = 16
-tx = tensor.Tensor((batch_size, 3, IMG_SIZE, IMG_SIZE), dev)
-ty = tensor.Tensor((batch_size,), dev, tensor.int32)
-x = np.random.randn(batch_size, 3, IMG_SIZE, IMG_SIZE).astype(np.float32)
-y = np.random.randint(0, 1000, batch_size, dtype=np.int32)
-tx.copy_from_numpy(x)
-ty.copy_from_numpy(y)
-```
-
-### 4. Set learning parameters and create optimizer
-The number of iterations is set to 20, and the optimizer is SGD with learning rate 0.1, momentum 0.9, and weight decay 1e-5.
-```
-niters = 20
-sgd = opt.SGD(lr=0.1, momentum=0.9, weight_decay=1e-5)
-```
-### 5. Train model
-Set `autograd.training` to True:
-`autograd.training = True`
-
-Then start training:
-```
-with trange(niters) as t:
-        for b in t:
-            x = model(tx)
-            loss = autograd.softmax_cross_entropy(x, ty)
-            for p, g in autograd.backward(loss):
-                sgd.update(p, g)
-```
- 
-
-
-
