joddiy edited a comment on issue #696:
URL: https://github.com/apache/singa/issues/696#issuecomment-628548737
> Shall we go with the following APIs?
> @joddiy @dcslin @XJDKC
> They should be compatible with the current APIs.
>
> ```python
> class Module:
>     def compile(self, inputs, is_train, use_graph, graph_alg):
>         # set the train, graph, etc. config
>         # turn off the graph
>         # if inputs are not filled, print warnings and fill inputs according to data type
>         self.forward(*inputs)
>
>     def load(self, ckp_path, include_state=False):
>         # load the onnx model and copy the params to each layer;
>         # generate warnings for mismatched layers/params.
>         # restore the states and return them as a dict
>
>     def save(self, ckp_path, state={}):
>         # save the model in onnx format
>         # save the states
>
>     def forward(self, x):  # turn on the graph if necessary
>         pass
>
>     def train_one_batch(self, x, y):  # turn on the graph if necessary
>         pass
>
>     @deprecated
>     def loss(self):
>         pass
>
>     @deprecated
>     def optim(self):
>         pass
>
>
> class Layer:
>     def __init__(self, name=None):
>         self.init = False
>
>     def __call__(self, x):
>         if self.init == False:
>             # init the layer states
>         else:
>             # do the forward propagation
>
>
> class MyLayer(Layer):
>     def __init__(self):
>         self.layer1 = layer.Conv2d(nb_kernels=32, kernel=3, stride=1, padding=0, kernel_init='he_uniform')
>         self.layer2 = layer.MaxPool2d(kernel=3, stride=2)
>
>     def forward(self, x):
>         return self.layer2(self.layer1(x))
>
>
> class MyModule(Module):
>     def __init__(self):
>         self.blk1 = MyLayer()
>         self.blk2 = MyLayer()
>         self.optim = SGD()
>         self.loss = CrossEntropyLoss()
>
>     def forward(self, x):
>         return self.blk2(self.blk1(x))
>
>     def train_one_batch(self, x, y):
>         y_ = self.forward(x)
>         l = self.loss(y_, y)
>         self.optim.backward_and_update(l)
>         return l
>
>
> x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
> # fill x with values
> m = MyModule()
>
> # compatible with existing code, which does not have the following two statements
> m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
> for pname, ptensor in m.get_params():
>     ptensor.uniform(-1, 1)  # not necessary if each layer's param init methods are configured
>
> y = Placeholder((2,), device=gpu)
> for npx, npy in data:
>     x.copy_from(npx)
>     y.copy_from(npy)
>     m.train_one_batch(x, y)  # builds the graph in the first iter; for the old code, the params are initialized here
>
> m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})
> ```
This approach still postpones the operation init until the training phase, right? When the user has a batch of samples, they call `train_one_batch`, which calls `forward`, which in turn calls `__call__`:
```py
def __call__(self, x):
    if self.init == False:
        # init the layer states
```
It is still strange to delay the graph initialization until the user has the data.
In my opinion, the current problems are:
1. we don't have the shape of the input, so we use a `Placeholder` as the input;
2. even if we have the shape of the input data, we cannot compute the shapes of all intermediate tensors, because we cannot call `forward` with a `Placeholder`; we could fill it with random data, but that may cause errors (e.g., a layer expecting integer indices would fail on randomly filled floats).

So the key point is that we bind the graph construction to the `forward` function: the graph is constructed only when we call `forward`, and calling `forward` requires real data.
So I'm thinking about separating the graph construction from the `forward` function. We define two classes, `Graph` and `Node`: the `Graph` stores the relationships between `Node`s, and each `Node` stores an `Operation` together with its input and output.

In the `__call__` function of an `Operation`, instead of calling the `forward` function, we create a `Node`, store the operation itself within this `Node`, set its input and output, and return the newly created `Node`, as in the following code:
```py
class Operation(object):
    def __init__(self):
        pass

    def __call__(self, previous_node):  # handling multiple inputs is similar
        # create a Node
        # link the current node with the previous node
        # run infer_shape, and set the shape of each input and output
        # for the current node and the previous node
        current_node = Node()
        current_node.input.node = previous_node
        current_node.operation = self
        current_node.output.shape = self.infer_shape()
        previous_node.output.node = current_node
        return current_node

    def forward(self):
        pass

    def backward(self):
        pass

    def infer_shape(self):
        pass
```
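For illustration, here is a minimal sketch of what a concrete `infer_shape` could look like for a 2D convolution; the function name and signature are hypothetical, and it assumes NCHW layout, a square kernel, and no dilation. The point is that the output shape can be derived from the input shape alone, without any real data:

```py
def conv2d_infer_shape(in_shape, nb_kernels, kernel, stride, padding):
    # in_shape is (N, C, H, W); only the shape is needed, no tensor values
    n, c, h, w = in_shape
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    return (n, nb_kernels, h_out, w_out)

# e.g. a 5x5 conv with 20 kernels on a (2, 1, 28, 28) placeholder
print(conv2d_infer_shape((2, 1, 28, 28), nb_kernels=20, kernel=5, stride=1, padding=0))
# -> (2, 20, 24, 24)
```

This is why the whole graph, including all intermediate shapes, can be built from a `Placeholder` alone.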
We can then actually construct a `Graph` of linked `Node`s with the following code:
```py
class MyModule(Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.conv1 = autograd.Conv2d(1, 20, 5, padding=0)
        self.conv2 = autograd.Conv2d(20, 50, 5, padding=0)
        self.sgd = opt.SGD(lr=0.01)

    def construct_graph(self, x):
        # x is a placeholder
        # create the Graph linked with Nodes
        y = self.conv1(x)
        y = self.conv2(y)
        self.graph = Graph(x, y)

    def train(self, x, y):
        y_ = self.graph.forward(x)
        l = self.loss(y_, y)
        self.optim(l)
        return l

    def loss(self, out, y):
        return autograd.softmax_cross_entropy(out, y)

    def optim(self, loss):
        self.sgd.backward_and_update(loss)


model = MyModule()
x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
model.construct_graph(x)  # build the graph

y = Placeholder((2,), device=gpu)
for npx, npy in data:
    x.copy_from(npx)
    y.copy_from(npy)
    model.train(x, y)  # directly train

model.save('mymodel', state={'epoch': data.size(), 'sgd': model.sgd})
```
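For completeness, here is a minimal sketch of what the `Graph` and `Node` classes used above might look like. All attribute and method names (`inputs`, `outputs`, `topo_order`, ...) are assumptions for illustration, not existing SINGA APIs; the sketch also assumes the input `Placeholder` gets wrapped in a `Node` with no operation when the graph is recorded, and that each concrete `Operation.forward` accepts the input tensors:

```py
class Node:
    def __init__(self, operation=None):
        self.operation = operation  # Operation recorded by this node (None for the input placeholder)
        self.inputs = []            # predecessor Nodes
        self.outputs = []           # successor Nodes
        self.shape = None           # output shape filled in via infer_shape


class Graph:
    def __init__(self, input_node, output_node):
        self.input_node = input_node
        self.output_node = output_node

    def topo_order(self):
        # post-order DFS from the input node, reversed, gives a topological order
        order, visited = [], set()

        def visit(node):
            if id(node) in visited:
                return
            visited.add(id(node))
            for nxt in node.outputs:
                visit(nxt)
            order.append(node)

        visit(self.input_node)
        return list(reversed(order))

    def forward(self, x):
        # replay the recorded operations on real tensors at training time
        values = {id(self.input_node): x}
        for node in self.topo_order():
            if node.operation is None:  # the input placeholder node
                continue
            ins = [values[id(p)] for p in node.inputs]
            values[id(node)] = node.operation.forward(*ins)
        return values[id(self.output_node)]
```

The important property is that the graph, including all shapes, is recorded before any real data exists; `Graph.forward` only replays it once the real tensors have been copied into the placeholders.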