joddiy edited a comment on issue #696:
URL: https://github.com/apache/singa/issues/696#issuecomment-628548737
> Shall we go with the following APIs?
> @joddiy @dcslin @XJDKC
> They should be compatible with the current APIs.
>
> ```python
> class Module:
>     def compile(self, inputs, is_train, use_graph, graph_alg):
>         # set the train, graph, etc. config
>         # turn off the graph
>         # if inputs are not filled, print warnings and fill inputs according to data type
>         self.forward(*inputs)
>
>     def load(self, ckp_path, include_state=False):
>         # load the onnx model and copy the params to each layer;
>         # generate warnings for mismatched layers/params.
>         # restore the states and return them as a dict
>
>     def save(self, ckp_path, state={}):
>         # save the model in onnx format
>         # save the states
>
>     def forward(self, x):  # turn on the graph if necessary
>         pass
>
>     def train_one_batch(self, x, y):  # turn on the graph if necessary
>         pass
>
>     @deprecated
>     def loss(self):
>         pass
>
>     @deprecated
>     def optim(self):
>         pass
>
>
> class Layer:
>     def __init__(self, name=None):
>         self.init = False
>
>     def __call__(self, x):
>         if self.init == False:
>             # init the layer states
>         else:
>             # do the forward propagation
>
>
> class MyLayer(Layer):
>     def __init__(self):
>         self.layer1 = layer.Conv2d(nb_kernels=32, kernel=3, stride=1, padding=0, kernel_init='he_uniform')
>         self.layer2 = layer.MaxPool2d(kernel=3, stride=2)
>
>     def forward(self, x):
>         return self.layer2(self.layer1(x))
>
>
> class MyModule(Module):
>     def __init__(self):
>         self.blk1 = MyLayer()
>         self.blk2 = MyLayer()
>         self.optim = SGD()
>         self.loss = CrossEntropyLoss()
>
>     def forward(self, x):
>         return self.blk2(self.blk1(x))
>
>     def train_one_batch(self, x, y):
>         y_ = self.forward(x)
>         l = self.loss(y_, y)
>         self.optim.backward_and_update(l)
>         return l
>
>
> x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
> # fill x with values
> m = MyModule()
>
> # compatible with existing code, which does not have the following two statements
> m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')
> for pname, ptensor in m.get_params():
>     ptensor.uniform(-1, 1)  # not necessary if each layer's param init methods are configured
>
> y = Placeholder((2,), device=gpu)
> for npx, npy in data:
>     x.copy_from(npx)
>     y.copy_from(npy)
>     m.train_one_batch(x, y)  # builds the graph in the first iter; for the old code, the params are initialized here
>
> m.save('mymodel', state={'epoch': data.size(), 'sgd': m.optim})
> ```
This approach still postpones the operation init until the training phase, right? When the user has a batch of samples, they call `train_one_batch`, which calls `forward`, which in turn calls `__call__`:
```py
def __call__(self, x):
    if self.init == False:
        # init the layer states
```
It is still strange to delay the graph initialization until the user has the data.
In my opinion, the current problems are:
1. we don't have the shape of the input, so we use a `Placeholder` as the input;
2. even if we have the shape of the input data, we cannot compute the shapes of all intermediate tensors, because we cannot call `forward` with a `Placeholder`; we could fill it with random data, but that may cause errors (e.g., a layer expecting integer indices would fail on randomly filled floats).

So the key point is that we bind the graph construction to the `forward` function: the graph is constructed only when we call `forward`, and calling `forward` requires real data.
So I'm thinking about separating the graph construction from the `forward` function. We define two classes, `Graph` and `Node`: the `Graph` stores the relationships between `Node`s, and each `Node` stores an `Operation` together with its input and output.

In the `__call__` function of an `Operation`, instead of calling the `forward` function, we create a `Node`, store the operation itself within this `Node`, set its input and output, and return the newly created `Node`, as in the following code:
```py
class Operation(object):
    def __init__(self):
        pass

    def __call__(self, previous_node):  # handling multiple inputs is similar
        # create a Node
        # link the current node with the previous node
        # run infer_shape, and set the shape of each input and output
        # for the current node and the previous node
        current_node = Node()
        current_node.input.node = previous_node
        current_node.operation = self
        current_node.output.shape = self.infer_shape()
        previous_node.output.node = current_node
        return current_node

    def forward(self):
        pass

    def backward(self):
        pass

    def infer_shape(self):
        pass
```
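For illustration, here is a minimal sketch of what a concrete `infer_shape` could look like for a 2D convolution; the function name and signature are hypothetical, and it assumes NCHW layout, a square kernel, and no dilation. The point is that the output shape can be derived from the input shape alone, without any real data:

```py
def conv2d_infer_shape(in_shape, nb_kernels, kernel, stride, padding):
    # in_shape is (N, C, H, W); only the shape is needed, no tensor values
    n, c, h, w = in_shape
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    return (n, nb_kernels, h_out, w_out)

# e.g. a 5x5 conv with 20 kernels on a (2, 1, 28, 28) placeholder
print(conv2d_infer_shape((2, 1, 28, 28), nb_kernels=20, kernel=5, stride=1, padding=0))
# -> (2, 20, 24, 24)
```

This is why the whole graph, including all intermediate shapes, can be built from a `Placeholder` alone.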
We can then actually construct a `Graph` of linked `Node`s with the following code:
```py
class MyModule(Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.conv1 = autograd.Conv2d(1, 20, 5, padding=0)
        self.conv2 = autograd.Conv2d(20, 50, 5, padding=0)
        self.sgd = opt.SGD(lr=0.01)

    def construct_graph(self, x):
        # x is a placeholder
        # create the Graph linked with Nodes
        y = self.conv1(x)
        y = self.conv2(y)
        self.graph = Graph(x, y)

    def train(self, x, y):
        y_ = self.graph.forward(x)
        l = self.loss(y_, y)
        self.optim(l)
        return l

    def loss(self, out, y):
        return autograd.softmax_cross_entropy(out, y)

    def optim(self, loss):
        self.sgd.backward_and_update(loss)


model = MyModule()
x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
model.construct_graph(x)  # build the graph

y = Placeholder((2,), device=gpu)
for npx, npy in data:
    x.copy_from(npx)
    y.copy_from(npy)
    model.train(x, y)  # directly train

model.save('mymodel', state={'epoch': data.size(), 'sgd': model.sgd})
```
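For completeness, here is a minimal sketch of what the `Graph` and `Node` classes used above might look like. All attribute and method names (`inputs`, `outputs`, `topo_order`, ...) are assumptions for illustration, not existing SINGA APIs; the sketch also assumes the input `Placeholder` gets wrapped in a `Node` with no operation when the graph is recorded, and that each concrete `Operation.forward` accepts the input tensors:

```py
class Node:
    def __init__(self, operation=None):
        self.operation = operation  # Operation recorded by this node (None for the input placeholder)
        self.inputs = []            # predecessor Nodes
        self.outputs = []           # successor Nodes
        self.shape = None           # output shape filled in via infer_shape


class Graph:
    def __init__(self, input_node, output_node):
        self.input_node = input_node
        self.output_node = output_node

    def topo_order(self):
        # post-order DFS from the input node, reversed, gives a topological order
        order, visited = [], set()

        def visit(node):
            if id(node) in visited:
                return
            visited.add(id(node))
            for nxt in node.outputs:
                visit(nxt)
            order.append(node)

        visit(self.input_node)
        return list(reversed(order))

    def forward(self, x):
        # replay the recorded operations on real tensors at training time
        values = {id(self.input_node): x}
        for node in self.topo_order():
            if node.operation is None:  # the input placeholder node
                continue
            ins = [values[id(p)] for p in node.inputs]
            values[id(node)] = node.operation.forward(*ins)
        return values[id(self.output_node)]
```

The important property is that the graph, including all shapes, is recorded before any real data exists; `Graph.forward` only replays it once the real tensors have been copied into the placeholders.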