chrishkchris opened a new pull request #787:
URL: https://github.com/apache/singa/pull/787


   Thanks Rulin @XJDKC for fixing the graph operation in this PR
   
   The problem was due to the workspace variable in rnn. 
   
   The problem appears when the rnn operation is used multiple times in the 
graph: each op cleans up the workspace variable (setValue->0) before use, while 
some of the ops using the workspace are independent of each other, so the 
graph has no dependency that fixes their order.
   
   To solve the problem, if a tensor is written by multiple independent ops, 
these ops are performed in chronological order.
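
The ordering rule above can be illustrated with a small, self-contained sketch. This is not SINGA's scheduler code: the function `order_writers`, the op names, and the tensor names are all hypothetical, and the sketch only assumes that ops are recorded in the order they were issued together with the tensors they read and write.

```python
def order_writers(ops):
    """Add ordering dependencies between ops that touch the same tensor.

    ops: list of (name, reads, writes) tuples, in the order they were issued.
    Returns {op name: set of op names that must finish first}.
    """
    deps = {name: set() for name, _, _ in ops}
    last_writer = {}  # tensor name -> most recent op that wrote it
    for name, reads, writes in ops:
        # Wait for the latest writer of every tensor this op reads or writes,
        # so writers of a shared tensor run in the order they were issued.
        for tensor in list(reads) + list(writes):
            if tensor in last_writer:
                deps[name].add(last_writer[tensor])
        for tensor in writes:
            last_writer[tensor] = name
    return deps


# Hypothetical graph: two rnn ops that share one workspace tensor.
ops = [
    ("rnn1_clear", [], ["workspace"]),
    ("rnn1_forward", ["x1"], ["workspace", "y1"]),
    ("rnn2_clear", [], ["workspace"]),
    ("rnn2_forward", ["x2"], ["workspace", "y2"]),
]
deps = order_writers(ops)
for name, _, _ in ops:
    print(name, "waits for", sorted(deps[name]))
```

In this toy graph the two rnn ops share no inputs or outputs, so without the extra edges `rnn2_clear` could run while `rnn1_forward` is still using the workspace. With the edges, `rnn2_clear` waits for `rnn1_forward`.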
   
   Some of the test results:
   
   ```
   root@64926e30597f:~/dcsysh/singa/examples/rnn# python3 imdb_train.py
   epoch 0 loss [0.6489457]; acc 0.617
   epoch 1 loss [0.55472153]; acc 0.715
   epoch 2 loss [0.51863945]; acc 0.743
   epoch 3 loss [0.49822766]; acc 0.758
   epoch 4 loss [0.48312518]; acc 0.767
   eval acc 0.750
   
   root@64926e30597f:~/dcsysh/singa/examples/cnn# python3 train_cnn.py resnet cifar10 -b 32 -m 1
   Starting Epoch 0:
   Training loss = 2867.570801, training accuracy = 0.352753
   Evaluation accuracy = 0.467448, Elapsed Time = 335.571664s
   
   root@64926e30597f:~/dcsysh/singa/examples/cnn# python3 train_cnn.py resnet cifar10 -b 32 -m 1 -g
   Starting Epoch 0:
   Training loss = 2866.714111, training accuracy = 0.352693
   Evaluation accuracy = 0.488381, Elapsed Time = 379.508079s
   ```



