chrishkchris opened a new pull request #535: SINGA-490 Optimize performance of stochastic gradient descent (SGD)
URL: https://github.com/apache/incubator-singa/pull/535
 
 
   I have fused the small operations of momentum SGD into a single step to increase GPU computation efficiency and reduce kernel-launch latency. I have also added a Sync() call in resnet.py for more accurate time profiling: it waits for the previously queued CUDA operations to finish before the time of each phase is measured.
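
   As a reference for what the fusion means, below is a minimal sketch in plain NumPy on CPU (hypothetical function names, not SINGA's actual CUDA kernels): the first variant issues the momentum update as several small element-wise operations, each of which would normally be its own GPU kernel launch, while the second computes the same arithmetic in one pass per tensor, which is what a single fused kernel does in one launch.

   ```python
   import numpy as np

   def momentum_sgd_unfused(w, g, v, lr=0.005, momentum=0.9):
       # Several small element-wise operations; on a GPU each one is
       # typically dispatched as its own kernel, so launch overhead
       # dominates when the tensors are small.
       v[:] = momentum * v        # v <- momentum * v
       v[:] = v - lr * g          # v <- v - lr * g
       w[:] = w + v               # w <- w + v

   def momentum_sgd_fused(w, g, v, lr=0.005, momentum=0.9):
       # The same arithmetic written as one pass per tensor, which is what
       # a single fused GPU kernel would compute in one launch:
       #     v <- momentum * v - lr * g ;  w <- w + v
       np.subtract(momentum * v, lr * g, out=v)
       np.add(w, v, out=w)

   # Both variants produce identical updates.
   w1, g1, v1 = (np.random.randn(4).astype(np.float32) for _ in range(3))
   w2, g2, v2 = w1.copy(), g1.copy(), v1.copy()
   momentum_sgd_unfused(w1, g1, v1)
   momentum_sgd_fused(w2, g2, v2)
   assert np.allclose(w1, w2) and np.allclose(v1, v2)
   ```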
   
   1. This is the new result after improving the momentum SGD:
   
   ```
   ubuntu@ip-172-31-39-137:~/incubator-singa/examples/autograd$ python3 mnist_cnn.py
   Starting Epoch 0:
   Training loss = 583.052124, training accuracy = 0.793690
   Evaluation accuracy = 0.943409, Elapsed Time = 4.191409s
   Starting Epoch 1:
   Training loss = 229.894424, training accuracy = 0.923609
   Evaluation accuracy = 0.961438, Elapsed Time = 4.170332s
   Starting Epoch 2:
   Training loss = 168.670303, training accuracy = 0.943937
   Evaluation accuracy = 0.964744, Elapsed Time = 4.186504s
   Starting Epoch 3:
   Training loss = 133.865494, training accuracy = 0.955259
   Evaluation accuracy = 0.978566, Elapsed Time = 4.188593s
   Starting Epoch 4:
   Training loss = 116.104378, training accuracy = 0.961730
   Evaluation accuracy = 0.971554, Elapsed Time = 4.195830s
   Starting Epoch 5:
   Training loss = 101.295425, training accuracy = 0.966299
   Evaluation accuracy = 0.974059, Elapsed Time = 4.191312s
   Starting Epoch 6:
   Training loss = 94.570869, training accuracy = 0.969684
   Evaluation accuracy = 0.977464, Elapsed Time = 4.181115s
   Starting Epoch 7:
   Training loss = 85.930618, training accuracy = 0.970968
   Evaluation accuracy = 0.984675, Elapsed Time = 4.182598s
   Starting Epoch 8:
   Training loss = 83.169617, training accuracy = 0.971768
   Evaluation accuracy = 0.985076, Elapsed Time = 4.202356s
   Starting Epoch 9:
   Training loss = 77.906853, training accuracy = 0.973969
   Evaluation accuracy = 0.982372, Elapsed Time = 4.191382s
   ubuntu@ip-172-31-39-137:~/incubator-singa/examples/autograd$ python3 resnet.py
   Start intialization............
   100%|███████████████████████████████████████████████████████████████████████| 100/100 [01:26<00:00,  1.14it/s]
   Throughput = 36.89267491263885 per second
   Total=0.8673808574676514, forward=0.2684857630729675, softmax=0.0027115750312805176, backward=0.5961835193634033, sgd=0.03734057664871216
   ```
   
   2. This is the old result before improving the momentum SGD:
   ```
   ubuntu@ip-172-31-39-137:~/incubator-singa/examples/autograd$ python3 mnist_cnn.py
   Starting Epoch 0:
   Training loss = 581.382263, training accuracy = 0.794974
   Evaluation accuracy = 0.934495, Elapsed Time = 5.541576s
   Starting Epoch 1:
   Training loss = 233.281906, training accuracy = 0.920808
   Evaluation accuracy = 0.953025, Elapsed Time = 5.492121s
   Starting Epoch 2:
   Training loss = 169.505447, training accuracy = 0.943503
   Evaluation accuracy = 0.971454, Elapsed Time = 5.493372s
   Starting Epoch 3:
   Training loss = 136.643906, training accuracy = 0.954309
   Evaluation accuracy = 0.975761, Elapsed Time = 5.513660s
   Starting Epoch 4:
   Training loss = 116.743042, training accuracy = 0.960963
   Evaluation accuracy = 0.979968, Elapsed Time = 5.526858s
   Starting Epoch 5:
   Training loss = 103.864464, training accuracy = 0.965732
   Evaluation accuracy = 0.979667, Elapsed Time = 5.513694s
   Starting Epoch 6:
   Training loss = 94.542282, training accuracy = 0.968550
   Evaluation accuracy = 0.975461, Elapsed Time = 5.520474s
   Starting Epoch 7:
   Training loss = 87.548050, training accuracy = 0.971368
   Evaluation accuracy = 0.980970, Elapsed Time = 5.535038s
   Starting Epoch 8:
   Training loss = 83.162071, training accuracy = 0.971485
   Evaluation accuracy = 0.975661, Elapsed Time = 5.536836s
   Starting Epoch 9:
   Training loss = 78.447533, training accuracy = 0.974570
   Evaluation accuracy = 0.982772, Elapsed Time = 5.547574s
   ubuntu@ip-172-31-39-137:~/incubator-singa/examples/autograd$ python3 resnet.py
   Start intialization............
   100%|███████████████████████████████████████████████████████████████████████| 100/100 [01:49<00:00,  1.11s/it]
   Throughput = 29.05542749993395 per second
   Total=1.101343286037445, forward=0.270987823009491, softmax=0.0029543495178222657, backward=0.8274011135101318, sgd=0.3130151700973511
   ```
   
   Comparing the two sets of results (1) and (2), the fused momentum SGD is clearly faster: in resnet.py the per-iteration sgd time drops from about 0.313 s to 0.037 s and the total iteration time from about 1.10 s to 0.87 s, raising throughput from roughly 29 to 37 per second, while each mnist_cnn.py epoch falls from about 5.5 s to 4.2 s.
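
   The per-phase numbers above (forward, softmax, backward, sgd) rely on the Sync() call mentioned earlier: without it, asynchronous CUDA kernels queued by one phase could be charged to the next. Below is a minimal sketch of that timing pattern with a placeholder device class standing in for SINGA's CUDA device; the helper names are illustrative only.

   ```python
   import time

   class FakeDevice:
       # Stand-in for the CUDA device object; on a real device, Sync()
       # blocks until all previously queued asynchronous kernels finish.
       def Sync(self):
           pass

   def timed(dev, fn, *args):
       dev.Sync()                   # drain work queued by earlier phases
       start = time.time()
       out = fn(*args)
       dev.Sync()                   # ensure fn's own kernels have completed
       return out, time.time() - start

   dev = FakeDevice()
   _, elapsed = timed(dev, sum, range(1_000_000))
   print(f"elapsed = {elapsed:.6f}s")
   ```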
   
