chrishkchris commented on issue #535: SINGA-490 Optimize performance of stochastic gradient descent (SGD)
URL: https://github.com/apache/incubator-singa/pull/535#issuecomment-532944885
 
 
   Finally, I tested the distributed training on AWS p2.8xlarge, after adding the Sync() call in the SGD loop of resnet.py and resnet_dist.py; a sketch of where the call sits follows.
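
   For reference, here is a minimal sketch of the placement, assuming the structure of the examples/autograd benchmark scripts; the toy Linear model, shapes, and variable names are placeholders, not the actual ResNet-50 code from this PR.

   ```python
   import numpy as np
   from singa import autograd, device, opt, tensor

   dev = device.create_cuda_gpu_on(0)        # CUDA device that queues the kernels
   sgd = opt.SGD(lr=0.1, momentum=0.9)       # the SGD optimizer under test
   model = autograd.Linear(3 * 32 * 32, 10)  # toy stand-in for the ResNet model

   # Random inputs and labels, i.e. no real data feeding, as in the benchmark.
   tx = tensor.Tensor((16, 3 * 32 * 32), dev)
   ty = tensor.Tensor((16,), dev, tensor.int32)
   tx.copy_from_numpy(np.random.randn(16, 3 * 32 * 32).astype(np.float32))
   ty.copy_from_numpy(np.random.randint(0, 10, 16).astype(np.int32))

   autograd.training = True
   for _ in range(100):
       out = model(tx)
       loss = autograd.softmax_cross_entropy(out, ty)
       for p, g in autograd.backward(loss):
           sgd.update(p, g)
       # The change under discussion: block until all queued CUDA kernels have
       # finished, so each iteration's timing covers completed work rather than
       # just asynchronous kernel launches.
       dev.Sync()
   ```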
   The speedup with 8 GPUs is now 7.21x (274.99 / 38.14 ≈ 7.21), though this is measured without real data feeding.
   See the following throughput comparison between resnet.py and resnet_dist.py:
   
   ```
   ubuntu@ip-172-31-28-231:~/incubator-singa/examples/autograd$ python3 resnet.py
   Start intialization............
   
   100%|██████████| 100/100 [01:23<00:00,  1.19it/s]
   Throughput = 38.13589358185999 per second
   Total=0.8391045022010803, forward=0.26401839971542357, softmax=0.0020227289199829103, backward=0.5730633735656739, sgd=0.016838366985321044
   
   ubuntu@ip-172-31-28-231:~/incubator-singa/examples/autograd$ /home/ubuntu/mpich-3.3/build/bin/mpiexec --hostfile host_file python3 resnet_dist.py
   Start intialization...........
   100%|██████████| 100/100 [01:33<00:00,  1.08it/s]
   Throughput = 274.9947180123401 per second
   Total=0.9309269714355469, forward=0.2690380573272705, softmax=0.0021610450744628906, backward=0.6597278690338135, sgd=0.10374969005584717
   ```
