chrishkchris opened a new pull request #566: SINGA-487 Add Sparsification 
Algorithm: Threshold Quantization
URL: https://github.com/apache/singa/pull/566
 
 
   This PR implements a simple sparsification scheme: only gradient values whose absolute magnitude exceeds a threshold are transferred. Because the conversion from the dense gradient matrix to a sparse representation uses the CUDA thrust parallel algorithms, the conversion overhead is relatively low.
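   The idea can be sketched on the CPU with NumPy (the actual implementation runs on the GPU via thrust; the function names below are hypothetical, for illustration only):

   ```python
   import numpy as np

   def sparsify_by_threshold(grad, threshold):
       """Keep only gradient entries whose magnitude exceeds the
       absolute threshold; return (indices, values) as the sparse
       representation that would be transferred."""
       flat = grad.ravel()
       mask = np.abs(flat) > threshold
       indices = np.nonzero(mask)[0]
       return indices, flat[indices]

   def densify(indices, values, shape):
       """Reconstruct the dense gradient on the receiving side,
       with sub-threshold entries zeroed out."""
       flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
       flat[indices] = values
       return flat.reshape(shape)

   grad = np.array([[0.01, -0.5], [0.3, -0.02]], dtype=np.float32)
   idx, vals = sparsify_by_threshold(grad, 0.1)
   # Only -0.5 and 0.3 exceed the threshold, so just 2 of the
   # 4 entries (indices + values) need to be communicated.
   ```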
   
   Some reference papers for the Sparsification:
   [1] N. Strom. Scalable distributed DNN training using commodity GPU cloud computing. In Proceedings of InterSpeech 2015. International Speech Communication Association (ISCA), September 2015.
   [2] A. F. Aji and K. Heafield. Sparse communication for distributed gradient descent. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), pages 440-445. Association for Computational Linguistics (ACL), September 2017.
   
   I have added an example file, sparsification_mnist.py, to test the accuracy. The following results are based on an AWS g4dn.12xlarge instance with 4 NVIDIA T4 GPUs.
   
   ```
   ubuntu@ip-172-31-20-160:~/singa/examples/autograd$ python3 
sparsification_mnist.py
   Starting Epoch 0:
   Training loss = 809.631958, training accuracy = 0.709352
   Evaluation accuracy = 0.905849, Elapsed Time = 1.251285s
   Starting Epoch 1:
   Training loss = 325.436279, training accuracy = 0.888906
   Evaluation accuracy = 0.936098, Elapsed Time = 0.882350s
   Starting Epoch 2:
   Training loss = 238.643738, training accuracy = 0.920106
   Evaluation accuracy = 0.952424, Elapsed Time = 0.847908s
   Starting Epoch 3:
   Training loss = 200.181030, training accuracy = 0.933377
   Evaluation accuracy = 0.947616, Elapsed Time = 0.839072s
   Starting Epoch 4:
   Training loss = 182.340820, training accuracy = 0.938969
   Evaluation accuracy = 0.962240, Elapsed Time = 0.836915s
   Starting Epoch 5:
   Training loss = 161.267120, training accuracy = 0.946615
   Evaluation accuracy = 0.970653, Elapsed Time = 0.839940s
   Starting Epoch 6:
   Training loss = 147.990921, training accuracy = 0.951356
   Evaluation accuracy = 0.970753, Elapsed Time = 0.842795s
   Starting Epoch 7:
   Training loss = 139.301285, training accuracy = 0.953626
   Evaluation accuracy = 0.973458, Elapsed Time = 0.842011s
   Starting Epoch 8:
   Training loss = 131.042053, training accuracy = 0.956564
   Evaluation accuracy = 0.963241, Elapsed Time = 0.840951s
   Starting Epoch 9:
   Training loss = 126.376511, training accuracy = 0.957732
   Evaluation accuracy = 0.967448, Elapsed Time = 0.841526s
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
