chrishkchris commented on a change in pull request #468: Distributted module
URL: https://github.com/apache/incubator-singa/pull/468#discussion_r317937233
 
 

 ##########
 File path: examples/autograd/mnist_dist.py
 ##########
 @@ -0,0 +1,251 @@
+#
 
 Review comment:
   I have modified mnist_cnn.py and mnist_dist.py:
   1. The model construction, data preprocessing and training code are in mnist_cnn.py.
   2. mnist_dist.py imports functions from mnist_cnn and passes the distributed optimizer (dist opt) into train_mnist_cnn() to conduct distributed training (requires MPI).
   3. download_mnist.py is added in the same directory and downloads the dataset before training. It is kept separate from the training code to prevent different processes from downloading the data at the same time.
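Item 3 relies on the download running once, in a single process, before `mpiexec` starts several training processes. The following is a minimal sketch of such a download-once helper. It is not the actual download_mnist.py. The function name `download_missing`, the `fetch` parameter and the "store under the URL basename" layout are hypothetical. The URLs are the ones printed in the log below.

```python
import os
import urllib.request
from typing import Callable, List

# The four URLs printed by download_mnist.py in the log below.
MNIST_URLS = [
    "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz",
    "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz",
    "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz",
    "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz",
]


def download_missing(urls: List[str], dest_dir: str,
                     fetch: Callable[[str, str], object] = urllib.request.urlretrieve) -> List[str]:
    """Fetch each URL into dest_dir unless a file with the same basename exists.

    Hypothetical helper: returns the URLs that were actually fetched, so a
    second call after a successful run returns [].
    """
    os.makedirs(dest_dir, exist_ok=True)
    fetched = []
    for url in urls:
        path = os.path.join(dest_dir, os.path.basename(url))
        if os.path.exists(path):
            continue  # already downloaded by an earlier run
        print("Downloading", url)
        tmp_path = path + ".part"
        fetch(url, tmp_path)        # write to a temporary name first,
        os.replace(tmp_path, path)  # then rename, so a partial file is never treated as complete
        fetched.append(url)
    return fetched
```

Usage: call `download_missing(MNIST_URLS, "data")` once from a plain `python3` process, then launch the MPI job. This matches the order in the log below, so no two ranks write the same file concurrently.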
   
   Here is the log of running the code:
   ```
   ubuntu@ip-172-31-21-218:~/incubator-singa/examples/autograd$ python3 download_mnist.py
   Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
   Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
   Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
   Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
   ubuntu@ip-172-31-21-218:~/incubator-singa/examples/autograd$ python3 mnist_cnn.py
   Starting Epoch 0:
   Training loss = 586.417175, training accuracy = 0.792840
   Evaluation accuracy = 0.940104, Elapsed Time = 5.638494s
   Starting Epoch 1:
   Training loss = 235.360107, training accuracy = 0.922292
   Evaluation accuracy = 0.955429, Elapsed Time = 5.563161s
   Starting Epoch 2:
   Training loss = 170.056442, training accuracy = 0.943270
   Evaluation accuracy = 0.963942, Elapsed Time = 5.579273s
   Starting Epoch 3:
   Training loss = 135.514252, training accuracy = 0.954476
   Evaluation accuracy = 0.967248, Elapsed Time = 5.562721s
   Starting Epoch 4:
   Training loss = 116.975700, training accuracy = 0.960812
   Evaluation accuracy = 0.978265, Elapsed Time = 5.583826s
   Starting Epoch 5:
   Training loss = 103.893723, training accuracy = 0.965065
   Evaluation accuracy = 0.982372, Elapsed Time = 5.585272s
   Starting Epoch 6:
   Training loss = 95.044586, training accuracy = 0.967266
   Evaluation accuracy = 0.981671, Elapsed Time = 5.580424s
   Starting Epoch 7:
   Training loss = 89.102654, training accuracy = 0.971118
   Evaluation accuracy = 0.980268, Elapsed Time = 5.583646s
   Starting Epoch 8:
   Training loss = 80.395744, training accuracy = 0.972969
   Evaluation accuracy = 0.983273, Elapsed Time = 5.600029s
   Starting Epoch 9:
   Training loss = 78.355209, training accuracy = 0.973119
   Evaluation accuracy = 0.979267, Elapsed Time = 5.587740s
   ubuntu@ip-172-31-21-218:~/incubator-singa/examples/autograd$ /home/ubuntu/mpich-3.3/build/bin/mpiexec --hostfile host_file python3 mnist_dist.py
   Starting Epoch 0:
   Training loss = 781.167480, training accuracy = 0.719017
   Evaluation accuracy = 0.918586, Elapsed Time = 1.255623s
   Starting Epoch 1:
   Training loss = 259.223297, training accuracy = 0.912276
   Evaluation accuracy = 0.950863, Elapsed Time = 1.216926s
   Starting Epoch 2:
   Training loss = 179.333084, training accuracy = 0.940605
   Evaluation accuracy = 0.968030, Elapsed Time = 1.206751s
   Starting Epoch 3:
   Training loss = 137.840988, training accuracy = 0.954243
   Evaluation accuracy = 0.975946, Elapsed Time = 1.202503s
   Starting Epoch 4:
   Training loss = 119.743629, training accuracy = 0.959836
   Evaluation accuracy = 0.973581, Elapsed Time = 1.208274s
   Starting Epoch 5:
   Training loss = 102.545876, training accuracy = 0.965595
   Evaluation accuracy = 0.980572, Elapsed Time = 1.205539s
   Starting Epoch 6:
   Training loss = 93.249054, training accuracy = 0.969401
   Evaluation accuracy = 0.978207, Elapsed Time = 1.203708s
   Starting Epoch 7:
   Training loss = 84.655556, training accuracy = 0.971104
   Evaluation accuracy = 0.980777, Elapsed Time = 1.206410s
   Starting Epoch 8:
   Training loss = 77.996643, training accuracy = 0.973691
   Evaluation accuracy = 0.985609, Elapsed Time = 1.207295s
   Starting Epoch 9:
   Training loss = 75.888077, training accuracy = 0.974442
   Evaluation accuracy = 0.982319, Elapsed Time = 1.203693s
   ```
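The two runs can be compared directly from the printed metrics. The following is a small sketch that extracts the `Elapsed Time` values from a log and computes the ratio of mean epoch times. The numbers are copied from the log above. The helper names `epoch_times` and `speedup` are illustrative only. The comment does not say how many processes host_file launches, so this ratio alone cannot be turned into a per-process parallel efficiency.

```python
import re
from statistics import mean
from typing import List

# Matches lines such as "Evaluation accuracy = 0.940104, Elapsed Time = 5.638494s".
ELAPSED_RE = re.compile(r"Elapsed Time = ([0-9.]+)s")


def epoch_times(log_text: str) -> List[float]:
    """Extract the per-epoch 'Elapsed Time' values (seconds) from a training log."""
    return [float(value) for value in ELAPSED_RE.findall(log_text)]


def speedup(baseline: List[float], parallel: List[float]) -> float:
    """Ratio of mean per-epoch wall time: baseline / parallel."""
    return mean(baseline) / mean(parallel)


# Per-epoch times copied from the two runs in the log above.
single_process = [5.638494, 5.563161, 5.579273, 5.562721, 5.583826,
                  5.585272, 5.580424, 5.583646, 5.600029, 5.587740]
distributed = [1.255623, 1.216926, 1.206751, 1.202503, 1.208274,
               1.205539, 1.203708, 1.206410, 1.207295, 1.203693]

print(f"mean epoch time: {mean(single_process):.3f}s vs {mean(distributed):.3f}s")
print(f"speedup: {speedup(single_process, distributed):.2f}x")
```

With these values, the mean epoch time drops from about 5.586 s to about 1.212 s, roughly a 4.61x reduction. The final evaluation accuracy stays similar: 0.979267 for the single-process run and 0.982319 for the distributed run. For new runs, pass the captured stdout to `epoch_times()` instead of hard-coding the lists.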

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
