cjolivier01 commented on issue #8751: Distributed Training has inverse results when imported (8 GPUS is slower than 1!)
URL: https://github.com/apache/incubator-mxnet/issues/8751#issuecomment-346670273

I have two GTX 1080s on my home machine here, and I get roughly the same speed on two GPUs:

**file_1.py**

```
/usr/bin/python2.7 /home/coolivie/src/DeepLearning/python/file_1.py --gpus 2
Defining network
Sequential(
  (0): Conv2D(None -> 20, kernel_size=(3, 3), stride=(1, 1))
  (1): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
  (2): Conv2D(None -> 50, kernel_size=(5, 5), stride=(1, 1))
  (3): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
  (4): Flatten
  (5): Dense(None -> 128, Activation(relu))
  (6): Dense(None -> 10, linear)
)
running with 2 GPUs
Running on [gpu(0), gpu(1)]
loading mnist
mnist loaded
Batch size is 128
initalizing parameters
initalizing trainer
Epoch 0, training time = 6.3 sec
Validation accuracy = 0.9525
Epoch 1, training time = 5.9 sec
Validation accuracy = 0.9792
Epoch 2, training time = 6.0 sec
Validation accuracy = 0.9815
Epoch 3, training time = 5.9 sec
Validation accuracy = 0.9838
Epoch 4, training time = 5.9 sec
Validation accuracy = 0.9825
```

**file_2.py**

```
[GO]: Parallel Run
Running on 2 gpus [gpu(0), gpu(1)]
[INIT]: net parameters
[INIT]: trainer
Epoch 0, training time = 6.8 sec
Validation Accuracy = 0.0000
Epoch 1, training time = 6.0 sec
Validation Accuracy = 0.0000
Epoch 2, training time = 6.0 sec
Validation Accuracy = 0.0000
Epoch 3, training time = 6.0 sec
Validation Accuracy = 0.0000
Epoch 4, training time = 6.4 sec
Validation Accuracy = 0.0000
Epoch 5, training time = 6.5 sec
Validation Accuracy = 0.0000
```

Note that the per-epoch times are nearly identical in both scripts, but file_2.py reports a validation accuracy of 0.0000 on every epoch, which suggests its metric aggregation (rather than the training itself) is broken in the multi-GPU path.
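For context, both scripts follow the usual data-parallel pattern: split each batch into one shard per device, run the forward pass per shard, then aggregate the per-shard results before computing the final metric. A zero validation accuracy often means that aggregation step went wrong. The sketch below simulates the pattern with plain NumPy (no MXNet required); `predict_fn`, `split_batch`, and `evaluate` are illustrative names, not part of any MXNet API.

```python
import numpy as np

def split_batch(data, labels, num_devices):
    """Split one batch into roughly equal shards, one per device."""
    return list(zip(np.array_split(data, num_devices),
                    np.array_split(labels, num_devices)))

def evaluate(predict_fn, data, labels, num_devices):
    """Data-parallel accuracy: count correct predictions per shard,
    then aggregate across all shards before dividing by the total."""
    correct = 0
    for shard_x, shard_y in split_batch(data, labels, num_devices):
        preds = predict_fn(shard_x)       # per-device forward pass
        correct += int((preds == shard_y).sum())
    return correct / len(labels)          # divide once, after aggregating

# Toy check: a "perfect model" should score 1.0 for any device count.
x = np.arange(10).reshape(10, 1)
y = np.arange(10)
perfect = lambda shard: shard[:, 0]
acc_1gpu = evaluate(perfect, x, y, num_devices=1)
acc_2gpu = evaluate(perfect, x, y, num_devices=2)
```

The key invariant is that the single-device and multi-device paths produce the same metric; if file_2.py resets or overwrites the counter per shard instead of accumulating it, the final accuracy collapses toward zero.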