cjolivier01 commented on issue #8751: Distributed Training has inverse results when imported (8 GPUS is slower than 1!)
URL: https://github.com/apache/incubator-mxnet/issues/8751#issuecomment-346671211
 
 
   2:
     (3): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
     (4): Flatten
     (5): Dense(None -> 128, Activation(relu))
     (6): Dense(None -> 10, linear)
   )
   [EXTRACT]:      loss from file_3
   [EXTRACTED]:    loss:   SoftmaxCrossEntropyLoss(batch_axis=0, w=None)
   [EXTRACT]:      gpus from file_3
   [ERROR]:        gpus could not be imported from file_3
   [TRY]:  looking for default value for gpus
   [EXTRACTED]:    gpus:   2
   [EXTRACT]:      epochs from file_3
   [EXTRACTED]:    epochs: 5
   [EXTRACT]:      batch_size from file_3
   [EXTRACTED]:    batch_size:     64
   [EXTRACT]:      learning_rate from file_3
   [EXTRACTED]:    learning_rate:  0.001
   [EXTRACT]:      training_method from file_3
   [EXTRACTED]:    training_method:        sgd
   [EXTRACT]:      seed from file_3
   [ERROR]:        seed could not be imported from file_3
   [TRY]:  looking for default value for seed
   [EXTRACTED]:    seed:   3
   [EXTRACT]:      training_data from file_3
   [EXTRACTED]:    training_data:  <mxnet.io.NDArrayIter object at 0x7fc1605ec828>
   [EXTRACT]:      test_data from file_3
   [EXTRACTED]:    test_data:      <mxnet.io.NDArrayIter object at 0x7fc1605ec9b0>
   [IMPORT]:       importing dependencies
   ______________________________
   [GO]:   Parallel Run
   Running on 2 gpus
   [gpu(0), gpu(1)]
   [INIT]: net parameters
   [INIT]: trainer
   Epoch 0, training time = 6.8 sec
                   Validation Accuracy = 0.0000
   Epoch 1, training time = 6.0 sec
                   Validation Accuracy = 0.0000
   Epoch 2, training time = 6.1 sec
                   Validation Accuracy = 0.0000
   Epoch 3, training time = 6.1 sec
                   Validation Accuracy = 0.0000
   Epoch 4, training time = 6.3 sec
                   Validation Accuracy = 0.0000
   [END]:  Parallel Run
   ______________________________
   
   Process finished with exit code 0
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
