thomelane commented on issue #12577: Training with fc and multi-gpu is much slower than single gpu
URL: https://github.com/apache/incubator-mxnet/issues/12577#issuecomment-426453274

Hi @liu6381810,

It seems there's an overhead to using multiple GPUs, and one likely source is the transfer of gradients between GPUs. Are you using an AWS EC2 p3.16xlarge instance for this, or do you have your own server? Check `nvidia-smi topo --matrix` to confirm that you have fast GPU-to-GPU communication (e.g. NVLink rather than traversing a PCIe host bridge).

You could also take a look at gradient compression to reduce the amount of data being transferred: see [this tutorial](https://mxnet.incubator.apache.org/faq/gradient_compression.html?highlight=compression) for more information.
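In case it's useful, the core idea behind the 2-bit gradient compression described in that tutorial (error-feedback quantization of gradients to one of {-threshold, 0, +threshold}) can be sketched in plain NumPy. This is only an illustration of the technique, not MXNet's actual kernel; the function name and signature here are made up for the example:

```python
import numpy as np

def two_bit_compress(grad, residual, threshold=0.5):
    """Sketch of 2-bit quantization with error feedback.

    grad:      the current gradient array
    residual:  quantization error carried over from previous steps
    Returns (quantized gradient, new residual).
    """
    # Add back the error left over from earlier steps so that,
    # over time, no gradient mass is permanently dropped.
    g = grad + residual

    # Quantize: values past +/- threshold are sent as +/- threshold,
    # everything in between is sent as zero. Each element therefore
    # needs only 2 bits on the wire instead of 32.
    out = np.zeros_like(g)
    out[g >= threshold] = threshold
    out[g <= -threshold] = -threshold

    # Whatever was not transmitted becomes the new residual.
    new_residual = g - out
    return out, new_residual
```

In MXNet itself you would not write this by hand; per the linked tutorial, compression is switched on through the KVStore (e.g. `kv.set_gradient_compression({'type': '2bit', 'threshold': 0.5})`), and the trade-off is less inter-GPU traffic at the cost of slightly noisier updates.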