larroy edited a comment on issue #12994: Test failure and possible bug on GPU 
topology algorithm (test_device.test_device_pushpull)
URL: 
https://github.com/apache/incubator-mxnet/issues/12994#issuecomment-439996798
 
 
   I found the root cause: we are unable to perform graph partitioning on 
8-GPU machines such as p3.16xlarge or DGX1 when using the tree topology in 
kvstore. We need to fix graph partitioning and add a regression test.
   
   In this case Kernighan-Lin (K-L) fails to find a graph partition, and the 
BFS fallback doesn't seem to work (it is also not well unit tested at the 
moment).
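
   To make the expected behaviour concrete, here is a minimal sketch of a 
BFS-based bisection and the kind of check a regression test could make. It is 
not the MXNet implementation: the function names and the 8-node link layout 
(two fully connected quads joined by cross links) are assumptions made up for 
illustration only.

```python
from collections import deque


def bfs_bisect(adj, root=0):
    """Split a connected undirected graph into two halves using BFS order.

    `adj` maps each node to the set of its neighbours. This is an
    illustrative fallback only, not the algorithm used in MXNet.
    """
    n = len(adj)
    order = []
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        # Visit neighbours in sorted order so the result is deterministic.
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    if len(order) != n:
        raise ValueError("graph is not connected; BFS cannot reach every node")
    half = n // 2
    # The first half is a BFS prefix, so it is connected by construction.
    return sorted(order[:half]), sorted(order[half:])


def hypothetical_8gpu_links():
    """Hypothetical 8-node layout: two full quads plus i <-> i+4 cross links."""
    adj = {i: set() for i in range(8)}
    for base in (0, 4):
        for i in range(base, base + 4):
            for j in range(base, base + 4):
                if i != j:
                    adj[i].add(j)
    for i in range(4):
        adj[i].add(i + 4)
        adj[i + 4].add(i)
    return adj


left, right = bfs_bisect(hypothetical_8gpu_links())
# A regression test could assert that both halves have equal size and
# together cover every device exactly once.
assert len(left) == len(right) == 4
assert sorted(left + right) == list(range(8))
```

   A real regression test would run the actual partitioning code against the 
link matrix reported on the 8-GPU machines instead of this made-up layout.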
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
