Ishitori commented on issue #13644: aws distributed training struck !
URL: 
https://github.com/apache/incubator-mxnet/issues/13644#issuecomment-464405780
 
 
   @Davdi, did you manage to find the issue? To me it looks like a 
connectivity problem, because when you remove `--kvstore dist_sync` the 
default value is used, which for this tutorial is `device`, and that mode 
does not communicate between machines. 
   
   I also notice that you set n = 1, but mention "2 instances". Which 
configuration are you trying to achieve? And what is the content of your 
hosts file?
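
   A minimal sketch of how both points could be checked, assuming the hosts 
file lists one address per line and that the workers are started over SSH 
(port 22). The helper names `parse_hosts`, `count_problem` and 
`is_reachable` are made up for illustration and are not part of MXNet:

```python
import socket


def parse_hosts(text):
    """Return the non-empty lines of a hosts file (assumed: one host per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def count_problem(hosts, n):
    """Return a message if fewer hosts are listed than workers requested, else None."""
    if len(hosts) < n:
        return "n = %d but only %d host(s) listed" % (n, len(hosts))
    return None


def is_reachable(host, port=22, timeout=3.0):
    """Try to open a TCP connection; port 22 assumes an SSH-based launcher."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "hosts" is an assumed file name; n = 2 matches the "2 instances" mentioned.
    with open("hosts") as f:
        hosts = parse_hosts(f.read())
    problem = count_problem(hosts, n=2)
    if problem:
        print(problem)
    for host in hosts:
        state = "reachable" if is_reachable(host) else "NOT reachable"
        print(host, state)
```

   Run it from the machine that launches the job. A host reported as not 
reachable would point to a security group or network issue rather than to 
the training script.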

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
