absalama commented on issue #11855: Distributed learning with Async update does 
not work.
URL: 
https://github.com/apache/incubator-mxnet/issues/11855#issuecomment-408541838
 
 
   In trainer.py I changed the default value of **update_on_kvstore** from None 
to True in the __init__ method. imageclassification.py initialises the 
Trainer object, so I assume that with the default changed to True in 
__init__, **update_on_kvstore** should now be set?
   
   We are using a Slurm cluster, and all nodes share the same folder where 
the mxnet source resides, so any change should be seen by all workers. 
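
One way to check that assumption on each worker is to print where Python actually resolves `mxnet` from and what default `Trainer.__init__` declares. The sketch below uses only the standard library. The helper names `module_origin` and `default_of` are hypothetical and not part of mxnet, and it assumes the worker's interpreter may be importing an installed copy of mxnet rather than the edited source tree.

```python
import importlib
import importlib.util
import inspect


def module_origin(module_name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


def default_of(callable_obj, param_name):
    """Return the declared default of a parameter; raises KeyError if it does not exist."""
    return inspect.signature(callable_obj).parameters[param_name].default


if __name__ == "__main__":
    origin = module_origin("mxnet")
    print("mxnet resolved from:", origin)
    if origin is not None:
        # Only import mxnet when it is actually available on this node.
        trainer_cls = importlib.import_module("mxnet.gluon").Trainer
        print("update_on_kvstore default:",
              default_of(trainer_cls.__init__, "update_on_kvstore"))
```

Running this on every node (for example through the same launcher that starts the workers) would show whether the printed path points at the shared source folder and whether the edited default is the one in effect.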
   
   Do I need to set anything extra in the Slurm configuration?

