absalama commented on issue #11855: Distributed learning with Async update does not work.
URL: https://github.com/apache/incubator-mxnet/issues/11855#issuecomment-408541838

In trainer.py, I changed the default value of **update_on_kvstore** from None to True in the `__init__` method. Since imageclassification.py initialises the trainer object, I assume that with **update_on_kvstore** defaulting to True in `__init__`, it should be set. Is that correct?

We are using a Slurm cluster, and all nodes share the same folder where the MXNet source resides, so any change should be seen by all workers. Do I need to set anything extra in the Slurm configuration?
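To confirm that every worker really picks up the edited default, rather than assuming it from the shared folder, a small diagnostic could be run on each node. It prints the hostname, the file the package was actually imported from, and the default the parameter has at runtime. This is a minimal sketch using only the Python standard library; the helper name `report_default` is hypothetical and not part of MXNet.

```python
import importlib
import inspect
import socket


def report_default(module_name, attr_path, param):
    """Hypothetical helper: import `module_name`, resolve a dotted attribute
    path inside it, and return (hostname, module file, default of `param`)."""
    module = importlib.import_module(module_name)
    obj = module
    # Walk a dotted path such as "Trainer.__init__" one attribute at a time.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    # Read the default value as seen by the interpreter that imported the module.
    default = inspect.signature(obj).parameters[param].default
    return socket.gethostname(), module.__file__, default
```

On each Slurm node, calling `report_default("mxnet.gluon", "Trainer.__init__", "update_on_kvstore")` would show whether the workers import MXNet from the shared source tree and whether they see `True` as the default. If the path points elsewhere (for example, a separately installed copy in site-packages), the edit to trainer.py is not what those workers run.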
