sandeep-krishnamurthy commented on issue #11011: ../../tools/launch.py -n 2 -s 2 --launcher yarn python train_mnist.py --network lenet --kv-store dist_sync
URL: https://github.com/apache/incubator-mxnet/issues/11011#issuecomment-391759054

Hello @liuzx32 - Here is an example of distributed training with YARN: https://dzone.com/articles/running-mxnet-on-hadoop-yarn

If you are open to SSH-based distributed training, here is a very good example that uses an AWS CloudFormation template: https://github.com/awslabs/deeplearning-cfn#running-distributed-training-on-mxnet

Also, it would be great if you could contribute a tutorial on using MXNet with YARN.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
