jokerwenxiao opened a new issue #14763: how to run distributed mxnet training job by docker container URL: https://github.com/apache/incubator-mxnet/issues/14763

I tried to run a distributed training job in Docker containers. I used `launch.py` to run `examples/distributed_training/cifar10_dist.py` inside the containers, but I am confused about the hosts file. The commands are as follows:

1. **Run a container on each of the two hosts:** `docker run -ti --net=host image:name bash`
2. **Command in the shell:** `python launch.py -n 2 -s 2 --sync-dst-dir examples/distributed_training/ --launch ssh -H hosts "python examples/distributed_training/cifar10_dist.py"`
3. **Hosts file content** (one IP per line): `host_ip1`, `host_ip2`

**Then it prompts for a password, as follows:**

`root@host_Ip1's password:root@host_Ip1's password:root@host_Ip2's password:root@host_Ip2's password:`

I entered the corresponding host password, but the password check failed.
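The ssh launcher has to log in to every host listed in the hosts file, and it is normally used with key-based (passwordless) SSH. The Python sketch below is a hedged illustration, not taken from `launch.py` itself: it *assumes* the launcher opens one SSH session per worker and per server process and assigns them to hosts round-robin. Under that assumption, two workers and two servers (`-n 2 -s 2`) on two hosts give two sessions per host, which would be consistent with the two password prompts per host shown above. The helper names (`read_hosts`, `plan_ssh_sessions`, `batchmode_check_cmd`, `passwordless_ok`) are hypothetical. `ssh -o BatchMode=yes` is a standard OpenSSH option that makes `ssh` fail instead of asking for a password, so it can check whether key-based login to each host works.

```python
import subprocess
from typing import List


def read_hosts(text: str) -> List[str]:
    """Parse hosts-file text: one host IP or name per line; blank lines are skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def plan_ssh_sessions(num_workers: int, num_servers: int, hosts: List[str]) -> List[str]:
    """ASSUMPTION (hypothetical placement): one SSH session per worker and per
    server, assigned to hosts round-robin. Each entry is one possible password prompt."""
    if num_workers < 1 or num_servers < 0:
        raise ValueError("need at least one worker and a non-negative number of servers")
    if not hosts:
        raise ValueError("hosts file is empty")
    total = num_workers + num_servers
    return [hosts[i % len(hosts)] for i in range(total)]


def batchmode_check_cmd(host: str) -> List[str]:
    """OpenSSH command that runs `true` on the remote host and fails instead of
    prompting for a password when key-based login is not configured."""
    return ["ssh", "-o", "BatchMode=yes", host, "true"]


def passwordless_ok(host: str, timeout: float = 10.0) -> bool:
    """Return True if `ssh host true` succeeds without any interactive prompt."""
    try:
        result = subprocess.run(
            batchmode_check_cmd(host),
            stdin=subprocess.DEVNULL,  # never read a password from the terminal
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

For the hosts file in this issue, `plan_ssh_sessions(2, 2, read_hosts("host_ip1\nhost_ip2\n"))` returns `["host_ip1", "host_ip2", "host_ip1", "host_ip2"]`. Running `passwordless_ok(h)` for each host from inside the container that starts `launch.py` would show whether that container can reach each host without a password prompt.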
