First of all, check out: https://github.com/apache/incubator-mxnet/tree/master/example/distributed_training

Install Anaconda on all your machines.

Run "pip install mxnet-cu80==1.2.1" on all your machines (or "pip install mxnet-cu90", depending on your machine's environment).

Make all your machines reachable from each other over ssh without entering a password. For machine A to machine B, run on machine A:

    cd ~/.ssh
    ssh-keygen -t rsa

Two files are generated: id_rsa is the private key and id_rsa.pub is the public key. On machine B, run "vim ~/.ssh/authorized_keys" and paste in the contents of machine A's ~/.ssh/id_rsa.pub.

Do the same to make machine B to machine A work without a password. Passwordless ssh from machine A to machine A and from machine B to machine B is also needed.

Then run:

    python /home/xiaomin.wu/anaconda2/lib/python2.7/site-packages/mxnet/tools/launch.py -n 2 -s 2 -H hosts --sync-dst-dir /home/xiaomin.wu/cifar10_dist --launcher ssh "/home/xiaomin.wu/anaconda2/bin/python cifar10_dist.py"

Here we use /home/xiaomin.wu/anaconda2/bin/python instead of python, because if we just use python here, the machines may pick up /usr/bin/python instead, which will drive you crazy.
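For reference, here is a rough sketch of how the launch command above is put together, in case you want to generate it for your own paths. It is only an illustration, not part of MXNet: the helper names parse_hosts and build_launch_command are made up, the IP addresses are placeholders, and it assumes the hosts file passed to -H lists one hostname or IP per line and that none of the paths contain spaces.

```python
try:
    from shlex import quote  # Python 3
except ImportError:
    from pipes import quote  # Python 2.7, as in the anaconda2 setup above


def parse_hosts(text):
    """Return the non-empty, stripped lines of a hosts file's contents.

    Assumption: the file given to -H holds one hostname or IP per line.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_launch_command(launch_py, python_bin, script, hosts_file,
                         num_workers, num_servers, sync_dst_dir):
    """Build the launch.py invocation from the post as an argument list.

    python_bin should be the absolute interpreter path that exists on every
    machine, so the remote side does not fall back to /usr/bin/python.
    """
    return [
        "python", launch_py,          # launcher itself, run on the local machine
        "-n", str(num_workers),       # number of workers
        "-s", str(num_servers),       # number of servers
        "-H", hosts_file,
        "--sync-dst-dir", sync_dst_dir,
        "--launcher", "ssh",
        # The training command is passed as one single argument.
        python_bin + " " + script,
    ]


if __name__ == "__main__":
    # Placeholder addresses (hypothetical); use your own machines here.
    hosts = parse_hosts("192.0.2.10\n192.0.2.11\n")
    cmd = build_launch_command(
        "/home/xiaomin.wu/anaconda2/lib/python2.7/site-packages/mxnet/tools/launch.py",
        "/home/xiaomin.wu/anaconda2/bin/python",
        "cifar10_dist.py",
        "hosts",
        num_workers=len(hosts),
        num_servers=len(hosts),
        sync_dst_dir="/home/xiaomin.wu/cifar10_dist",
    )
    # Print a shell-ready line; the last argument comes out quoted as one unit.
    print(" ".join(quote(arg) for arg in cmd))
```

Keeping the command as a list makes it obvious that the quoted "python_bin script" part is one argument to launch.py, which is the detail that is easy to get wrong when typing it by hand.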
[ Full content available at: https://github.com/apache/incubator-mxnet/issues/12363 ]
