First of all, check out: 
https://github.com/apache/incubator-mxnet/tree/master/example/distributed_training
Install Anaconda on all your machines.
Run "pip install mxnet-cu80==1.2.1" on all your machines (or "pip install 
mxnet-cu90", depending on your machine's environment: cu80 is the CUDA 8.0 
build, cu90 the CUDA 9.0 build).
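To confirm that every machine ended up with the same setup, a small script like the one below could be run with the Anaconda interpreter on each host. This is only a sketch: the helper names (pip_package_for_cuda, describe_environment) are made up for illustration, and the mapping covers just the two builds mentioned above.

```python
import importlib
import sys

# Only the two builds named above; other CUDA versions are deliberately left out.
CUDA_TO_PACKAGE = {"8.0": "mxnet-cu80", "9.0": "mxnet-cu90"}


def pip_package_for_cuda(cuda_version, mxnet_version=None):
    """Return the pip requirement string for a CUDA version (hypothetical helper)."""
    package = CUDA_TO_PACKAGE[cuda_version]
    if mxnet_version is None:
        return package
    return "%s==%s" % (package, mxnet_version)


def describe_environment():
    """Report which interpreter is running and whether mxnet can be imported."""
    info = {"python": sys.executable, "mxnet": None, "error": None}
    try:
        info["mxnet"] = importlib.import_module("mxnet").__version__
    except Exception as exc:  # mxnet missing, or its native library failed to load
        info["error"] = repr(exc)
    return info


if __name__ == "__main__":
    print(pip_package_for_cuda("8.0", "1.2.1"))
    print(describe_environment())
```

If the "python" entry printed on some machine is not the Anaconda interpreter, or "mxnet" is None, that machine is not set up the same way as the others.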
Make all your machines able to ssh to each other without entering a password.
For machine A to machine B, on machine A:
cd  ~/.ssh
ssh-keygen  -t  rsa
Two files are generated: id_rsa is the private key and id_rsa.pub is the 
public key.
On machine B: vim ~/.ssh/authorized_keys, and paste in the contents of machine 
A's ~/.ssh/id_rsa.pub.
Do the same in the other direction so that machine B can ssh to machine A 
without a password.
Passwordless ssh from machine A to machine A and from machine B to machine B 
is needed as well.
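One way to check that every pair really works without a password is sketched below. It assumes the machine you run it from can already reach each host; the function names are hypothetical. It relies on the standard ssh option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```python
import itertools
import subprocess


def ssh_check_command(src, dst):
    """Build the argv that logs into src and, from there, into dst without prompting."""
    # BatchMode=yes makes ssh exit with an error rather than ask for a password.
    inner = "ssh -o BatchMode=yes %s true" % dst
    return ["ssh", "-o", "BatchMode=yes", src, inner]


def host_pairs(hosts):
    """Every ordered (source, destination) pair, including each host to itself."""
    return list(itertools.product(hosts, repeat=2))


def check_all(hosts):
    """Return the (src, dst) pairs for which the passwordless login failed."""
    failed = []
    for src, dst in host_pairs(hosts):
        if subprocess.call(ssh_check_command(src, dst)) != 0:
            failed.append((src, dst))
    return failed
```

For example, check_all(["machineA", "machineB"]) (placeholder host names) tries A to A, A to B, B to A and B to B, and returns an empty list when all four logins succeed.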
Then run:
 python /home/xiaomin.wu/anaconda2/lib/python2.7/site-packages/mxnet/tools/launch.py -n 2 -s 2 -H hosts --sync-dst-dir /home/xiaomin.wu/cifar10_dist --launcher ssh "/home/xiaomin.wu/anaconda2/bin/python cifar10_dist.py"
Here we use /home/xiaomin.wu/anaconda2/bin/python instead of plain python, 
because with plain python the remote machines may pick up /usr/bin/python 
(the system Python, which likely does not have mxnet installed), and that 
will drive you crazy.
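If you start the job from a script, the same command can be assembled as below. This is a sketch, not part of launch.py: the helper names are invented, it assumes the hosts file lists one host name or IP per line (as in the linked example), and unlike the command above it uses the absolute interpreter path for the local launcher too.

```python
import os


def hosts_file_contents(hosts):
    """Text for the hosts file: one host name or IP per line (assumed format)."""
    return "\n".join(hosts) + "\n"


def build_launch_command(python_bin, launch_py, num_workers, num_servers,
                         hosts_file, sync_dir, script):
    """Assemble the launch.py argv, refusing a bare 'python' interpreter name."""
    if not os.path.isabs(python_bin):
        raise ValueError("use an absolute interpreter path, got %r" % python_bin)
    return [
        python_bin, launch_py,
        "-n", str(num_workers),    # number of workers
        "-s", str(num_servers),    # number of servers
        "-H", hosts_file,
        "--sync-dst-dir", sync_dir,
        "--launcher", "ssh",
        # The remote command also names the interpreter explicitly.
        "%s %s" % (python_bin, script),
    ]
```

The returned list can be passed to subprocess.call; the ValueError is there so that a bare "python" is caught before anything is launched on the remote machines.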

[ Full content available at: 
https://github.com/apache/incubator-mxnet/issues/12363 ]