Davdi opened a new issue #13526: distributed training van.cc Check failed
URL: https://github.com/apache/incubator-mxnet/issues/13526

## Description
I used the following command to test distributed training, but it fails with this error:

    ../../tools/launch.py -n 2 -H hosts --launcher ssh python image_classification.py --dataset cifar10 --model vgg11 epochs 1 --kvstore dist_sync

        _init_kvstore_server_module()
      File "/usr/local/lib/python3.6/dist-packages/mxnet/kvstore_server.py", line 82, in _init_kvstore_server_module
        server.run()
      File "/usr/local/lib/python3.6/dist-packages/mxnet/kvstore_server.py", line 73, in run
        check_call(_LIB.MXKVStoreRunServer(self.handle, _ctrl_proto(self._controller()), None))
      File "/usr/local/lib/python3.6/dist-packages/mxnet/base.py", line 252, in check_call
        raise MXNetError(py_str(_LIB.MXGetLastError()))
    mxnet.base.MXNetError: [08:18:33] src/van.cc:291: Check failed: (my_node_.port) != (-1) bind failed

## Environment info (Required)
Ubuntu 16.04, 1 parameter server (ps), 1 worker (2 GPUs)

## Steps to reproduce
1. `git clone` incubator-mxnet and `git clone` dmlc-core
2. `cd dmlc-core && make`
3. `cd incubator-mxnet/example/gluon/`, then run:
   `../../tools/launch.py -n 2 -H hosts --launcher ssh python image_classification.py --dataset cifar10 --model vgg11 epochs 1 --kvstore dist_sync`
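
The check message (`bind failed`) suggests that a node could not bind a listening port. That reading is an assumption; the report does not confirm the cause. As a minimal sketch, independent of MXNet, the script below tests whether a plain TCP socket can be bound on a given address. The helper `can_bind` and the probed addresses are hypothetical and only illustrate the idea; the script does not reproduce what `van.cc` does internally.

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to (host, port).

    port=0 asks the operating system to pick any free port.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
        return True
    except OSError:
        # Covers "address in use", "address not available",
        # and host name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical addresses to probe; replace them with the
    # addresses listed in the `hosts` file used by launch.py.
    for host in ("127.0.0.1", "0.0.0.0"):
        print(host, "bind ok" if can_bind(host) else "bind FAILED")
```

Running this on each machine listed in `hosts` shows whether the address that machine is known by can be bound locally. A failure here would point at the host or network configuration rather than at MXNet itself.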
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services