NanYan1119 commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by AllReduce
URL: https://github.com/apache/incubator-mxnet/pull/10696#issuecomment-419826207

I have a problem when testing the allreduce version.

1) I cloned the master branch from https://github.com/threeleafzerg/incubator-mxnet.
2) I compiled it with:

   make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 USE_PROFILER=1 USE_DIST_KVSTORE=1 USE_ALLREDUCE_DIST_KVSTORE=1 MPI_ROOT=/usr/local/openmpi

3) But when I run the script ~/incubator-mxnet/tests/nightly/dist_allreduce_sync_kvstore.py, it ends with an error:

   pure virtual method called
   terminate called without an active exception
   *** Process received signal ***
   Signal: Aborted (6)
   Signal code: (-6)

It seems the error occurs while destructing the kvstore. Can anyone help me? @threeleafzerg
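To narrow down where the abort is raised, one option is Python's standard `faulthandler` module, which dumps the Python-level traceback when the process receives SIGABRT. The sketch below is only an illustration of that technique: the child script is a hypothetical stand-in that calls `os.abort()` inside a made-up `teardown()` function, it does not import MXNet, and it is not a confirmed reproduction of the kvstore failure.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for the failing test script (assumption: the real
# abort comes from native code during teardown; here os.abort() mimics it).
CHILD = textwrap.dedent("""
    import faulthandler
    import os

    faulthandler.enable()  # dump the Python traceback on SIGABRT, SIGSEGV, ...

    def teardown():
        os.abort()  # stand-in for the abort raised by native code

    teardown()
""")


def run_child():
    """Run the stand-in script in a subprocess and return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_child()
    print("exit code:", code)
    print(err)
```

The same handler can be enabled for an unmodified script by starting the interpreter with `python -X faulthandler`, so the traceback printed at the abort shows which Python frame was active when the signal arrived.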
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
