waytrue17 opened a new issue #19556: URL: https://github.com/apache/incubator-mxnet/issues/19556
## Description
Running the MXNet-Horovod example `incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py` on mxnet1.8-cuda11.0 with Python 3.7 ends in a segmentation fault. The segfault occurs after the example script has finished. The same script works fine on mxnet1.8-cuda10.2 with Python 3.7 and on mxnet1.8-cuda11.0 with Python 3.6.

## To Reproduce
### Steps to reproduce
1. Launch an EC2 p3.8x GPU instance with DLAMI: ami-02440419a5afe47ab
2. Build mx1.8-cu110 from source
3. Install Horovod: `python3 -m pip install horovod`
4. Run `LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64:$LD_LIBRARY_PATH python3 incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py` to reproduce the error

## What have you tried to solve it?
1. Backporting #19378 to v1.8.x resolved the issue.
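
Because the segfault only shows up after the script has finished, it can be easy to miss in a wrapper or CI job. The sketch below is not part of the original report: it is a hypothetical helper (`run_with_faulthandler` is a made-up name) that uses only the Python standard library to run a script in a child interpreter with `faulthandler` enabled and to report whether the child was killed by a signal. It assumes a POSIX system, where a negative return code means "terminated by signal".

```python
import signal
import subprocess
import sys


def run_with_faulthandler(args):
    """Run `python -X faulthandler <args>` and report how the child exited.

    Returns (returncode, signal_name_or_None, stderr_text).
    `args` is whatever would normally follow `python3` on the command line.
    """
    cmd = [sys.executable, "-X", "faulthandler", *args]
    proc = subprocess.run(cmd, capture_output=True, text=True)

    sig_name = None
    if proc.returncode < 0:
        # On POSIX, subprocess reports death-by-signal as -<signal number>.
        sig_name = signal.Signals(-proc.returncode).name
    return proc.returncode, sig_name, proc.stderr


# Hypothetical usage with the script path from the reproduction steps:
#   code, sig, err = run_with_faulthandler(
#       ["incubator-mxnet/example/distributed_training-horovod/gluon_mnist.py"])
#   print(code, sig)   # a crash at shutdown would show up as sig == "SIGSEGV"
#   print(err)         # faulthandler writes the Python-level traceback here
```

With `faulthandler` enabled, the child's stderr should contain whatever Python stack was active when the fault hit, which may help confirm that the crash happens during interpreter shutdown rather than during training.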
