threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed Training by MPI AllReduce
URL: https://github.com/apache/incubator-mxnet/pull/10696#issuecomment-386484809

@rahul003 The build instructions are in the design doc:

    USE_DIST_KVSTORE = 1
    USE_MPI_DIST_KVSTORE = 1
    MPI_ROOT=/usr/lib/openmpi

We let the end user select which MPI to use (Open MPI, MPICH, or Intel MPI). That's why we don't include the source as a 3rd-party lib. You can check Horovod; they take the same approach: https://github.com/uber/horovod#install

So the end user needs to install MPI separately. Can you try the latest Open MPI? We tried both Open MPI and Intel MPI; their release directory structure looks like the following:

    /home/zhouhaiy/openmpi/build
    [zhouhaiy@mlt-ace build]$ ls
    bin  etc  include  lib  share

It looks like the MPICH release directory layout is not the same as Open MPI's; I will check.

Alternatively, we could use the following logic: if the environment variable MPI_ROOT is set, we use the MPI library from that location; otherwise we download the source of an open-source 3rd-party MPI, build it, and have MXNet depend on it.

Which one do you prefer? We need consensus.
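To make the MPI_ROOT proposal concrete, here is a rough Python sketch of the decision, not the actual MXNet build logic. The function names `choose_mpi_source` and `missing_subdirs` are hypothetical, and the "bundled" branch only marks where a download-and-build step would go. The expected subdirectories are taken from the Open MPI listing quoted in the comment and are an assumption for other MPI distributions.

```python
import os
from pathlib import Path

# Subdirectories taken from the Open MPI release listing in the comment.
# Assumption: other MPI distributions may lay these out differently.
EXPECTED_SUBDIRS = ("bin", "include", "lib")


def choose_mpi_source(environ=None):
    """Return ("system", path) if MPI_ROOT is set, else ("bundled", None)."""
    environ = os.environ if environ is None else environ
    mpi_root = environ.get("MPI_ROOT", "").strip()
    if mpi_root:
        # The user pointed the build at an existing MPI installation.
        return ("system", Path(mpi_root))
    # Hypothetical fallback: the build would download and compile an
    # open-source MPI itself. That step is not sketched here.
    return ("bundled", None)


def missing_subdirs(mpi_root):
    """List the expected subdirectories that are absent under mpi_root."""
    root = Path(mpi_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]
```

A build script could call `choose_mpi_source()` first and, for the "system" case, report whatever `missing_subdirs(path)` returns, so that a layout mismatch like the MPICH one mentioned above shows up as a clear message rather than a compile error.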
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
