threeleafzerg commented on issue #10696: [MXNET-366]Extend MXNet Distributed 
Training by MPI AllReduce
URL: https://github.com/apache/incubator-mxnet/pull/10696#issuecomment-386484809
 
 
   @rahul003 
   The build instructions are in the design doc:
   USE_DIST_KVSTORE = 1
   USE_MPI_DIST_KVSTORE = 1
   MPI_ROOT=/usr/lib/openmpi
   We let the end user choose which MPI implementation to use (Open MPI, MPICH, 
or Intel MPI). That's why we don't bundle the MPI source as a third-party 
library. Horovod takes the same approach: https://github.com/uber/horovod#install
   So the end user needs to install MPI separately.
   Can you try the latest Open MPI? We tried both Open MPI and Intel MPI; their 
release directory structure looks like the following:
   /home/zhouhaiy/openmpi/build
   [zhouhaiy@mlt-ace build]$ ls
   bin  etc  include  lib  share
   It looks like the MPICH release directory layout differs from Open MPI's. I will check.
   
   Alternatively, we could implement the following logic: 
   If the MPI_ROOT environment variable is set, we use the MPI installation it 
points to; otherwise, we download the source of an open-source MPI 
implementation, build it, and have MXNet depend on that build.
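
   A minimal sketch of that fallback logic, written in Python for illustration 
only (the real change would live in the MXNet build scripts). The names 
`resolve_mpi_root`, `has_expected_layout`, and `build_bundled` are hypothetical, 
and the required subdirectories are an assumption taken from the Open MPI 
listing above:

```python
from pathlib import Path
from typing import Callable, Mapping

# Assumption: a usable MPI install prefix contains at least these
# subdirectories, as in the Open MPI release listing (bin, include, lib).
REQUIRED_SUBDIRS = ("bin", "include", "lib")


def has_expected_layout(prefix: Path) -> bool:
    """Return True if every required subdirectory exists under prefix."""
    return all((prefix / name).is_dir() for name in REQUIRED_SUBDIRS)


def resolve_mpi_root(env: Mapping[str, str],
                     build_bundled: Callable[[], Path]) -> Path:
    """Pick the MPI install prefix for the build.

    If MPI_ROOT is set in env, use it (after a layout check). Otherwise call
    build_bundled, a hypothetical hook that downloads and builds an
    open-source MPI and returns its install prefix.
    """
    mpi_root = env.get("MPI_ROOT")
    if mpi_root:
        prefix = Path(mpi_root)
        if not has_expected_layout(prefix):
            raise FileNotFoundError(
                f"MPI_ROOT={mpi_root} lacks one of {REQUIRED_SUBDIRS}"
            )
        return prefix
    return build_bundled()
```

   In a build script this would be called as `resolve_mpi_root(os.environ, ...)`. 
The layout check is where an MPICH install with a different directory 
structure would fail early with a clear message instead of a compile error.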
   
   Which one do you prefer? We need consensus.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services