Dave,

here is what I found:
- MPI_THREAD_MULTIPLE is not part of the equation (I just found that IMB no longer requires it by default)
- patcher/overwrite is not built when Open MPI is configured with --disable-dlopen
- when configured without --disable-dlopen, performance is much worse for the IMB (PingPong) benchmark when run with mpirun --mca patcher ^overwrite
- OSU (osu_bw) performance is not impacted by the patcher/overwrite component being blacklisted

I am afraid that's all I can do ...

Nathan, could you please shed some light?

Cheers,

Gilles

On Wed, Jan 24, 2018 at 1:29 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
> Dave,
>
> i can reproduce the issue with btl/openib and the IMB benchmark, that
> is known to MPI_Init_thread(MPI_THREAD_MULTIPLE)
>
> note performance is ok with OSU benchmark that does not require
> MPI_THREAD_MULTIPLE
>
> Cheers,
>
> Gilles
>
> On Wed, Jan 24, 2018 at 1:16 PM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
>> Dave,
>>
>> one more question, are you running the openib/btl ? or other libraries such
>> as MXM or UCX ?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 1/24/2018 12:55 PM, Dave Turner wrote:
>>>
>>> We compiled OpenMPI 2.1.1 using the EasyBuild configuration
>>> for CentOS as below and tested on Mellanox QDR hardware.
>>>
>>> ./configure --prefix=/homes/daveturner/libs/openmpi-2.1.1c
>>>                  --enable-shared
>>>                  --enable-mpi-thread-multiple
>>>                  --with-verbs
>>>                  --enable-mpirun-prefix-by-default
>>>                  --with-mpi-cxx
>>>                  --enable-mpi-cxx
>>>                  --with-hwloc=$EBROOTHWLOC
>>>                  --disable-dlopen
>>>
>>> The red curve in the attached NetPIPE graph shows the poor performance
>>> above
>>> 8 kB for the uni-directional tests with bi-directional and aggregate
>>> tests also showing similar problems.  When I compile using the same
>>> configuration but with the --disable-dlopen parameter removed then the
>>> performance is very good as the green curve in the graph shows.
>>>
>>> We see the same problems with OpenMPI 2.0.2.
>>> Replacing --disable-dlopen with --disable-mca-dso showed good performance.
>>> Replacing --disable-dlopen with --enable-static showed good performance.
>>> So it's only --disable-dlopen that leads to poor performance.
>>>
>>> http://netpipe.cs.ksu.edu
>>>
>>> Dave Turner
>>>
>>> --
>>> Work: davetur...@ksu.edu <mailto:davetur...@ksu.edu> (785) 532-7791
>>> 2219 Engineering Hall, Manhattan KS 66506
>>> Home: drdavetur...@gmail.com <mailto:drdavetur...@gmail.com>
>>> cell: (785) 770-5929
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel@lists.open-mpi.org
>>> https://lists.open-mpi.org/mailman/listinfo/devel