Dave,

Here is what I found:

 - MPI_THREAD_MULTIPLE is not part of the equation (I just found that
it is no longer required by IMB by default)
 - the patcher/overwrite component is not built when Open MPI is
configured with --disable-dlopen
 - when configured without --disable-dlopen, performance is much worse
for the IMB (PingPong) benchmark when run with
   mpirun --mca patcher ^overwrite
   (see the full commands after this list)
 - OSU (osu_bw) performance is not impacted by blacklisting the
patcher/overwrite component
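
For reference, here is roughly how I compared the two cases (the
IMB-MPI1 path below is just from my environment, so adjust it for
your own build):

    # check which patcher components were built into this install
    ompi_info | grep patcher

    # IMB PingPong with and without the patcher/overwrite component
    mpirun -np 2 ./IMB-MPI1 PingPong
    mpirun -np 2 --mca patcher ^overwrite ./IMB-MPI1 PingPong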

I am afraid that's all I can do ...


Nathan,

could you please shed some light on this?


Cheers,

Gilles

On Wed, Jan 24, 2018 at 1:29 PM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
> Dave,
>
> I can reproduce the issue with btl/openib and the IMB benchmark, which
> is known to call MPI_Init_thread(MPI_THREAD_MULTIPLE).
>
> Note that performance is OK with the OSU benchmark, which does not
> require MPI_THREAD_MULTIPLE.
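>
> For example, something like this performs as expected here (the
> osu_bw path is from my test box, adjust as needed):
>
>     mpirun -np 2 ./osu_bw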
>
> Cheers,
>
> Gilles
>
> On Wed, Jan 24, 2018 at 1:16 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>> Dave,
>>
>>
>> One more question: are you running the openib btl, or other libraries
>> such as MXM or UCX?
>>
>>
>> Cheers,
>>
>>
>> Gilles
>>
>>
>> On 1/24/2018 12:55 PM, Dave Turner wrote:
>>>
>>>    We compiled Open MPI 2.1.1 using the EasyBuild configuration for
>>> CentOS shown below and tested it on Mellanox QDR InfiniBand hardware.
>>>
>>> ./configure --prefix=/homes/daveturner/libs/openmpi-2.1.1c \
>>>             --enable-shared \
>>>             --enable-mpi-thread-multiple \
>>>             --with-verbs \
>>>             --enable-mpirun-prefix-by-default \
>>>             --with-mpi-cxx \
>>>             --enable-mpi-cxx \
>>>             --with-hwloc=$EBROOTHWLOC \
>>>             --disable-dlopen
>>>
>>> The red curve in the attached NetPIPE graph shows the poor performance
>>> above 8 kB for the uni-directional tests; the bi-directional and
>>> aggregate tests show similar problems.  When I compile with the same
>>> configuration but with the --disable-dlopen parameter removed, the
>>> performance is very good, as the green curve in the graph shows.
>>>
>>> We see the same problems with Open MPI 2.0.2.
>>> Replacing --disable-dlopen with --disable-mca-dso gave good performance.
>>> Replacing --disable-dlopen with --enable-static also gave good performance.
>>> So it is only --disable-dlopen that leads to poor performance.
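>>>
>>> In other words, with the same configure line as above and only the
>>> last option changed:
>>>
>>>     ./configure ... --disable-dlopen     # poor performance
>>>     ./configure ... --disable-mca-dso    # good performance
>>>     ./configure ... --enable-static      # good performance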
>>>
>>> http://netpipe.cs.ksu.edu
>>>
>>>                    Dave Turner
>>>
>>> --
>>> Work: davetur...@ksu.edu     (785) 532-7791
>>>              2219 Engineering Hall, Manhattan KS  66506
>>> Home: drdavetur...@gmail.com
>>>               cell: (785) 770-5929
>>>
>>>
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
