Just more information to help with the decision:

I am working on Nathan's uct BTL to make it work with ob1 and InfiniBand, so it 
could be a replacement for openib (and honestly, we should totally call this new 
uct BTL Kylo). 
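For anyone who wants to kick the tires once it lands, selecting it should look like 
any other BTL. Here's a rough sketch only; the component name "uct" is my assumption 
about the work in progress, not a finished interface (hosts and mapping are taken 
from Ben's runs below):

  # assumed: the new component registers as "uct" and is driven by the ob1 PML
  mpirun --mca pml ob1 --mca btl self,vader,uct \
         -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw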

Arm

> On Apr 5, 2018, at 1:37 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
> wrote:
> 
> Below is an email exchange from the users mailing list.
> 
> I'm moving this over to devel to talk among the developer community.
> 
> Multiple times recently on the users list, we've told people with problems 
> with the openib BTL that they should be using UCX (per Mellanox's 
> publicly-stated support positions).
> 
> Is it time to deprecate / print warning messages / remove the openib BTL?
> 
> 
> 
>> Begin forwarded message:
>> 
>> From: Nathan Hjelm <hje...@me.com <mailto:hje...@me.com>>
>> Subject: Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0
>> Date: April 5, 2018 at 12:48:08 PM EDT
>> To: Open MPI Users <us...@lists.open-mpi.org 
>> <mailto:us...@lists.open-mpi.org>>
>> Cc: Open MPI Users <us...@lists.open-mpi.org 
>> <mailto:us...@lists.open-mpi.org>>
>> Reply-To: Open MPI Users <us...@lists.open-mpi.org 
>> <mailto:us...@lists.open-mpi.org>>
>> 
>> 
>> Honestly, this is a configuration issue with the openib btl. There is no 
>> reason to keep eager RDMA, nor is there a reason to pipeline RDMA; I 
>> haven't found an app where either of these "features" helps on 
>> InfiniBand. You have the right idea with the parameter changes, but Howard is 
>> correct: for Mellanox the future is UCX, not verbs. I would try UCX and see if 
>> it works for you; if it doesn't, I would set those two parameters in your 
>> /etc/openmpi-mca-params.conf and run like that.
>> 
>> -Nathan
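For reference, the two parameters Nathan means are the ones exercised in Ben's runs 
below. A minimal sketch of the corresponding /etc/openmpi-mca-params.conf entries 
(the 4 MB pipeline threshold is simply the value from Ben's test, not a recommended 
tuning):

  # disable eager RDMA and raise the RDMA pipeline threshold to 4 MB
  btl_openib_use_eager_rdma = 0
  btl_openib_min_rdma_pipeline_size = 4194304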
>> 
>> On Apr 05, 2018, at 01:18 AM, Ben Menadue <ben.mena...@nci.org.au 
>> <mailto:ben.mena...@nci.org.au>> wrote:
>> 
>>> Hi,
>>> 
>>> Another interesting point. I noticed that the last two message sizes tested 
>>> (2MB and 4MB) are lower than expected for both osu_bw and osu_bibw. 
>>> Increasing the minimum size to use the RDMA pipeline to above these sizes 
>>> brings those two data-points up to scratch for both benchmarks:
>>> 
>>> 3.0.0, osu_bw, no rdma for large messages
>>> 
>>> > mpirun -mca btl_openib_min_rdma_pipeline_size 4194304 -map-by ppr:1:node 
>>> > -np 2 -H r6,r7 ./osu_bw -m 2097152:4194304
>>> # OSU MPI Bandwidth Test v5.4.0
>>> # Size      Bandwidth (MB/s)
>>> 2097152              6133.22
>>> 4194304              6054.06
>>> 
>>> 3.0.0, osu_bibw, eager rdma disabled, no rdma for large messages
>>> 
>>> > mpirun -mca btl_openib_min_rdma_pipeline_size 4194304 -mca 
>>> > btl_openib_use_eager_rdma 0 -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw 
>>> > -m 2097152:4194304
>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>> # Size      Bandwidth (MB/s)
>>> 2097152             11397.85
>>> 4194304             11389.64
>>> 
>>> This makes me think something odd is going on in the RDMA pipeline.
>>> 
>>> Cheers,
>>> Ben
>>> 
>>> 
>>> 
>>>> On 5 Apr 2018, at 5:03 pm, Ben Menadue <ben.mena...@nci.org.au 
>>>> <mailto:ben.mena...@nci.org.au>> wrote:
>>>> Hi,
>>>> 
>>>> We’ve just been running some OSU benchmarks with OpenMPI 3.0.0 and noticed 
>>>> that osu_bibw gives nowhere near the bandwidth I’d expect (this is on FDR 
>>>> IB). However, osu_bw is fine.
>>>> 
>>>> If I disable eager RDMA, then osu_bibw gives the expected numbers. 
>>>> Similarly, if I increase the number of eager RDMA buffers, it gives the 
>>>> expected results.
>>>> 
>>>> OpenMPI 1.10.7 gives consistent, reasonable numbers with default settings, 
>>>> but they’re not as good as 3.0.0 (when tuned) for large buffers. The same 
>>>> option changes make no difference to the performance of 1.10.7.
>>>> 
>>>> I was wondering if anyone else has noticed anything similar and, if this 
>>>> is unexpected, whether anyone has a suggestion on how to investigate further.
>>>> 
>>>> Thanks,
>>>> Ben
>>>> 
>>>> 
>>>> Here are the numbers:
>>>> 
>>>> 3.0.0, osu_bw, default settings
>>>> 
>>>> > mpirun -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bw
>>>> # OSU MPI Bandwidth Test v5.4.0
>>>> # Size      Bandwidth (MB/s)
>>>> 1                       1.13
>>>> 2                       2.29
>>>> 4                       4.63
>>>> 8                       9.21
>>>> 16                     18.18
>>>> 32                     36.46
>>>> 64                     69.95
>>>> 128                   128.55
>>>> 256                   250.74
>>>> 512                   451.54
>>>> 1024                  829.44
>>>> 2048                 1475.87
>>>> 4096                 2119.99
>>>> 8192                 3452.37
>>>> 16384                2866.51
>>>> 32768                4048.17
>>>> 65536                5030.54
>>>> 131072               5573.81
>>>> 262144               5861.61
>>>> 524288               6015.15
>>>> 1048576              6099.46
>>>> 2097152               989.82
>>>> 4194304               989.81
>>>> 
>>>> 3.0.0, osu_bibw, default settings
>>>> 
>>>> > mpirun -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw
>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>> # Size      Bandwidth (MB/s)
>>>> 1                       0.00
>>>> 2                       0.01
>>>> 4                       0.01
>>>> 8                       0.02
>>>> 16                      0.04
>>>> 32                      0.09
>>>> 64                      0.16
>>>> 128                   135.30
>>>> 256                   265.35
>>>> 512                   499.92
>>>> 1024                  949.22
>>>> 2048                 1440.27
>>>> 4096                 1960.09
>>>> 8192                 3166.97
>>>> 16384                 127.62
>>>> 32768                 165.12
>>>> 65536                 312.80
>>>> 131072               1120.03
>>>> 262144               4724.01
>>>> 524288               4545.93
>>>> 1048576              5186.51
>>>> 2097152               989.84
>>>> 4194304               989.88
>>>> 
>>>> 3.0.0, osu_bibw, eager RDMA disabled
>>>> 
>>>> > mpirun -mca btl_openib_use_eager_rdma 0 -map-by ppr:1:node -np 2 -H 
>>>> > r6,r7 ./osu_bibw
>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>> # Size      Bandwidth (MB/s)
>>>> 1                       1.49
>>>> 2                       2.97
>>>> 4                       5.96
>>>> 8                      11.98
>>>> 16                     23.95
>>>> 32                     47.39
>>>> 64                     93.57
>>>> 128                   153.82
>>>> 256                   304.69
>>>> 512                   572.30
>>>> 1024                 1003.52
>>>> 2048                 1083.89
>>>> 4096                 1879.32
>>>> 8192                 2785.18
>>>> 16384                3535.77
>>>> 32768                5614.72
>>>> 65536                8113.69
>>>> 131072               9666.74
>>>> 262144              10738.97
>>>> 524288              11247.02
>>>> 1048576             11416.50
>>>> 2097152               989.88
>>>> 4194304               989.88
>>>> 
>>>> 3.0.0, osu_bibw, increased eager RDMA buffer count
>>>> 
>>>> > mpirun -mca btl_openib_eager_rdma_num 32768 -map-by ppr:1:node -np 2 -H 
>>>> > r6,r7 ./osu_bibw
>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>> # Size      Bandwidth (MB/s)
>>>> 1                       1.42
>>>> 2                       2.84
>>>> 4                       5.67
>>>> 8                      11.18
>>>> 16                     22.46
>>>> 32                     44.65
>>>> 64                     83.10
>>>> 128                   154.00
>>>> 256                   291.63
>>>> 512                   537.66
>>>> 1024                  942.35
>>>> 2048                 1433.09
>>>> 4096                 2356.40
>>>> 8192                 1998.54
>>>> 16384                3584.82
>>>> 32768                5523.08
>>>> 65536                7717.63
>>>> 131072               9419.50
>>>> 262144              10564.77
>>>> 524288              11104.71
>>>> 1048576             11130.75
>>>> 2097152              7943.89
>>>> 4194304              5270.00
>>>> 
>>>> 1.10.7, osu_bibw, default settings
>>>> 
>>>> > mpirun -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw
>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>> # Size      Bandwidth (MB/s)
>>>> 1                       1.70
>>>> 2                       3.45
>>>> 4                       6.95
>>>> 8                      13.68
>>>> 16                     27.41
>>>> 32                     53.80
>>>> 64                    105.34
>>>> 128                   164.40
>>>> 256                   324.63
>>>> 512                   623.95
>>>> 1024                 1127.35
>>>> 2048                 1784.58
>>>> 4096                 3305.45
>>>> 8192                 3697.55
>>>> 16384                4935.75
>>>> 32768                7186.28
>>>> 65536                8996.94
>>>> 131072               9301.78
>>>> 262144               4691.36
>>>> 524288               7039.18
>>>> 1048576              7213.33
>>>> 2097152              9601.41
>>>> 4194304              9281.31
>>>> 
>>>> 
>>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com <mailto:jsquy...@cisco.com>
