That would be sweet.  Are you aiming for v4.0.0, perchance?

I.e., should we add "remove openib BTL / add uct BTL" to the target feature 
list for v4.0.0?

(I did laugh at the Kylo name, but I really, really don't want to propagate 
the idea of meaningless names for BTLs -- especially since BTL names, 
more so than most other component names, are visible to the user.  ...unless 
someone wants to finally finish the ideas and implement the whole 
network-transport-name system that we've talked about for a few years... :-) )


> On Apr 5, 2018, at 1:49 PM, Thananon Patinyasakdikul <tpati...@vols.utk.edu> 
> wrote:
> 
> Just more information to help with the decision:
> 
> I am working on Nathan’s uct btl to make it work with ob1 and infiniband. So 
> this could be a replacement for openib and honestly we should totally call 
> this new uct btl Kylo. 
> 
> Arm
> 
>> On Apr 5, 2018, at 1:37 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>> wrote:
>> 
>> Below is an email exchange from the users mailing list.
>> 
>> I'm moving this over to devel to talk among the developer community.
>> 
>> Multiple times recently on the users list, we've told people with problems 
>> with the openib BTL that they should be using UCX (per Mellanox's 
>> publicly-stated support positions).
>> 
>> Is it time to deprecate / print warning messages / remove the openib BTL?
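For context, steering a run away from the openib BTL and onto UCX is normally just a matter of selecting the UCX PML. A command sketch, assuming an Open MPI build configured with UCX support (the hostnames are the ones used later in this thread):

```shell
# Ask for the UCX PML explicitly (requires Open MPI built against UCX)
mpirun --mca pml ucx -np 2 -H r6,r7 ./osu_bw

# Conversely, to force ob1 over the openib BTL for comparison:
mpirun --mca pml ob1 --mca btl self,vader,openib -np 2 -H r6,r7 ./osu_bw
```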
>> 
>> 
>> 
>>> Begin forwarded message:
>>> 
>>> From: Nathan Hjelm <hje...@me.com>
>>> Subject: Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0
>>> Date: April 5, 2018 at 12:48:08 PM EDT
>>> To: Open MPI Users <us...@lists.open-mpi.org>
>>> Cc: Open MPI Users <us...@lists.open-mpi.org>
>>> Reply-To: Open MPI Users <us...@lists.open-mpi.org>
>>> 
>>> 
>>> Honestly, this is a configuration issue with the openib btl. There is no 
>>> reason to use eager RDMA, nor is there a reason to pipeline RDMA. I 
>>> haven't found an app where either of these "features" helps with 
>>> InfiniBand. You have the right idea with the parameter changes, but Howard 
>>> is correct: for Mellanox, the future is UCX, not verbs. I would try UCX 
>>> and see whether it works for you; if it doesn't, I would set those two 
>>> parameters in your /etc/openmpi-mca-params.conf and run like that.
>>> 
>>> -Nathan
>>> 
>>> On Apr 05, 2018, at 01:18 AM, Ben Menadue <ben.mena...@nci.org.au> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Another interesting point. I noticed that the bandwidth at the last two 
>>>> message sizes tested (2MB and 4MB) is lower than expected for both osu_bw 
>>>> and osu_bibw. Increasing the minimum size for the RDMA pipeline to above 
>>>> these sizes brings those two data points up to scratch for both benchmarks:
>>>> 
>>>> 3.0.0, osu_bw, no rdma for large messages
>>>> 
>>>> > mpirun -mca btl_openib_min_rdma_pipeline_size 4194304 -map-by ppr:1:node 
>>>> > -np 2 -H r6,r7 ./osu_bw -m 2097152:4194304
>>>> # OSU MPI Bandwidth Test v5.4.0
>>>> # Size      Bandwidth (MB/s)
>>>> 2097152              6133.22
>>>> 4194304              6054.06
>>>> 
>>>> 3.0.0, osu_bibw, eager rdma disabled, no rdma for large messages
>>>> 
>>>> > mpirun -mca btl_openib_min_rdma_pipeline_size 4194304 -mca 
>>>> > btl_openib_use_eager_rdma 0 -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw 
>>>> > -m 2097152:4194304
>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>> # Size      Bandwidth (MB/s)
>>>> 2097152             11397.85
>>>> 4194304             11389.64
>>>> 
>>>> This makes me think something odd is going on in the RDMA pipeline.
>>>> 
>>>> Cheers,
>>>> Ben
>>>> 
>>>> 
>>>> 
>>>>> On 5 Apr 2018, at 5:03 pm, Ben Menadue <ben.mena...@nci.org.au> wrote:
>>>>> Hi,
>>>>> 
>>>>> We’ve just been running some OSU benchmarks with OpenMPI 3.0.0 and 
>>>>> noticed that osu_bibw gives nowhere near the bandwidth I’d expect (this 
>>>>> is on FDR IB). However, osu_bw is fine.
>>>>> 
>>>>> If I disable eager RDMA, then osu_bibw gives the expected numbers. 
>>>>> Similarly, if I increase the number of eager RDMA buffers, it gives the 
>>>>> expected results.
>>>>> 
>>>>> OpenMPI 1.10.7 gives consistent, reasonable numbers with default 
>>>>> settings, but they’re not as good as 3.0.0 (when tuned) for large 
>>>>> buffers. The same option changes produce no difference in performance 
>>>>> for 1.10.7.
>>>>> 
>>>>> I was wondering if anyone else has noticed anything similar and, if this 
>>>>> is unexpected, whether anyone has a suggestion on how to investigate further?
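One starting point for digging into this: the openib BTL's tunables and the values a given install actually picks up can be listed with ompi_info (a suggestion assuming a standard Open MPI install; exact parameter sets vary by version):

```shell
# List all openib BTL MCA parameters, including low-level ones
ompi_info --param btl openib --level 9

# Narrow to the eager-RDMA and pipeline knobs discussed in this thread
ompi_info --param btl openib --level 9 | grep -E 'eager_rdma|rdma_pipeline'
```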
>>>>> 
>>>>> Thanks,
>>>>> Ben
>>>>> 
>>>>> 
>>>>> Here are the numbers:
>>>>> 
>>>>> 3.0.0, osu_bw, default settings
>>>>> 
>>>>> > mpirun -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bw
>>>>> # OSU MPI Bandwidth Test v5.4.0
>>>>> # Size      Bandwidth (MB/s)
>>>>> 1                       1.13
>>>>> 2                       2.29
>>>>> 4                       4.63
>>>>> 8                       9.21
>>>>> 16                     18.18
>>>>> 32                     36.46
>>>>> 64                     69.95
>>>>> 128                   128.55
>>>>> 256                   250.74
>>>>> 512                   451.54
>>>>> 1024                  829.44
>>>>> 2048                 1475.87
>>>>> 4096                 2119.99
>>>>> 8192                 3452.37
>>>>> 16384                2866.51
>>>>> 32768                4048.17
>>>>> 65536                5030.54
>>>>> 131072               5573.81
>>>>> 262144               5861.61
>>>>> 524288               6015.15
>>>>> 1048576              6099.46
>>>>> 2097152               989.82
>>>>> 4194304               989.81
>>>>> 
>>>>> 3.0.0, osu_bibw, default settings
>>>>> 
>>>>> > mpirun -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw
>>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>>> # Size      Bandwidth (MB/s)
>>>>> 1                       0.00
>>>>> 2                       0.01
>>>>> 4                       0.01
>>>>> 8                       0.02
>>>>> 16                      0.04
>>>>> 32                      0.09
>>>>> 64                      0.16
>>>>> 128                   135.30
>>>>> 256                   265.35
>>>>> 512                   499.92
>>>>> 1024                  949.22
>>>>> 2048                 1440.27
>>>>> 4096                 1960.09
>>>>> 8192                 3166.97
>>>>> 16384                 127.62
>>>>> 32768                 165.12
>>>>> 65536                 312.80
>>>>> 131072               1120.03
>>>>> 262144               4724.01
>>>>> 524288               4545.93
>>>>> 1048576              5186.51
>>>>> 2097152               989.84
>>>>> 4194304               989.88
>>>>> 
>>>>> 3.0.0, osu_bibw, eager RDMA disabled
>>>>> 
>>>>> > mpirun -mca btl_openib_use_eager_rdma 0 -map-by ppr:1:node -np 2 -H 
>>>>> > r6,r7 ./osu_bibw
>>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>>> # Size      Bandwidth (MB/s)
>>>>> 1                       1.49
>>>>> 2                       2.97
>>>>> 4                       5.96
>>>>> 8                      11.98
>>>>> 16                     23.95
>>>>> 32                     47.39
>>>>> 64                     93.57
>>>>> 128                   153.82
>>>>> 256                   304.69
>>>>> 512                   572.30
>>>>> 1024                 1003.52
>>>>> 2048                 1083.89
>>>>> 4096                 1879.32
>>>>> 8192                 2785.18
>>>>> 16384                3535.77
>>>>> 32768                5614.72
>>>>> 65536                8113.69
>>>>> 131072               9666.74
>>>>> 262144              10738.97
>>>>> 524288              11247.02
>>>>> 1048576             11416.50
>>>>> 2097152               989.88
>>>>> 4194304               989.88
>>>>> 
>>>>> 3.0.0, osu_bibw, increased eager RDMA buffer count
>>>>> 
>>>>> > mpirun -mca btl_openib_eager_rdma_num 32768 -map-by ppr:1:node -np 2 -H 
>>>>> > r6,r7 ./osu_bibw
>>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>>> # Size      Bandwidth (MB/s)
>>>>> 1                       1.42
>>>>> 2                       2.84
>>>>> 4                       5.67
>>>>> 8                      11.18
>>>>> 16                     22.46
>>>>> 32                     44.65
>>>>> 64                     83.10
>>>>> 128                   154.00
>>>>> 256                   291.63
>>>>> 512                   537.66
>>>>> 1024                  942.35
>>>>> 2048                 1433.09
>>>>> 4096                 2356.40
>>>>> 8192                 1998.54
>>>>> 16384                3584.82
>>>>> 32768                5523.08
>>>>> 65536                7717.63
>>>>> 131072               9419.50
>>>>> 262144              10564.77
>>>>> 524288              11104.71
>>>>> 1048576             11130.75
>>>>> 2097152              7943.89
>>>>> 4194304              5270.00
>>>>> 
>>>>> 1.10.7, osu_bibw, default settings
>>>>> 
>>>>> > mpirun -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw
>>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>>> # Size      Bandwidth (MB/s)
>>>>> 1                       1.70
>>>>> 2                       3.45
>>>>> 4                       6.95
>>>>> 8                      13.68
>>>>> 16                     27.41
>>>>> 32                     53.80
>>>>> 64                    105.34
>>>>> 128                   164.40
>>>>> 256                   324.63
>>>>> 512                   623.95
>>>>> 1024                 1127.35
>>>>> 2048                 1784.58
>>>>> 4096                 3305.45
>>>>> 8192                 3697.55
>>>>> 16384                4935.75
>>>>> 32768                7186.28
>>>>> 65536                8996.94
>>>>> 131072               9301.78
>>>>> 262144               4691.36
>>>>> 524288               7039.18
>>>>> 1048576              7213.33
>>>>> 2097152              9601.41
>>>>> 4194304              9281.31
>>>>> 
>>>>> 
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@lists.open-mpi.org
>>>> https://lists.open-mpi.org/mailman/listinfo/users
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> 
>> _______________________________________________
>> devel mailing list
>> devel@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/devel
> 


-- 
Jeff Squyres
jsquy...@cisco.com

