Just more information to help with the decision: I am working on Nathan's uct btl to make it work with ob1 and InfiniBand, so it could be a replacement for openib. And honestly, we should totally call this new uct btl Kylo.
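For anyone who wants to kick the tires once it is usable, selecting it should look like any other btl under ob1. A rough sketch of what that might look like (the uct btl is still in development, so treat the btl_uct_memory_domains parameter and its value as illustrative and subject to change; the hosts and benchmark are just borrowed from Ben's runs below):

    # force the ob1 PML and allow the uct btl alongside self/vader
    mpirun --mca pml ob1 --mca btl self,vader,uct \
           --mca btl_uct_memory_domains ib/mlx5_0 \
           -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw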
Arm

> On Apr 5, 2018, at 1:37 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>
> Below is an email exchange from the users mailing list. I'm moving this over
> to devel to talk among the developer community.
>
> Multiple times recently on the users list, we've told people with problems
> with the openib BTL that they should be using UCX (per Mellanox's
> publicly-stated support positions).
>
> Is it time to deprecate / print warning messages / remove the openib BTL?
>
>> Begin forwarded message:
>>
>> From: Nathan Hjelm <hje...@me.com>
>> Subject: Re: [OMPI users] Eager RDMA causing slow osu_bibw with 3.0.0
>> Date: April 5, 2018 at 12:48:08 PM EDT
>> To: Open MPI Users <us...@lists.open-mpi.org>
>> Cc: Open MPI Users <us...@lists.open-mpi.org>
>> Reply-To: Open MPI Users <us...@lists.open-mpi.org>
>>
>> Honestly, this is a configuration issue with the openib btl. There is no
>> reason to keep eager RDMA enabled, nor is there a reason to pipeline RDMA; I
>> haven't found an app where either of these "features" helps you with
>> InfiniBand. You have the right idea with the parameter changes, but Howard
>> is correct: for Mellanox the future is UCX, not verbs. I would try it and
>> see if it works for you, but if it doesn't, I would set those two parameters
>> in your /etc/openmpi-mca-params.conf and run like that.
>>
>> -Nathan
>>
>> On Apr 05, 2018, at 01:18 AM, Ben Menadue <ben.mena...@nci.org.au> wrote:
>>
>>> Hi,
>>>
>>> Another interesting point. I noticed that the last two message sizes tested
>>> (2MB and 4MB) are lower than expected for both osu_bw and osu_bibw.
>>> Increasing the minimum size to use the RDMA pipeline to above these sizes
>>> brings those two data points up to scratch for both benchmarks:
>>>
>>> 3.0.0, osu_bw, no RDMA for large messages
>>>
>>> > mpirun -mca btl_openib_min_rdma_pipeline_size 4194304 -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bw -m 2097152:4194304
>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>> # Size        Bandwidth (MB/s)
>>> 2097152       6133.22
>>> 4194304       6054.06
>>>
>>> 3.0.0, osu_bibw, eager RDMA disabled, no RDMA for large messages
>>>
>>> > mpirun -mca btl_openib_min_rdma_pipeline_size 4194304 -mca btl_openib_use_eager_rdma 0 -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw -m 2097152:4194304
>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>> # Size        Bandwidth (MB/s)
>>> 2097152       11397.85
>>> 4194304       11389.64
>>>
>>> This makes me think something odd is going on in the RDMA pipeline.
>>>
>>> Cheers,
>>> Ben
>>>
>>>> On 5 Apr 2018, at 5:03 pm, Ben Menadue <ben.mena...@nci.org.au> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We've just been running some OSU benchmarks with OpenMPI 3.0.0 and noticed
>>>> that osu_bibw gives nowhere near the bandwidth I'd expect (this is on FDR
>>>> IB). However, osu_bw is fine.
>>>>
>>>> If I disable eager RDMA, then osu_bibw gives the expected numbers.
>>>> Similarly, if I increase the number of eager RDMA buffers, it gives the
>>>> expected results.
>>>>
>>>> OpenMPI 1.10.7 gives consistent, reasonable numbers with default settings,
>>>> but they're not as good as 3.0.0 (when tuned) for large buffers. The same
>>>> option changes produce no difference in the performance for 1.10.7.
>>>>
>>>> I was wondering if anyone else has noticed anything similar, and if this
>>>> is unexpected, if anyone has a suggestion on how to investigate further?
>>>>
>>>> Thanks,
>>>> Ben
>>>>
>>>> Here are the numbers:
>>>>
>>>> 3.0.0, osu_bw, default settings
>>>>
>>>> > mpirun -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bw
>>>> # OSU MPI Bandwidth Test v5.4.0
>>>> # Size        Bandwidth (MB/s)
>>>> 1             1.13
>>>> 2             2.29
>>>> 4             4.63
>>>> 8             9.21
>>>> 16            18.18
>>>> 32            36.46
>>>> 64            69.95
>>>> 128           128.55
>>>> 256           250.74
>>>> 512           451.54
>>>> 1024          829.44
>>>> 2048          1475.87
>>>> 4096          2119.99
>>>> 8192          3452.37
>>>> 16384         2866.51
>>>> 32768         4048.17
>>>> 65536         5030.54
>>>> 131072        5573.81
>>>> 262144        5861.61
>>>> 524288        6015.15
>>>> 1048576       6099.46
>>>> 2097152       989.82
>>>> 4194304       989.81
>>>>
>>>> 3.0.0, osu_bibw, default settings
>>>>
>>>> > mpirun -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw
>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>> # Size        Bandwidth (MB/s)
>>>> 1             0.00
>>>> 2             0.01
>>>> 4             0.01
>>>> 8             0.02
>>>> 16            0.04
>>>> 32            0.09
>>>> 64            0.16
>>>> 128           135.30
>>>> 256           265.35
>>>> 512           499.92
>>>> 1024          949.22
>>>> 2048          1440.27
>>>> 4096          1960.09
>>>> 8192          3166.97
>>>> 16384         127.62
>>>> 32768         165.12
>>>> 65536         312.80
>>>> 131072        1120.03
>>>> 262144        4724.01
>>>> 524288        4545.93
>>>> 1048576       5186.51
>>>> 2097152       989.84
>>>> 4194304       989.88
>>>>
>>>> 3.0.0, osu_bibw, eager RDMA disabled
>>>>
>>>> > mpirun -mca btl_openib_use_eager_rdma 0 -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw
>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>> # Size        Bandwidth (MB/s)
>>>> 1             1.49
>>>> 2             2.97
>>>> 4             5.96
>>>> 8             11.98
>>>> 16            23.95
>>>> 32            47.39
>>>> 64            93.57
>>>> 128           153.82
>>>> 256           304.69
>>>> 512           572.30
>>>> 1024          1003.52
>>>> 2048          1083.89
>>>> 4096          1879.32
>>>> 8192          2785.18
>>>> 16384         3535.77
>>>> 32768         5614.72
>>>> 65536         8113.69
>>>> 131072        9666.74
>>>> 262144        10738.97
>>>> 524288        11247.02
>>>> 1048576       11416.50
>>>> 2097152       989.88
>>>> 4194304       989.88
>>>>
>>>> 3.0.0, osu_bibw, increased eager RDMA buffer count
>>>>
>>>> > mpirun -mca btl_openib_eager_rdma_num 32768 -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw
>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>> # Size        Bandwidth (MB/s)
>>>> 1             1.42
>>>> 2             2.84
>>>> 4             5.67
>>>> 8             11.18
>>>> 16            22.46
>>>> 32            44.65
>>>> 64            83.10
>>>> 128           154.00
>>>> 256           291.63
>>>> 512           537.66
>>>> 1024          942.35
>>>> 2048          1433.09
>>>> 4096          2356.40
>>>> 8192          1998.54
>>>> 16384         3584.82
>>>> 32768         5523.08
>>>> 65536         7717.63
>>>> 131072        9419.50
>>>> 262144        10564.77
>>>> 524288        11104.71
>>>> 1048576       11130.75
>>>> 2097152       7943.89
>>>> 4194304       5270.00
>>>>
>>>> 1.10.7, osu_bibw, default settings
>>>>
>>>> > mpirun -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw
>>>> # OSU MPI Bi-Directional Bandwidth Test v5.4.0
>>>> # Size        Bandwidth (MB/s)
>>>> 1             1.70
>>>> 2             3.45
>>>> 4             6.95
>>>> 8             13.68
>>>> 16            27.41
>>>> 32            53.80
>>>> 64            105.34
>>>> 128           164.40
>>>> 256           324.63
>>>> 512           623.95
>>>> 1024          1127.35
>>>> 2048          1784.58
>>>> 4096          3305.45
>>>> 8192          3697.55
>>>> 16384         4935.75
>>>> 32768         7186.28
>>>> 65536         8996.94
>>>> 131072        9301.78
>>>> 262144        4691.36
>>>> 524288        7039.18
>>>> 1048576       7213.33
>>>> 2097152       9601.41
>>>> 4194304       9281.31
>
> --
> Jeff Squyres
> jsquy...@cisco.com
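For reference, Nathan's suggestion above boils down to something like the following. The pml selection line is the usual way to try UCX (it requires an Open MPI build with UCX support), and the config-file entries simply persist the two parameters from Ben's experiments; treat both as a sketch rather than recommended tuning:

    # try the UCX PML first
    mpirun --mca pml ucx -map-by ppr:1:node -np 2 -H r6,r7 ./osu_bibw

    # fallback: put the openib workarounds in /etc/openmpi-mca-params.conf
    btl_openib_use_eager_rdma = 0
    btl_openib_min_rdma_pipeline_size = 4194304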
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel