Dave,

These settings tell OMPI to use native InfiniBand on the QDR IB port and
TCP/IP on the other port.

From the FAQ, RoCE is implemented in the openib BTL:
http://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce

Did you use
--mca btl_openib_cpc_include rdmacm
in your first tests?

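For reference, that flag would look something like this on the command line (a sketch only: the host names, process count, and ./osu_bw benchmark binary are placeholders, not from your setup):

```shell
# Hypothetical invocation -- node01, node02 and ./osu_bw are placeholders.
# rdmacm is the connection manager the FAQ calls for when running RoCE
# through the openib BTL.
cmd="mpirun --mca btl openib,self --mca btl_openib_cpc_include rdmacm -np 2 -host node01,node02 ./osu_bw"
echo "$cmd"
```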
I had some second thoughts about the bandwidth values; IMHO they should be
327680 and 81920 because of the 8b/10b encoding.
(That being said, this should not change the measured performance.)

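The 8/10 factor is the 8b/10b line-coding overhead: 8 data bits are carried in every 10 signal bits, so only 80% of the signaling rate is payload. A rough sketch of that arithmetic, applying the same factor to both ports as above (how it maps to the exact MCA values is a separate question):

```shell
# Sketch of the 8b/10b payload-rate arithmetic; not authoritative for
# the MCA parameter units, just the line-coding factor itself.
qdr_signal_gbps=40     # QDR IB: 4 lanes x 10 Gbps signaling
tengbe_signal_gbps=10  # the 10 Gbps RoCE/TCP port
echo "QDR payload:   $(( qdr_signal_gbps * 8 / 10 )) Gbps"
echo "10GbE payload: $(( tengbe_signal_gbps * 8 / 10 )) Gbps"
```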
Also, could you try again while forcing the same btl_tcp_latency and
btl_openib_latency?

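Something like the following would do it (a sketch: the latency value, hosts, and ./aggregate_bw benchmark are placeholders):

```shell
# Hypothetical re-run with both latency estimates pinned to the same
# (arbitrary) value, so only the bandwidth parameters drive the
# message-striping decision.  Hosts and ./aggregate_bw are placeholders.
cmd="mpirun --mca btl_openib_latency 4 --mca btl_tcp_latency 4 -np 40 -host node01,node02 ./aggregate_bw"
echo "$cmd"
```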
Cheers,

Gilles

Dave Turner <drdavetur...@gmail.com> wrote:
>George,
>
>     I can check with my guys on Monday, but I think the bandwidth parameters
>are the defaults.  I did alter these to 40960 and 10240 as someone else
>suggested to me.  The attached graph shows the base red line, along with
>the manually balanced blue line and the auto-balanced green line (0's for both).
>This shift lower suggests to me that the higher TCP latency is being pulled in.
>I'm not sure why the curves are shifted right.
>
>                        Dave
>
>
>On Fri, Feb 6, 2015 at 5:32 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>Dave,
>
>Based on your ompi_info.all, the following bandwidths are reported on your
>system:
>
>    MCA btl: parameter "btl_openib_bandwidth" (current value: "4",
>        data source: default, level: 5 tuner/detail, type: unsigned)
>        Approximate maximum bandwidth of interconnect (0 = auto-detect
>        value at run-time [not supported in all BTL modules], >= 1 =
>        bandwidth in Mbps)
>
>    MCA btl: parameter "btl_tcp_bandwidth" (current value: "100",
>        data source: default, level: 5 tuner/detail, type: unsigned)
>        Approximate maximum bandwidth of interconnect (0 = auto-detect
>        value at run-time [not supported in all BTL modules], >= 1 =
>        bandwidth in Mbps)
>
>This basically states that on your system the default values for these
>parameters are wrong, your TCP network being reported as much faster than
>the IB one. This explains the somewhat unexpected decision of OMPI.
>
>As a possible solution, I suggest you set these bandwidth values to
>something more meaningful (directly in your configuration file). As an
>example,
>
>btl_openib_bandwidth = 40000
>btl_tcp_bandwidth = 10000
>
>make more sense based on your HPC system description.
>
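(As a sketch of making those overrides persistent: the standard per-user place for them is $HOME/.openmpi/mca-params.conf, which Open MPI reads at startup. A throwaway directory is used below just to keep the example self-contained.)

```shell
# Sketch: persist the suggested bandwidth overrides in an MCA parameter
# file.  Using a temp directory here; the real per-user location is
# $HOME/.openmpi/mca-params.conf.
dir=$(mktemp -d)
cat > "$dir/mca-params.conf" <<'EOF'
btl_openib_bandwidth = 40000
btl_tcp_bandwidth = 10000
EOF
cat "$dir/mca-params.conf"
```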
>
>  George.
>
>On Fri, Feb 6, 2015 at 5:37 PM, Dave Turner <drdavetur...@gmail.com> wrote:
>
>     We have nodes in our HPC system that have 2 NICs,
>one being QDR IB and the second being a slower 10 Gbps card
>configured for both RoCE and TCP.  Aggregate bandwidth
>tests with 20 cores on one node yelling at 20 cores on a second
>node (attached roce.ib.aggregate.pdf) show that without tuning,
>the slower RoCE interface is used for small messages and
>QDR IB is used for larger messages (red line).  Tuning
>the tcp_exclusivity to 1024 to match the openib_exclusivity
>adds another 20 Gbps of bidirectional bandwidth at the high end (green line),
>and I'm guessing this is TCP traffic and not RoCE.
>
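(For reference, the exclusivity override described above can be passed on the command line like this; hosts and the ./aggregate_bw benchmark are placeholders:)

```shell
# Sketch of the exclusivity override: raising btl_tcp_exclusivity to
# match the openib default (1024) lets the TCP BTL be used alongside
# openib instead of being excluded.  Hosts and ./aggregate_bw are
# placeholders.
cmd="mpirun --mca btl_tcp_exclusivity 1024 -np 40 -host node01,node02 ./aggregate_bw"
echo "$cmd"
```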
>
>     So by default the slower interface is being chosen on the low end, and
>I don't think there are tunable parameters that allow me to choose the
>QDR interface as the default.  Going forward we'll probably just disable
>RoCE on these nodes and go with QDR IB plus 10 Gbps TCP for large messages.
>
>     However, I do think these issues will come up more often in the future.
>With the low latency of RoCE matching IB, there are more opportunities
>to do channel bonding or to allow multiple interfaces to carry aggregate
>traffic for even smaller message sizes.
>
>                Dave Turner
>
>
>-- 
>Work:  davetur...@ksu.edu   (785) 532-7791
>       118 Nichols Hall, Manhattan KS  66502
>Home:  drdavetur...@gmail.com
>       cell: (785) 770-5929
>
>
>_______________________________________________
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2015/02/16951.php
>
