On Mar 3, 2010, at 15:04, Jeff Squyres wrote:

> On Mar 3, 2010, at 2:06 PM, Iain Bason wrote:
> 
>>> 1. The individual entries now behave like pseudo-regexps rather than 
>>> strict matching.  We used strict matching before this for a reason.  If we 
>>> want to allow regexp-like behavior, then I think we should enable that with 
>>> special characters -- that's the customary/usual way to do it.
>> 
>> The history of this particular piece of code is that it used to use strncmp. 
>>  George Bosilca changed it last summer, incidental to a larger change 
>> (r21652).  The commit comment was not particularly illuminating on this 
>> issue, in my opinion:
>> 
>> http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/rev/bde31d3db7ba
> 
> You're right -- it's not illuminating... :-\
> 
>>> 2. All other <foo>_in|exclude behavior in ompi is strict matching, not 
>>> prefix matching.  I'm uncomfortable with the disparity.
>> 
>> That turns out not to be the case.  Look in 
>> btl_tcp_proc.c/mca_btl_tcp_retrieve_local_interfaces.

I guess this is the result of different developers with different ideas working 
in an inconsistent way. And that is before mentioning that we do the same 
checking in several places, duplicating the code in a way that doesn't enforce 
any consistency. Anyway, now that this problem has been highlighted, we should 
clearly fix it.
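
To make the difference under discussion concrete, here is a minimal sketch in C of 
strict (strcmp-style) versus prefix (strncmp-style) matching of an include/exclude 
entry against an interface name. The helper names match_strict and match_prefix 
are hypothetical and chosen only for illustration; they are not functions from the 
Open MPI tree, and the real code paths may differ.

```c
#include <string.h>

/* Hypothetical helpers for illustration only; not Open MPI functions. */

/* Strict match: the entry must equal the interface name exactly. */
static int match_strict(const char *entry, const char *if_name)
{
    return strcmp(entry, if_name) == 0;
}

/* Prefix match: the entry only has to match the start of the name,
 * so "eth1" also selects "eth10", "eth11", and so on. */
static int match_prefix(const char *entry, const char *if_name)
{
    return strncmp(entry, if_name, strlen(entry)) == 0;
}
```

With these helpers, an entry of "eth1" selects only "eth1" under strict matching, 
but selects both "eth1" and "eth10" under prefix matching. That is the kind of 
surprise that can appear when different parameters use different rules.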

> Mmmm... good point.  I was thinking specifically of the if_in|exclude 
> behavior in the openib BTL.  That uses strcmp, not strncmp.  Here's a 
> complete list:
> 
> ompi_info --param all all --parsable | grep include | grep :value:
> mca:opal:base:param:opal_event_include:value:poll
> mca:btl:ofud:param:btl_ofud_if_include:value:
> mca:btl:openib:param:btl_openib_if_include:value:
> mca:btl:openib:param:btl_openib_ipaddr_include:value:
> mca:btl:openib:param:btl_openib_cpc_include:value:
> mca:btl:sctp:param:btl_sctp_if_include:value:
> mca:btl:tcp:param:btl_tcp_if_include:value:
> mca:btl:base:param:btl_base_include:value:
> mca:oob:tcp:param:oob_tcp_if_include:value:
> 
> Do we know what these others do?  I only checked openib_if_*clude -- it's 
> strcmp.
> 
>>> Additionally, if loopback is now handled properly via change #2, shouldn't 
>>> the default value for the btl_tcp_if_exclude parameter now be empty?
>> 
>> That's a good question.  Enabling the "lo" interface results in intra-node 
>> messages being striped across that interface in addition to the others on a 
>> system.  I don't know what impact that would have, if any.
> 
> sm and self should still be prioritized above it, right?  If so, we should be 
> ok.
> 
> However, I think you're right that the addition of striping across lo* in 
> addition to the other interfaces might have an unknown effect.

This is not supposed to happen. The sm BTL has a high exclusivity, which will 
prevent the TCP BTL from being used for the same peer. But again, that was the 
case a while ago; nothing guarantees that the code still does what it was 
supposed to do.

  george.

> Here's a random question -- if a user does not use the sm btl, would sending 
> messages through lo for on-node communication be potentially better than 
> sending it through a real device, given that that real device may be far away 
> (in the NUMA sense of "far")?  I.e., are OS's typically smart enough to know 
> that loopback traffic may be able to stay local to the NUMA node, vs. sending 
> it out to a device and back?  Or are OS's smart enough to know that if both 
> ends of a TCP socket are on the same node -- regardless of what IP 
> interface they use -- and if both processes are on the same NUMA locality, 
> that the data can stay local and not have to make a round trip to the device?
> 
> (I admit that this is a fairly corner case -- doing on-node communication but 
> *not* using the sm btl...)
> 
>>> Actually -- thinking about this a little more, does opal_net_islocalhost() 
>>> guarantee to work on peer interfaces? 
>> 
>> It looks to see whether the IP address is (v4) 127.0.0.1, or (v6) ::1.  I 
>> believe that these values are dictated by the relevant RFCs (but I haven't 
>> looked to make sure).
> 
> Good enough -- thanks!  (I was thinking that it might be checking interfaces, 
> not IP addrs -- so 127.x checking should be fine here)
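
As an illustration of checking addresses rather than interfaces, the following is 
a minimal sketch in C of a loopback test on a socket address. It is not the 
opal_net_islocalhost() implementation; is_loopback_addr and is_loopback_str are 
hypothetical names, and the choice to accept all of 127.0.0.0/8 (rather than only 
127.0.0.1) is an assumption made here to match the "127.x" remark above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper, not Open MPI code: returns 1 if addr is an
 * IPv4 address in 127.0.0.0/8 or the IPv6 address ::1, else 0. */
static int is_loopback_addr(const struct sockaddr *addr)
{
    if (addr->sa_family == AF_INET) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)addr;
        /* Compare the top 8 bits in host byte order against 127. */
        return (ntohl(in4->sin_addr.s_addr) & 0xff000000u) == 0x7f000000u;
    }
    if (addr->sa_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)addr;
        return IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr) ? 1 : 0;
    }
    return 0;
}

/* Hypothetical convenience wrapper: parse a textual address first.
 * Returns 0 for anything that is not a valid IPv4 or IPv6 address. */
static int is_loopback_str(const char *text)
{
    struct sockaddr_in in4;
    struct sockaddr_in6 in6;

    memset(&in4, 0, sizeof(in4));
    memset(&in6, 0, sizeof(in6));

    if (inet_pton(AF_INET, text, &in4.sin_addr) == 1) {
        in4.sin_family = AF_INET;
        return is_loopback_addr((const struct sockaddr *)&in4);
    }
    if (inet_pton(AF_INET6, text, &in6.sin6_addr) == 1) {
        in6.sin6_family = AF_INET6;
        return is_loopback_addr((const struct sockaddr *)&in6);
    }
    return 0;
}
```

Because the test looks only at the address, it gives the same answer for a peer's 
address as for a local one, regardless of what the interface happens to be named.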
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 

