On Mar 3, 2010, at 2:06 PM, Iain Bason wrote:

> > 1. The individual entries now behave like pseudo-regexp's rather than 
> > strict matching.  We used strict matching before this for a reason.  If we 
> > want to allow regexp-like behavior, then I think we should enable that with 
> > special characters -- that's the customary/usual way to do it.
> 
> The history of this particular piece of code is that it used to use strncmp.  
> George Bosilca changed it last summer, incidental to a larger change 
> (r21652).  The commit comment was not particularly illuminating on this 
> issue, in my opinion:
> 
> http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/rev/bde31d3db7ba

You're right -- it's not illuminating... :-\

> > 2. All other <foo>_in|exclude behavior in ompi is strict matching, not 
> > prefix matching.  I'm uncomfortable with the disparity.
> 
> That turns out not to be the case.  Look in 
> btl_tcp_proc.c/mca_btl_tcp_retrieve_local_interfaces.

Mmmm... good point.  I was thinking specifically of the if_in|exclude behavior 
in the openib BTL.  That uses strcmp, not strncmp.  Here's a complete list:

ompi_info --param all all --parsable | grep include | grep :value:
mca:opal:base:param:opal_event_include:value:poll
mca:btl:ofud:param:btl_ofud_if_include:value:
mca:btl:openib:param:btl_openib_if_include:value:
mca:btl:openib:param:btl_openib_ipaddr_include:value:
mca:btl:openib:param:btl_openib_cpc_include:value:
mca:btl:sctp:param:btl_sctp_if_include:value:
mca:btl:tcp:param:btl_tcp_if_include:value:
mca:btl:base:param:btl_base_include:value:
mca:oob:tcp:param:oob_tcp_if_include:value:

Do we know what these others do?  I only checked openib_if_*clude -- it's 
strcmp.
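
To make the strict-vs-prefix distinction concrete, here is a minimal sketch in C.  The function names are hypothetical and this is not the actual BTL code; in particular, using the length of the configured entry as the strncmp bound is an assumption about how a prefix match would be written, not a statement about what the TCP BTL did.

```c
#include <stdbool.h>
#include <string.h>

/* Strict matching: the entry "eth1" matches only the interface "eth1".
 * (Hypothetical helper, not an Open MPI function.) */
bool match_strict(const char *entry, const char *if_name)
{
    return strcmp(entry, if_name) == 0;
}

/* Prefix matching: the entry "eth1" also matches "eth10", "eth11", ...
 * Assumption: the comparison length is the length of the entry. */
bool match_prefix(const char *entry, const char *if_name)
{
    return strncmp(entry, if_name, strlen(entry)) == 0;
}
```

With these definitions, an exclude entry of "lo" would also exclude "lo0" under prefix matching but not under strict matching, which is the kind of disparity between components I'm uneasy about.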

> > Additionally, if loopback is now handled properly via change #2, shouldn't 
> > the default value for the btl_tcp_if_exclude parameter now be empty?
> 
> That's a good question.  Enabling the "lo" interface results in intra-node 
> messages being striped across that interface in addition to the others on a 
> system.  I don't know what impact that would have, if any.

sm and self should still be prioritized above it, right?  If so, we should be 
ok.

However, I think you're right that the addition of striping across lo* in 
addition to the other interfaces might have an unknown effect.

Here's a random question -- if a user does not use the sm btl, would sending 
messages through lo for on-node communication be potentially better than 
sending it through a real device, given that that real device may be far away 
(in the NUMA sense of "far")?  I.e., are OS's typically smart enough to know 
that loopback traffic may be able to stay local to the NUMA node, vs. sending 
it out to a device and back?  Or are OS's smart enough to know that if both 
ends of a TCP socket are on the same node -- regardless of what IP interface 
they use -- and if both processes are on the same NUMA locality, that the data 
can stay local and not have to make a round trip to the device?

(I admit that this is a fairly corner case -- doing on-node communication but 
*not* using the sm btl...)

> > Actually -- thinking about this a little more, does opal_net_islocalhost() 
> > guarantee to work on peer interfaces? 
> 
> It looks to see whether the IP address is (v4) 127.0.0.1, or (v6) ::1.  I 
> believe that these values are dictated by the relevant RFCs (but I haven't 
> looked to make sure).

Good enough -- thanks!  (I was thinking that it might be checking interfaces, 
not IP addrs -- so 127.x checking should be fine here)
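
For reference, an address-based check of the kind described above might look roughly like the following.  This is only a sketch under stated assumptions: addr_is_loopback is a hypothetical name, and the code is not copied from opal_net_islocalhost(), whose exact logic I have not reproduced here.  It tests for exactly 127.0.0.1 and ::1, per the description quoted above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper: returns true if the socket address is the
 * IPv4 loopback address 127.0.0.1 or the IPv6 loopback address ::1.
 * It inspects only the address, never the interface name. */
bool addr_is_loopback(const struct sockaddr *addr)
{
    switch (addr->sa_family) {
    case AF_INET: {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)addr;
        /* s_addr is in network byte order; INADDR_LOOPBACK is host order. */
        return in4->sin_addr.s_addr == htonl(INADDR_LOOPBACK);
    }
    case AF_INET6: {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)addr;
        return IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr);
    }
    default:
        return false;
    }
}
```

Because a check like this looks only at the address, it gives the same answer for a peer's address as for a local one, which is the property I was asking about.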

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

