On Mar 3, 2010, at 15:04, Jeff Squyres wrote:

> On Mar 3, 2010, at 2:06 PM, Iain Bason wrote:
>
>>> 1. The individual entries now behave like pseudo-regexp's rather than
>>> strict matching. We used strict matching before this for a reason. If we
>>> want to allow regexp-like behavior, then I think we should enable that with
>>> special characters -- that's the customary/usual way to do it.
>>
>> The history of this particular piece of code is that it used to use strncmp.
>> George Bosilca changed it last summer, incidental to a larger change
>> (r21652). The commit comment was not particularly illuminating on this
>> issue, in my opinion:
>>
>> http://www.open-mpi.org/hg/hgwebdir.cgi/ompi-svn-mirror/rev/bde31d3db7ba
>
> You're right -- it's not illuminating... :-\
>
>>> 2. All other <foo>_in|exclude behavior in ompi is strict matching, not
>>> prefix matching. I'm uncomfortable with the disparity.
>>
>> That turns out not to be the case. Look in
>> btl_tcp_proc.c/mca_btl_tcp_retrieve_local_interfaces.

I guess this is the result of different developers with different ideas
working in an inconsistent way. Not to mention that we do the same checking
in several places, and duplicate the code in a way that doesn't enforce any
consistency. Anyway, now that this problem is highlighted, we should clearly
fix it.

> Mmmm... good point. I was thinking specifically of the if_in|exclude
> behavior in the openib BTL. That uses strcmp, not strncmp. Here's a
> complete list:
>
> ompi_info --param all all --parsable | grep include | grep :value:
> mca:opal:base:param:opal_event_include:value:poll
> mca:btl:ofud:param:btl_ofud_if_include:value:
> mca:btl:openib:param:btl_openib_if_include:value:
> mca:btl:openib:param:btl_openib_ipaddr_include:value:
> mca:btl:openib:param:btl_openib_cpc_include:value:
> mca:btl:sctp:param:btl_sctp_if_include:value:
> mca:btl:tcp:param:btl_tcp_if_include:value:
> mca:btl:base:param:btl_base_include:value:
> mca:oob:tcp:param:oob_tcp_if_include:value:
>
> Do we know what these others do? I only checked openib_if_*clude -- it's
> strcmp.
>
>>> Additionally, if loopback is now handled properly via change #2, shouldn't
>>> the default value for the btl_tcp_if_exclude parameter now be empty?
>>
>> That's a good question. Enabling the "lo" interface results in intra-node
>> messages being striped across that interface in addition to the others on a
>> system. I don't know what impact that would have, if any.
>
> sm and self should still be prioritized above it, right? If so, we should be
> ok.
>
> However, I think you're right that the addition of striping across lo* in
> addition to the other interfaces might have an unknown effect.

This is not supposed to happen. The sm BTL has a high exclusivity, which will
prevent the TCP BTL from being used for the same peer. But again, this was the
case a while ago; there is nothing to guarantee that the code is still doing
what it was supposed to do.

  george.
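
To make the strict-versus-prefix distinction in this thread concrete, here is
a minimal sketch in C of the two comparison styles (strcmp versus strncmp
bounded by the length of the list entry). The function names match_strict and
match_prefix are hypothetical and exist only for illustration; this is not the
code in btl_tcp_proc.c or in the openib BTL, and the exact length argument
passed to strncmp there is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: strict matching. The entry from an
 * if_include/if_exclude list must equal the interface name exactly. */
bool match_strict(const char *entry, const char *ifname)
{
    assert(entry != NULL && ifname != NULL);
    return strcmp(entry, ifname) == 0;
}

/* Hypothetical helper: prefix matching. The entry only has to be a
 * leading substring of the interface name (assumed here to be a
 * strncmp limited to the length of the entry). */
bool match_prefix(const char *entry, const char *ifname)
{
    assert(entry != NULL && ifname != NULL);
    return strncmp(entry, ifname, strlen(entry)) == 0;
}
```

With these definitions, the entry "eth" selects "eth0" under prefix matching
but not under strict matching, and the entry "eth1" would also select "eth10"
under prefix matching, which is the kind of surprise that argues for picking
one behavior and using it everywhere.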

> Here's a random question -- if a user does not use the sm btl, would sending
> messages through lo for on-node communication be potentially better than
> sending it through a real device, given that that real device may be far away
> (in the NUMA sense of "far")? I.e., are OS's typically smart enough to know
> that loopback traffic may be able to stay local to the NUMA node, vs. sending
> it out to a device and back? Or are OS's smart enough to know that if
> both ends of a TCP socket are on the same node -- regardless of what IP
> interface they use -- and if both processes are on the same NUMA locality,
> that the data can stay local and not have to make a round trip to the device?
>
> (I admit that this is a fairly corner case -- doing on-node communication but
> *not* using the sm btl...)
>
>>> Actually -- thinking about this a little more, does opal_net_islocalhost()
>>> guarantee to work on peer interfaces?
>>
>> It looks to see whether the IP address is (v4) 127.0.0.1, or (v6) ::1. I
>> believe that these values are dictated by the relevant RFCs (but I haven't
>> looked to make sure).
>
> Good enough -- thanks! (I was thinking that it might be checking interfaces,
> not IP addrs -- so 127.x checking should be fine here)
>
> --
> Jeff Squyres
> jsquy...@cisco.com