Iain Bason wrote:
On Mar 3, 2010, at 3:04 PM, Jeff Squyres wrote:

Mmmm... good point.  I was thinking specifically of the if_in|exclude behavior 
in the openib BTL.  That uses strcmp, not strncmp.  Here's a complete list:

ompi_info --param all all --parsable | grep include | grep :value:
mca:opal:base:param:opal_event_include:value:poll
mca:btl:ofud:param:btl_ofud_if_include:value:
mca:btl:openib:param:btl_openib_if_include:value:
mca:btl:openib:param:btl_openib_ipaddr_include:value:
mca:btl:openib:param:btl_openib_cpc_include:value:
mca:btl:sctp:param:btl_sctp_if_include:value:
mca:btl:tcp:param:btl_tcp_if_include:value:
mca:btl:base:param:btl_base_include:value:
mca:oob:tcp:param:oob_tcp_if_include:value:

Do we know what these others do?  I only checked openib_if_*clude -- it's 
strcmp.

I haven't looked at those, but it's easy to grep for strncmp...

It looks as though sctp is the only other BTL that uses strncmp.

Of course, if we decide to change the default so that it no longer includes 
"lo" then maybe using strncmp doesn't matter.  The problem has been that the 
name of the interface is different on different platforms.
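
To make the strcmp/strncmp distinction concrete, here is a minimal sketch in C.
The helper names are hypothetical and are not taken from the Open MPI source;
the sketch assumes only that prefix matching compares the first strlen(pattern)
characters of the interface name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: exact match, as with strcmp.
 * The pattern "lo" matches only an interface named exactly "lo". */
static bool if_match_exact(const char *if_name, const char *pattern)
{
    return strcmp(if_name, pattern) == 0;
}

/* Hypothetical helper: prefix match, as with strncmp.
 * The pattern "lo" matches "lo", "lo0", "lo1", and any other
 * interface whose name happens to begin with "lo". */
static bool if_match_prefix(const char *if_name, const char *pattern)
{
    return strncmp(if_name, pattern, strlen(pattern)) == 0;
}
```

With the pattern "lo", the exact version accepts "lo" but rejects "lo0", while
the prefix version accepts both.  That is why prefix matching lets a single
default cover platforms that name the loopback interface differently, at the
cost of also matching unrelated names that share the prefix.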

(I should note that the default also excludes "sppp".  I don't know anything 
about that interface.)

I may be wrong about the usage here, but the old Sun Starcats had a TCP interface named sppp to the diagnostic processor that we needed to skip.  I'm not sure whether that is the reason it is excluded here; I couldn't find where sppp is referenced, which I would need in order to look up the history of that line in opengrok.

--td

Additionally, if loopback is now handled properly via change #2, shouldn't the 
default value for the btl_tcp_if_exclude parameter now be empty?

That's a good question.  Enabling the "lo" interface results in intra-node 
messages being striped across that interface in addition to the others on a system.  I 
don't know what impact that would have, if any.

sm and self should still be prioritized above it, right?  If so, we should be 
ok.

Yes, that's true.  It would only affect those who restrict intra-node 
communication to TCP.

However, I think you're right that striping across lo* in addition to the 
other interfaces might have an unknown effect.

Here's a random question -- if a user does not use the sm btl, would sending messages 
through lo for on-node communication potentially be better than sending them through a real 
device, given that the real device may be far away (in the NUMA sense of 
"far")?  I.e., are OSes typically smart enough to know that loopback traffic 
can stay local to the NUMA node, vs. sending it out to a device and back?  Or 
are OSes smart enough to know that if both ends of a TCP socket are on the same node 
-- regardless of which IP interface they use -- and if both processes are in the same NUMA 
locality, the data can stay local and not have to make a round trip to the device?

(I admit that this is very much a corner case -- doing on-node communication but 
*not* using the sm btl...)

Good question.  For the loopback interface there is no physical device, so 
there should be no NUMA effect.  For an interface with a physical device there 
may be some reason that a packet would actually have to go out to the device.  
If there is no such reason, I would expect Unix to be smart enough not to do 
it, given how much intra-node TCP traffic one commonly sees on Unix.  I 
couldn't hazard a guess about Windows.

Iain

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
