Iain Bason wrote:
On Mar 3, 2010, at 3:04 PM, Jeff Squyres wrote:
Mmmm... good point. I was thinking specifically of the if_in|exclude behavior
in the openib BTL. That uses strcmp, not strncmp. Here's a complete list:
ompi_info --param all all --parsable | grep include | grep :value:
mca:opal:base:param:opal_event_include:value:poll
mca:btl:ofud:param:btl_ofud_if_include:value:
mca:btl:openib:param:btl_openib_if_include:value:
mca:btl:openib:param:btl_openib_ipaddr_include:value:
mca:btl:openib:param:btl_openib_cpc_include:value:
mca:btl:sctp:param:btl_sctp_if_include:value:
mca:btl:tcp:param:btl_tcp_if_include:value:
mca:btl:base:param:btl_base_include:value:
mca:oob:tcp:param:oob_tcp_if_include:value:
Do we know what these others do? I only checked openib_if_*clude -- it's
strcmp.
I haven't looked at those, but it's easy to grep for strncmp...
It looks as though sctp is the only other BTL that uses strncmp.
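To make the strcmp/strncmp distinction concrete, here is a minimal sketch of the two matching styles. The function names are hypothetical and this is not the code from any of the BTLs; it only assumes that the prefix form compares as many characters as the list entry contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Exact comparison: the list entry must equal the whole interface name. */
bool if_match_exact(const char *if_name, const char *entry)
{
    assert(if_name != NULL && entry != NULL);
    return strcmp(if_name, entry) == 0;
}

/*
 * Prefix comparison: the list entry only has to match the start of the
 * interface name. An empty entry is rejected explicitly, because
 * strncmp() with a length of 0 would otherwise match every name.
 */
bool if_match_prefix(const char *if_name, const char *entry)
{
    assert(if_name != NULL && entry != NULL);
    return entry[0] != '\0' &&
           strncmp(if_name, entry, strlen(entry)) == 0;
}
```

With an entry of "lo", the prefix form also covers "lo0" (the loopback name on Solaris and the BSDs), but it would equally match any other interface whose name happens to start with "lo".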
Of course, if we decide to change the default so that it no longer includes
"lo" then maybe using strncmp doesn't matter. The problem has been that the
name of the interface is different on different platforms.
(I should note that the default also excludes "sppp". I don't know anything
about that interface.)
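Since the loopback name differs between platforms, one name-independent check is to look at the interface flags rather than the name. The following is only a sketch: it assumes a platform that provides getifaddrs() (Linux, the BSDs, and others, but it is not POSIX), the helper name is hypothetical, and nothing here is taken from the Open MPI sources.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* glibc: expose IFF_LOOPBACK under strict -std= modes */
#endif

#include <assert.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: returns 1 if the named interface carries the
 * IFF_LOOPBACK flag, 0 if it does not (or does not exist), and -1 if
 * the interface list could not be obtained.
 */
int if_is_loopback(const char *if_name)
{
    struct ifaddrs *list = NULL;
    int result = 0;

    assert(if_name != NULL);
    if (getifaddrs(&list) != 0) {
        return -1;
    }
    /* getifaddrs() returns one entry per address, so a name can repeat. */
    for (const struct ifaddrs *ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (strcmp(ifa->ifa_name, if_name) == 0 &&
            (ifa->ifa_flags & IFF_LOOPBACK) != 0) {
            result = 1;
            break;
        }
    }
    freeifaddrs(list);
    return result;
}
```

A check like this would not help with "sppp", which is presumably excluded for a different reason than being a loopback device.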
I may be wrong about the usage here, but the old Sun Starcats had a tcp
interface named sppp to their diagnostic processor that we needed to skip.
I'm not sure whether that is the reason it is excluded here; I couldn't find
where sppp is referenced, so I couldn't look up the history of the line in
OpenGrok.
--td
Additionally, if loopback is now handled properly via change #2, shouldn't the
default value for the btl_tcp_if_exclude parameter now be empty?
That's a good question. Enabling the "lo" interface results in intra-node
messages being striped across that interface in addition to the others on a system. I
don't know what impact that would have, if any.
sm and self should still be prioritized above it, right? If so, we should be
ok.
Yes, that's true. It would only affect those who restrict intra-node
communication to TCP.
However, I think you're right that the addition of striping across lo* in
addition to the other interfaces might have an unknown effect.
Here's a random question -- if a user does not use the sm btl, would sending messages
through lo for on-node communication be potentially better than sending it through a real
device, given that that real device may be far away (in the NUMA sense of
"far")? I.e., are OS's typically smart enough to know that loopback traffic
may be able to stay local to the NUMA node, vs. sending it out to a device and back? Or
are OS's smart enough to know that if both ends of a TCP socket are on the same node
-- regardless of what IP interface they use -- and if both processes are on the same NUMA
locality, that the data can stay local and not have to make a round trip to the device?
(I admit that this is a fairly obscure corner case -- doing on-node communication but
*not* using the sm btl...)
Good question. For the loopback interface there is no physical device, so
there should be no NUMA effect. For an interface with a physical device there
may be some reason that a packet would actually have to go out to the device.
If there is no such reason, I would expect Unix to be smart enough not to do
it, given how much intra-node TCP traffic one commonly sees on Unix. I
couldn't hazard a guess about Windows.
Iain