Terry Dontje wrote:
Iain Bason wrote:
On Mar 3, 2010, at 3:04 PM, Jeff Squyres wrote:
Mmmm... good point. I was thinking specifically of the
if_in|exclude behavior in the openib BTL. That uses strcmp, not
strncmp. Here's a complete list:
ompi_info --param all all --parsable | grep include | grep :value:
mca:opal:base:param:opal_event_include:value:poll
mca:btl:ofud:param:btl_ofud_if_include:value:
mca:btl:openib:param:btl_openib_if_include:value:
mca:btl:openib:param:btl_openib_ipaddr_include:value:
mca:btl:openib:param:btl_openib_cpc_include:value:
mca:btl:sctp:param:btl_sctp_if_include:value:
mca:btl:tcp:param:btl_tcp_if_include:value:
mca:btl:base:param:btl_base_include:value:
mca:oob:tcp:param:oob_tcp_if_include:value:
Do we know what these others do? I only checked openib_if_*clude --
it's strcmp.
I haven't looked at those, but it's easy to grep for strncmp...
It looks as though sctp is the only other BTL that uses strncmp.
Of course, if we decide to change the default so that it no longer
includes "lo" then maybe using strncmp doesn't matter. The problem
has been that the name of the interface is different on different
platforms.
(I should note that the default also excludes "sppp". I don't know
anything about that interface.)
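
To make the strcmp/strncmp distinction above concrete, here is a minimal sketch in C. It is not the Open MPI source: the function names (if_match_exact, if_match_prefix, if_list_contains) and the comma-separated list format are assumptions made for illustration. It shows why a prefix comparison lets a single "lo" entry cover both "lo" (e.g. Linux) and "lo0" (e.g. Solaris), while an exact comparison does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Exact match (strcmp style): "lo" matches only "lo", never "lo0". */
bool if_match_exact(const char *pattern, const char *if_name)
{
    return strcmp(pattern, if_name) == 0;
}

/* Prefix match (strncmp style): "lo" matches "lo", "lo0", "lo1", ...
 * Note that it also matches any other name that starts with "lo". */
bool if_match_prefix(const char *pattern, const char *if_name)
{
    return strncmp(pattern, if_name, strlen(pattern)) == 0;
}

/* Hypothetical helper: does any entry of a comma-separated list
 * (for example "lo,sppp") match if_name?  The list is scanned in
 * place, so nothing is allocated or modified. */
bool if_list_contains(const char *list, const char *if_name, bool prefix)
{
    const char *p = list;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current entry */

        if (len > 0) {
            bool hit = (strncmp(p, if_name, len) == 0);
            if (!prefix) {
                /* Exact mode additionally requires equal lengths. */
                hit = hit && (strlen(if_name) == len);
            }
            if (hit) {
                return true;
            }
        }
        p += len;
        if (*p == ',') {
            p++;                        /* skip the separator */
        }
    }
    return false;
}

/* Small self-check of the behavior described above. */
void if_match_selfcheck(void)
{
    assert(if_list_contains("lo,sppp", "lo0", true));    /* prefix: excluded */
    assert(!if_list_contains("lo,sppp", "lo0", false));  /* exact: missed   */
}
```

With prefix matching, an exclude list of "lo,sppp" would skip the loopback interface under either name; with exact matching, "lo0" would slip through unless it were listed explicitly. The trade-off is that prefix matching can also catch unrelated interfaces whose names happen to begin with a listed entry.
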
I may be wrong about the usage here, but the old Sun Starcats had a TCP
interface named sppp to the diagnostic processor that we needed to skip.
I'm not sure whether that is the reason it was done here; I couldn't find
where sppp was referenced, so I couldn't trace the history of the line in
opengrok.
--td
Close: r19988 added sppp to the excluded interfaces for the Sun M9000
server, I believe for the same reason I gave above.
--td
Additionally, if loopback is now handled properly via change #2,
shouldn't the default value for the btl_tcp_if_exclude parameter
now be empty?
That's a good question. Enabling the "lo" interface results in
intra-node messages being striped across that interface in addition
to the others on a system. I don't know what impact that would
have, if any.
sm and self should still be prioritized above it, right? If so, we
should be ok.
Yes, that's true. It would only affect those who restrict intra-node
communication to TCP.
However, I think you're right that the addition of striping across
lo* in addition to the other interfaces might have an unknown effect.
Here's a random question -- if a user does not use the sm btl, would
sending messages through lo for on-node communication be potentially
better than sending it through a real device, given that that real
device may be far away (in the NUMA sense of "far")? I.e., are OSes
typically smart enough to know that loopback traffic may be able to
stay local to the NUMA node, vs. sending it out to a device and
back? Or are OSes smart enough to know that if both ends of a
TCP socket are on the same node -- regardless of what IP interface
they use -- and if both processes are on the same NUMA locality,
that the data can stay local and not have to make a round trip to
the device?
(I admit that this is a fairly rare corner case -- doing on-node
communication but *not* using the sm btl...)
Good question. For the loopback interface there is no physical
device, so there should be no NUMA effect. For an interface with a
physical device there may be some reason that a packet would actually
have to go out to the device. If there is no such reason, I would
expect Unix to be smart enough not to do it, given how much
intra-node TCP traffic one commonly sees on Unix. I couldn't hazard
a guess about Windows.
Iain
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel