i got that, and i consider my suggestion an improvement to your proposal.

if i want to exclude ib0, i might simply run
mpirun --mca btl_tcp_if_exclude ib0 ...

to me, this is an honest mistake, but with your proposal, i would be screwed
when running on more than one node, because i should have run
mpirun --mca btl_tcp_if_exclude ib0,lo ...
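
to make the pitfall concrete: a name-based exclude list only drops the interfaces it names, so excluding ib0 alone leaves lo eligible. below is a minimal sketch in C, assuming a simplified name-only matcher. if_in_list is a hypothetical helper, not the actual Open MPI parser, which (as far as i know) also accepts subnets in CIDR notation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* hypothetical helper: true if ifname appears as an exact entry in the
 * comma-separated list (e.g. "ib0,lo"). simplified: name matching only. */
static bool if_in_list(const char *ifname, const char *list)
{
    assert(ifname != NULL && list != NULL);
    size_t len = strlen(ifname);
    const char *p = list;
    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t tok_len = comma ? (size_t)(comma - p) : strlen(p);
        if (tok_len == len && strncmp(p, ifname, len) == 0) {
            return true;            /* exact match on this token */
        }
        if (comma == NULL) {
            break;                  /* last token checked */
        }
        p = comma + 1;              /* move to the next token */
    }
    return false;
}
```

with "ib0" as the list, if_in_list("lo", "ib0") is false, so lo stays a candidate; with "ib0,lo" it is true.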

and if this parameter is set by the admin in the system-wide config,
then the admin must also add lo to that configuration, which could
generate some confusion.

my suggestion simply adds a "safety net" to your proposal: a loopback
address is never used for inter node communication, even if it is not
excluded.

for the sake of completeness, i do not really care whether there should
be a safety net or not if localhost is explicitly included via the
btl_tcp_if_include MCA parameter.

a different, safe and user-friendly alternative is to add a new
btl_tcp_if_exclude_localhost MCA param, which is true by default, so
you would simply set it to false if you want to use MPI_Comm_spawn or
the tcp btl on your disconnected laptop.

as a side note, this reminds me that btl/openib is used by default
for intra node communication between two tasks from different jobs (neither
sm nor vader can be used yet, and btl/openib has a higher exclusivity
than btl/tcp). my first impression is that i am not so comfortable
with that, and we could add yet another MCA parameter so btl/openib
disqualifies itself for intra node communications.
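
to illustrate the concern, here is a simplified model of per-peer btl selection by exclusivity, with a hypothetical intra_node_ok flag standing in for the proposed MCA parameter. the exclusivity values are illustrative, not the real ones, and the actual selection logic is more involved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* hypothetical, simplified view of a btl candidate for one peer */
struct btl_candidate {
    const char *name;
    unsigned exclusivity;  /* higher wins; values are illustrative */
    bool intra_node_ok;    /* false: btl disqualifies itself intra node */
};

/* pick the eligible btl with the highest exclusivity, or NULL if none */
static const struct btl_candidate *
select_btl(const struct btl_candidate *c, size_t n, bool peer_on_same_node)
{
    assert(c != NULL || n == 0);
    const struct btl_candidate *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (peer_on_same_node && !c[i].intra_node_ok) {
            continue;      /* skipped for same-node peers */
        }
        if (best == NULL || c[i].exclusivity > best->exclusivity) {
            best = &c[i];
        }
    }
    return best;
}
```

with both btls eligible, openib wins for a same-node peer from another job; clearing intra_node_ok lets tcp handle that case while openib is still preferred across nodes.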



On Thu, Sep 22, 2016 at 12:56 AM, George Bosilca <> wrote:
> My proposal is not about adding new ways of deciding what is local and what
> is not. I proposed using the corresponding MCA parameters to let the user
> decide. More specifically, I want to be able to change the exclude and
> include MCA parameters to enable TCP over local addresses.
> George
> On Sep 21, 2016 4:32 PM, "Gilles Gouaillardet"
> <> wrote:
>> George,
>> Is proc locality already set at that time?
>> If yes, then we could keep a hard-coded test so that 127.x.y.z addresses (and
>> their IPv6 equivalents) are never used for inter-node communication, even if
>> they are included or not excluded.
>> Cheers,
>> Gilles
>> "Jeff Squyres (jsquyres)" <> wrote:
>> >On Sep 21, 2016, at 10:56 AM, George Bosilca <> wrote:
>> >>
>> >> No, because 127.x.x.x is by default part of the exclude, so it will
>> >> never get into the modex. The problem today is that even if you manually
>> >> remove it from the exclude and add it to the include, it will not work,
>> >> because of the hardcoded checks. Once we remove those checks, things will
>> >> work the way we expect: interfaces are removed because they don't match
>> >> the provided addresses.
>> >
>> >Gotcha.
>> >
>> >> I would have agreed with you if the current code made a better
>> >> decision about what is local and what is not. But it does not; it simply
>> >> removes all 127.x.x.x interfaces (opal/util/net.c:222). Thus, the only
>> >> thing the current code does is prevent a power user from using the
>> >> loopback (despite it being explicitly enabled via the corresponding MCA
>> >> parameters).
>> >
>> >Fair enough.
>> >
>> >Should we have a keyword that can be used in
>> > btl_tcp_if_include/exclude (e.g., "local") that removes all local-only
>> > interfaces? I.e., all 127.x.x.x/8 interfaces *and* all local-only
>> > interfaces (e.g., bridging interfaces to local VMs and the like)?
>> >
>> >We could then replace the default "" value in
>> > btl_tcp_if_exclude with this token, and therefore actually exclude the
>> > VM-only interfaces (which have caused some users problems in the past).
>> >
>> >--
>> >Jeff Squyres
>> >
>> >_______________________________________________
>> >devel mailing list