George,

i got that, and i consider my suggestion an improvement to your proposal.

if i want to exclude ib0, i might run
mpirun --mca btl_tcp_if_exclude ib0 ...

to me, this is an honest mistake, but with your proposal, i would be screwed when running on more than one node, because i should have run
mpirun --mca btl_tcp_if_exclude ib0,lo ...
and if this parameter is set by the admin in the system-wide config, then the admin must adapt that configuration, and that could generate some confusion.

my suggestion simply adds a "safety net" to your proposal.
for the sake of completeness, i do not really care whether there should be a safety net or not if localhost is explicitly included via the btl_tcp_if_include MCA parameter.

a different and safe/friendly proposal is to add a new btl_tcp_if_exclude_localhost MCA param, which is true by default, so you would simply force it to false if you want to MPI_Comm_spawn or use the tcp btl on your disconnected laptop.

as a side note, this reminds me that btl/openib is used by default for intra node communication between two tasks from different jobs (neither sm nor vader can be used yet, and btl/openib has a higher exclusivity than btl/tcp).
my first impression is that i am not so comfortable with that, and we could add yet another MCA parameter so btl/openib disqualifies itself for intra node communications.

Cheers,

Gilles

On Thu, Sep 22, 2016 at 12:56 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> My proposal is not about adding new ways of deciding what is local and what
> not. I proposed to use the corresponding MCA parameters to allow the user to
> decide. More specifically, I want to be able to change the exclude and
> include MCA to enable TCP over local addresses.
>
> George
>
>
> On Sep 21, 2016 4:32 PM, "Gilles Gouaillardet"
> <gilles.gouaillar...@gmail.com> wrote:
>>
>> George,
>>
>> Is proc locality already set at that time ?
>>
>> If yes, then we could keep a hard coded test so 127.x.y.z address (and
>> IPv6 equivalent) are never used (even if included or not excluded) for inter
>> node communication
>>
>> Cheers,
>>
>> Gilles
>>
>> "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>> >On Sep 21, 2016, at 10:56 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> >>
>> >> No, because 127.x.x.x is by default part of the exclude, so it will
>> >> never get into the modex. The problem today, is that even if you manually
>> >> remove it from the exclude and add it to the include, it will not work,
>> >> because of the hardcoded checks. Once we remove those checks, things will
>> >> work the way we expect, interfaces are removed because they don't match
>> >> the provided addresses.
>> >
>> >Gotcha.
>> >
>> >> I would have agreed with you if the current code was doing a better
>> >> decision of what is local and what not. But it is not, it simply remove
>> >> all 127.x.x.x interfaces (opal/util/net.c:222). Thus, the only thing the
>> >> current code does, is preventing a power-user from using the loopback
>> >> (despite being explicitly enabled via the corresponding MCA parameters).
>> >
>> >Fair enough.
>> >
>> >Should we have a keyword that can be used in the
>> > btl_tcp_if_include/exclude (e.g., "local") that removes all local-only
>> > interfaces? I.E., all 127.x.x.x/8 interfaces *and* all local-only
>> > interfaces (e.g., bridging interfaces to local VMs and the like)?
>> >
>> >We could then replace the default "127.0.0.0/8" value in
>> > btl_tcp_if_exclude with this token, and therefore actually exclude the
>> > VM-only interfaces (which have caused some users problems in the past).
>> >
>> >--
>> >Jeff Squyres
>> >jsquy...@cisco.com
>> >For corporate legal information go to:
>> > http://www.cisco.com/web/about/doing_business/legal/cri/
>> >
>> >_______________________________________________
>> >devel mailing list
>> >devel@lists.open-mpi.org
>> >https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
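
To make the "safety net" idea from this thread concrete, here is a minimal sketch in C. It is not the code in opal/util/net.c and not a proposed patch: the function names and the two boolean inputs (peer_is_local, user_enabled_loopback) are hypothetical, and only IPv4 is handled. It assumes proc locality is already known when the check runs, which is exactly the open question asked above.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the dotted-quad IPv4 string lies in 127.0.0.0/8.
 * Strings that do not parse as IPv4 are treated as non-loopback. */
bool is_ipv4_loopback(const char *addr)
{
    struct in_addr in;

    if (inet_pton(AF_INET, addr, &in) != 1) {
        return false;
    }
    /* s_addr is in network byte order; the top byte is the first octet. */
    return (ntohl(in.s_addr) >> 24) == 127;
}

/* Hypothetical policy combining both positions in the thread:
 *  - non-loopback addresses are left to the include/exclude lists;
 *  - loopback is usable only if the user explicitly enabled it
 *    (e.g. by removing it from the exclude list) AND the peer is on
 *    the same node, so it is never used for inter node traffic. */
bool may_use_address(const char *addr, bool peer_is_local,
                     bool user_enabled_loopback)
{
    if (!is_ipv4_loopback(addr)) {
        return true;
    }
    return user_enabled_loopback && peer_is_local;
}
```

With this policy, may_use_address("127.0.0.1", false, true) is rejected even though the user enabled loopback, which is the safety net for the "exclude ib0 but forget lo" case, while the same address with a local peer is accepted, which covers the MPI_Comm_spawn and disconnected-laptop cases.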