Gilles, I don't understand how your proposal is any different from what we have today. I quote: "If [locality flag is set], then we could keep a hard coded test so 127.x.y.z address (and IPv6 equivalent) are never used (even if included or not excluded) for inter node communication". We already have a hardcoded test that prevents 127.x.y.z addresses from being used. In fact we have two tests: the first exists because this address range is part of our default if_exclude, and the second (which only does something useful if you manually added lo* to if_include) sits deep inside the IP matching logic.
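
To make the two checks concrete, here is a rough Python sketch of the behavior described above. It is not the Open MPI code (that logic is in C); the names `matches`, `usable` and `hardcoded_check` are made up for illustration, and exact-name matching of interface names is a simplifying assumption.

```python
import ipaddress

# Assumption for illustration: the default exclude list holds only the
# loopback range, as discussed in this thread.
DEFAULT_IF_EXCLUDE = ["127.0.0.0/8"]


def matches(spec, if_name, addr):
    """Return True if one include/exclude entry matches the interface.

    An entry is treated as a CIDR range when it parses as one, and as an
    interface name (exact match) otherwise.
    """
    try:
        network = ipaddress.ip_network(spec, strict=False)
    except ValueError:
        return spec == if_name
    return ipaddress.ip_address(addr) in network


def usable(if_name, addr, if_include=None, if_exclude=None,
           hardcoded_check=True):
    """Decide whether an interface may be used by the TCP transport."""
    # Second test: the hardcoded loopback rejection inside the matching
    # logic. It fires regardless of what the user asked for.
    if hardcoded_check and ipaddress.ip_address(addr).is_loopback:
        return False
    # An explicit include list takes the place of the exclude list.
    if if_include is not None:
        return any(matches(s, if_name, addr) for s in if_include)
    # First test: the exclude list, which by default covers 127.0.0.0/8.
    excludes = DEFAULT_IF_EXCLUDE if if_exclude is None else if_exclude
    return not any(matches(s, if_name, addr) for s in excludes)
```

With `hardcoded_check=True`, `usable("lo", "127.0.0.1", if_include=["lo"])` is still rejected, which is the power-user case under discussion. With `hardcoded_check=False`, the same call is accepted, while an exclude list of just `["ib0"]` would also let loopback through.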

George.

On Wed, Sep 21, 2016 at 12:36 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> George,
>
> i got that, and i consider my suggestion as an improvement to your
> proposal.
>
> if i want to exclude ib0, i might want to
> mpirun --mca btl_tcp_if_exclude ib0 ...
>
> to me, this is an honest mistake, but with your proposal, i would be
> screwed when running on more than one node because i should have
> mpirun --mca btl_tcp_if_exclude ib0,lo ...
>
> and if this parameter is set by the admin in the system-wide config,
> then this configuration must be adapted by the admin, and that could
> generate some confusion.
>
> my suggestion simply adds a "safety net" to your proposal
>
> for the sake of completion, i do not really care whether there should
> be a safety net or not if localhost is explicitly included via the
> btl_tcp_if_include MCA parameter
>
> a different and safe/friendly proposal is to add a new
> btl_tcp_if_exclude_localhost MCA param, which is true by default, so
> you would simply force it to false if you want to MPI_Comm_spawn or
> use the tcp btl on your disconnected laptop.
>
> as a side note, this reminds me that the openib/btl is used by default
> for intra node communication between two tasks from different jobs (sm
> nor vader cannot be used yet, and btl/openib has a higher exclusivity
> than btl/tcp). my first impression is that i am not so comfortable
> with that, and we could add yet another MCA parameter so btl/openib
> disqualifies itself for intra node communications.
>
> Cheers,
>
> Gilles
>
> On Thu, Sep 22, 2016 at 12:56 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> > My proposal is not about adding new ways of deciding what is local and
> > what not. I proposed to use the corresponding MCA parameters to allow
> > the user to decide. More specifically, I want to be able to change the
> > exclude and include MCA to enable TCP over local addresses.
> >
> > George
> >
> > On Sep 21, 2016 4:32 PM, "Gilles Gouaillardet" <gilles.gouaillar...@gmail.com> wrote:
> >>
> >> George,
> >>
> >> Is proc locality already set at that time ?
> >>
> >> If yes, then we could keep a hard coded test so 127.x.y.z address (and
> >> IPv6 equivalent) are never used (even if included or not excluded) for
> >> inter node communication
> >>
> >> Cheers,
> >>
> >> Gilles
> >>
> >> "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
> >> >On Sep 21, 2016, at 10:56 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> >> >>
> >> >> No, because 127.x.x.x is by default part of the exclude, so it will
> >> >> never get into the modex. The problem today, is that even if you
> >> >> manually remove it from the exclude and add it to the include, it
> >> >> will not work, because of the hardcoded checks. Once we remove those
> >> >> checks, things will work the way we expect, interfaces are removed
> >> >> because they don't match the provided addresses.
> >> >
> >> >Gotcha.
> >> >
> >> >> I would have agreed with you if the current code was doing a better
> >> >> decision of what is local and what not. But it is not, it simply
> >> >> removes all 127.x.x.x interfaces (opal/util/net.c:222). Thus, the
> >> >> only thing the current code does, is preventing a power-user from
> >> >> using the loopback (despite being explicitly enabled via the
> >> >> corresponding MCA parameters).
> >> >
> >> >Fair enough.
> >> >
> >> >Should we have a keyword that can be used in the
> >> >btl_tcp_if_include/exclude (e.g., "local") that removes all local-only
> >> >interfaces? I.E., all 127.x.x.x/8 interfaces *and* all local-only
> >> >interfaces (e.g., bridging interfaces to local VMs and the like)?
> >> >
> >> >We could then replace the default "127.0.0.0/8" value in
> >> >btl_tcp_if_exclude with this token, and therefore actually exclude the
> >> >VM-only interfaces (which have caused some users problems in the past).
> >> >
> >> >--
> >> >Jeff Squyres
> >> >jsquy...@cisco.com
> >> >For corporate legal information go to:
> >> > http://www.cisco.com/web/about/doing_business/legal/cri/

_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel