RFC applied via 93fa94f9.
On Fri, Sep 23, 2016 at 7:13 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> It turns out the OMPI behavior today was divergent from what is written in the README. We already explicitly state that
>
> - If specified, the "btl_tcp_if_exclude" parameter must include the loopback device ("lo" on many Linux platforms), or Open MPI will not be able to route MPI messages using the TCP BTL. For example: "mpirun --mca btl_tcp_if_exclude lo,eth1 ..."
>
> So, with this patch we are now README compliant!
>
> George.
>
> On Fri, Sep 23, 2016 at 7:03 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
>> George,
>>
>> OK then,
>> I recommend we explicitly state in the README that the loopback interface can no longer be omitted from btl_tcp_if_exclude when running on multiple nodes.
>>
>> Cheers,
>>
>> Gilles
>>
>> On Thursday, September 22, 2016, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>>> Thanks for clarifying, I now understand what your objection/suggestion was. We have all misconfigured OMPI at least once, but that allowed us to learn how to do it right.
>>>
>>> Instead of adding extra protections for corner cases, maybe we should fix our exclusivity flag so that the scenario you describe would not happen.
>>>
>>> George.
>>>
>>> PS: "btl_tcp_if_exclude = ^ib0" qualifies as an honest mistake. I wouldn't dare propose a new MCA param to prevent this ...
>>>
>>> On Wed, Sep 21, 2016 at 10:54 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> ok, i was not clear
>>>>
>>>> by "let's consider the case where "lo" is *not* excluded via the btl_tcp_if_exclude MCA param" i really meant "let's consider the case where the value of the btl_tcp_if_exclude MCA param has been forced to a list of networks/interfaces that does not contain any reference (neither name nor subnet) to the loopback interface"
>>>> /* in a previous example, i did mpirun --mca btl_tcp_if_exclude ^ib0 */
>>>>
>>>> my concern is that if openmpi-mca-params.conf contains
>>>> btl_tcp_if_exclude = ^ib0
>>>>
>>>> then hiccups will start when Open MPI is updated, and i expect some complaints.
>>>> of course we can reply that the doc should have been read and its advice followed, so one cannot complain just because he has been lucky so far.
>>>> or we can do things a bit differently so we do not run into this case
>>>>
>>>> /* if btl/self is excluded, the app will not start, and it is trivial to append to the error message a note asking to ensure btl/self was not excluded.
>>>> in this case, i do not think we have a mechanism to issue a warning message (e.g. "ensure lo is excluded") when hiccups occur. */
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Thu, Sep 22, 2016 at 9:54 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>> > On Wednesday, September 21, 2016, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>>> >>
>>>> >> George,
>>>> >>
>>>> >> let's consider the case where "lo" is *not* excluded via the btl_tcp_if_exclude MCA param
>>>> >> (if i understand correctly, the following is also true if "lo" is included via the btl_tcp_if_include MCA param)
>>>> >>
>>>> >> currently, and because of/thanks to the test that is done "deep inside":
>>>> >> 1) on a disconnected laptop, mpirun --mca btl tcp,self ... fails with 2 tasks or more because tasks cannot reach each other
>>>> >> 2) on a (connected) cluster, "lo" is never used and mpirun --mca btl tcp,self ... does not hang when tasks are running on two nodes or more
>>>> >>
>>>> >> with your proposal:
>>>> >> 3) on a disconnected laptop, mpirun --mca btl tcp,self ... works with any number of tasks, because "lo" is used by btl/tcp
>>>> >> 4) on a (connected) cluster, "lo" is used and mpirun --mca btl tcp,self ... will very likely hang when tasks are running on two nodes or more
>>>> >>
>>>> >> am i right so far?
>>>> >
>>>> > No, you are missing the fact that thanks to our if_exclude (which contains by default 127.0.0.0/24) we will never use lo (not even with my patch). Thus, local interfaces will remain out of reach for most users, with the exception of those that manually force the inclusion of lo via if_include.
>>>> >
>>>> > On a cluster where a user explicitly enables lo, there will be some hiccups during startup. However, as Paul states, we explicitly discourage people from doing that in the README. Second, the connection over lo will eventually time out, lo will be dropped, and all pending communications will be redirected through another TCP interface.
>>>> >
>>>> > Cheers,
>>>> > George.
>>>> >
>>>> >>
>>>> >> my concern is 4)
>>>> >> as Paul pointed out, we can consider this is not an issue since this is a user/admin mistake, and we do not care whether this is an honest one or not. that being said, this is not very friendly, since something that is working fine today will (likely) start hanging when your patch is merged.
>>>> >>
>>>> >> my suggestion differs since it is basically 2) and 3), which can be seen as the best of both worlds
>>>> >>
>>>> >> makes sense?
>>>> >>
>>>> >> as a side note, there were some discussions about automatically adding the self btl, and even offering a user-friendly alternative to --mca btl xxx (for example --networks shm,infiniband. today Open MPI does not provide any alternative to btl/self. also infiniband can be used via btl/openib, mtl/mxm or libfabric, which makes it painful to blacklist). i cannot remember the outcome of the discussion (if any).
>>>> >>
>>>> >> Cheers,
>>>> >>
>>>> >> Gilles
>>>> >>
>>>> >> On Thu, Sep 22, 2016 at 4:57 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>> >> > Gilles,
>>>> >> >
>>>> >> > I don't understand how your proposal is any different from what we have today. I quote "If [locality flag is set], then we could keep a hard coded test so 127.x.y.z address (and IPv6 equivalent) are never used (even if included or not excluded) for inter node communication". We already have a hardcoded test to prevent 127.x.y.z addresses from being used. In fact we have 2 tests: one because this address range is part of our default if_exclude, and a second one (that only does something useful in case you manually added lo* to if_include) deep inside the IP matching logic.
>>>> >> >
>>>> >> > George.
>>>> >> >
>>>> >> > On Wed, Sep 21, 2016 at 12:36 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>>> >> >>
>>>> >> >> George,
>>>> >> >>
>>>> >> >> i got that, and i consider my suggestion an improvement to your proposal.
>>>> >> >>
>>>> >> >> if i want to exclude ib0, i might want to
>>>> >> >> mpirun --mca btl_tcp_if_exclude ib0 ...
>>>> >> >>
>>>> >> >> to me, this is an honest mistake, but with your proposal, i would be screwed when running on more than one node because i should have
>>>> >> >> mpirun --mca btl_tcp_if_exclude ib0,lo ...
>>>> >> >>
>>>> >> >> and if this parameter is set by the admin in the system-wide config, then this configuration must be adapted by the admin, and that could generate some confusion.
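The README rule and the ib0 example in the message above hinge on one detail: as we understand MCA parameter semantics, setting btl_tcp_if_exclude replaces the default exclude list instead of appending to it, so a value of "ib0" no longer mentions the loopback at all. The following C sketch models only interface-name matching against a comma-separated list. `if_in_list` is a hypothetical helper, not Open MPI's parser; it does not interpret subnet entries such as "127.0.0.1/8", and it treats a token like "^ib0" as a literal name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not Open MPI code): returns true if ifname
 * appears as an exact token in a comma-separated list such as
 * "lo,eth1". Only names are compared; subnets and any "^" prefix
 * are not interpreted. */
static bool if_in_list(const char *list, const char *ifname)
{
    assert(list != NULL && ifname != NULL);
    size_t name_len = strlen(ifname);
    const char *tok = list;

    while (*tok != '\0') {
        const char *comma = strchr(tok, ',');
        size_t tok_len = comma ? (size_t)(comma - tok) : strlen(tok);

        /* Exact match only: "lo0" must not match "lo". */
        if (tok_len == name_len && strncmp(tok, ifname, name_len) == 0) {
            return true;
        }
        if (comma == NULL) {
            break;
        }
        tok = comma + 1;
    }
    return false;
}
```

With the list set to "ib0", `if_in_list("ib0", "lo")` is false, which is the configuration the README warns about; "ib0,lo" restores the loopback exclusion.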
>>>> >> >>
>>>> >> >> my suggestion simply adds a "safety net" to your proposal
>>>> >> >>
>>>> >> >> for the sake of completeness, i do not really care whether there should be a safety net or not if localhost is explicitly included via the btl_tcp_if_include MCA parameter
>>>> >> >>
>>>> >> >> a different and safe/friendly proposal is to add a new btl_tcp_if_exclude_localhost MCA param, which is true by default, so you would simply force it to false if you want to MPI_Comm_spawn or use the tcp btl on your disconnected laptop.
>>>> >> >>
>>>> >> >> as a side note, this reminds me that btl/openib is used by default for intra-node communication between two tasks from different jobs (neither sm nor vader can be used yet, and btl/openib has a higher exclusivity than btl/tcp). my first impression is that i am not so comfortable with that, and we could add yet another MCA parameter so btl/openib disqualifies itself for intra-node communications.
>>>> >> >>
>>>> >> >> Cheers,
>>>> >> >>
>>>> >> >> Gilles
>>>> >> >>
>>>> >> >> On Thu, Sep 22, 2016 at 12:56 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>> >> >> > My proposal is not about adding new ways of deciding what is local and what is not. I proposed to use the corresponding MCA parameters to allow the user to decide. More specifically, I want to be able to change the exclude and include MCA parameters to enable TCP over local addresses.
>>>> >> >> >
>>>> >> >> > George
>>>> >> >> >
>>>> >> >> > On Sep 21, 2016 4:32 PM, "Gilles Gouaillardet" <gilles.gouaillar...@gmail.com> wrote:
>>>> >> >> >>
>>>> >> >> >> George,
>>>> >> >> >>
>>>> >> >> >> Is proc locality already set at that time?
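The locality-based "safety net" discussed here can be sketched as a per-peer filter that rejects loopback addresses for any peer not known to share the node, regardless of what the include/exclude lists say. This assumes locality information is already available when addresses are selected, which is exactly the open question in the message. `addr_usable_for_peer` and its parameters are hypothetical names, and only IPv4 is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a host-byte-order IPv4 address lies in 127.0.0.0/8. */
static bool ipv4_is_loopback(uint32_t addr)
{
    return (addr >> 24) == 127u;
}

/* Hypothetical safety net: a loopback address is only acceptable for a
 * peer known to run on the same node. Every other address is left to
 * the normal btl_tcp_if_include/btl_tcp_if_exclude filtering. */
static bool addr_usable_for_peer(uint32_t local_addr, bool peer_is_local)
{
    if (ipv4_is_loopback(local_addr) && !peer_is_local) {
        return false;
    }
    return true;
}
```

The design choice is that the check depends on the peer, not on the interface alone, so a disconnected laptop could still use "lo" while a multi-node job never would.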
>>>> >> >> >>
>>>> >> >> >> If yes, then we could keep a hard coded test so 127.x.y.z address (and IPv6 equivalent) are never used (even if included or not excluded) for inter node communication
>>>> >> >> >>
>>>> >> >> >> Cheers,
>>>> >> >> >>
>>>> >> >> >> Gilles
>>>> >> >> >>
>>>> >> >> >> "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>>>> >> >> >> >On Sep 21, 2016, at 10:56 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>> >> >> >> >>
>>>> >> >> >> >> No, because 127.x.x.x is by default part of the exclude, so it will never get into the modex. The problem today is that even if you manually remove it from the exclude and add it to the include, it will not work, because of the hardcoded checks. Once we remove those checks, things will work the way we expect: interfaces are removed because they don't match the provided addresses.
>>>> >> >> >> >
>>>> >> >> >> >Gotcha.
>>>> >> >> >> >
>>>> >> >> >> >> I would have agreed with you if the current code was making a better decision about what is local and what is not. But it is not; it simply removes all 127.x.x.x interfaces (opal/util/net.c:222). Thus, the only thing the current code does is prevent a power user from using the loopback (despite it being explicitly enabled via the corresponding MCA parameters).
>>>> >> >> >> >
>>>> >> >> >> >Fair enough.
>>>> >> >> >> >
>>>> >> >> >> >Should we have a keyword that can be used in btl_tcp_if_include/exclude (e.g., "local") that removes all local-only interfaces? I.e., all 127.x.x.x/8 interfaces *and* all other local-only interfaces (e.g., bridging interfaces to local VMs and the like)?
>>>> >> >> >> >
>>>> >> >> >> >We could then replace the default "127.0.0.0/8" value in btl_tcp_if_exclude with this token, and therefore actually exclude the VM-only interfaces (which have caused problems for some users in the past).
>>>> >> >> >> >
>>>> >> >> >> >--
>>>> >> >> >> >Jeff Squyres
>>>> >> >> >> >jsquy...@cisco.com
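The proposed "local" token could be handled as an alias that expands to one or more CIDR blocks before matching. The sketch below uses a hypothetical `matches_exclude_entry` helper and covers only the 127.0.0.0/8 part; recognizing VM bridge interfaces would need OS-specific interface inspection and is not attempted. It also shows why the prefix length matters, since the thread quotes the default exclude both as 127.0.0.0/24 and as 127.0.0.0/8.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* True if the host-byte-order IPv4 address lies in the CIDR block
 * written as "a.b.c.d/n"; malformed entries never match. */
static bool ipv4_in_cidr(uint32_t addr, const char *cidr)
{
    assert(cidr != NULL);
    const char *slash = strchr(cidr, '/');
    if (slash == NULL || (size_t)(slash - cidr) >= INET_ADDRSTRLEN) {
        return false;
    }

    /* Copy the network part so inet_pton sees a NUL-terminated string. */
    char net_str[INET_ADDRSTRLEN];
    memcpy(net_str, cidr, (size_t)(slash - cidr));
    net_str[slash - cidr] = '\0';

    struct in_addr net;
    if (inet_pton(AF_INET, net_str, &net) != 1) {
        return false;
    }

    char *end = NULL;
    long bits = strtol(slash + 1, &end, 10);
    if (end == slash + 1 || *end != '\0' || bits < 0 || bits > 32) {
        return false;
    }

    /* Build the netmask; a /0 prefix matches every address. */
    uint32_t mask = (bits == 0) ? 0u : (uint32_t)(0xFFFFFFFFu << (32 - bits));
    return (addr & mask) == (ntohl(net.s_addr) & mask);
}

/* Hypothetical: the "local" token expands to 127.0.0.0/8 only. */
static bool matches_exclude_entry(uint32_t addr, const char *entry)
{
    if (strcmp(entry, "local") == 0) {
        return ipv4_in_cidr(addr, "127.0.0.0/8");
    }
    return ipv4_in_cidr(addr, entry);
}
```

For example, 127.0.1.1 falls inside 127.0.0.0/8 but outside 127.0.0.0/24, so the two defaults quoted in the thread would not exclude the same addresses.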
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel