RFC applied via 93fa94f9.
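
A rough way to audit an existing btl_tcp_if_exclude value against the README rule quoted below is sketched here in Python. This is only an illustration, not Open MPI's own parsing code: the function name excludes_loopback and the loopback_names default are hypothetical, entries are assumed to be comma-separated interface names or IPv4 CIDR subnets, and IPv6 is ignored.

```python
import ipaddress

# The whole IPv4 loopback range.
LOOPBACK_V4 = ipaddress.ip_network("127.0.0.0/8")


def excludes_loopback(if_exclude, loopback_names=("lo",)):
    """Return True if a comma-separated exclude list covers the loopback device.

    Hypothetical helper for illustration only. An entry counts as covering
    loopback if it is a known loopback interface name or an IPv4 subnet
    that contains all of 127.0.0.0/8.
    """
    for entry in (e.strip() for e in if_exclude.split(",")):
        if not entry:
            continue
        if entry in loopback_names:
            return True
        try:
            net = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            # Not a subnet, so assume it names some other interface.
            continue
        if net.version == 4 and LOOPBACK_V4.subnet_of(net):
            return True
    return False
```

Under these assumptions, "lo,eth1" and "ib0,lo" pass the check, while "ib0" alone does not, which is the honest mistake discussed in the thread.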

On Fri, Sep 23, 2016 at 7:13 AM, George Bosilca <bosi...@icl.utk.edu> wrote:

> It turns out the OMPI behavior today was divergent from what is written in
> the README. We already explicitly state that
>
>   - If specified, the "btl_tcp_if_exclude" parameter must include the
>     loopback device ("lo" on many Linux platforms), or Open MPI will
>     not be able to route MPI messages using the TCP BTL.  For example:
>     "mpirun --mca btl_tcp_if_exclude lo,eth1 ..."
>
> So, with this patch we are now README compliant!
>
>   George.
>
>
>
> On Fri, Sep 23, 2016 at 7:03 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> George,
>>
>> OK then,
>> I recommend we explicitly state in the README that the loopback interface can
>> no longer be omitted from btl_tcp_if_exclude when running on multiple nodes.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On Thursday, September 22, 2016, George Bosilca <bosi...@icl.utk.edu>
>> wrote:
>>
>>> Thanks for clarifying, I now understand what your objection/suggestion
>>> was. We have all misconfigured OMPI at least once, but that allowed us to
>>> learn how to do it right.
>>>
>>> Instead of adding extra protections for corner-cases, maybe we should
>>> fix our exclusivity flag so that the scenario you describe would not happen.
>>>
>>>   George.
>>>
>>> PS: "btl_tcp_if_exclude = ^ib0" qualifies as an honest mistake. I
>>> wouldn't dare propose a new MCA param to prevent this ...
>>>
>>>
>>> On Wed, Sep 21, 2016 at 10:54 PM, Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> ok, i was not clear
>>>>
>>>> by "let's consider the case where "lo" is *not* excluded via the
>>>> btl_tcp_if_exclude MCA param" i really meant
>>>> "let's consider the case where the value of the btl_tcp_if_exclude MCA
>>>> param has been forced to a list of network/interfaces that do not
>>>> contain any reference (e.g. name or subnet) to the loopback
>>>> interface"
>>>> /* in a previous example, i did mpirun --mca btl_tcp_if_exclude ^ib0 */
>>>>
>>>> my concern is that openmpi-mca-params.conf contains
>>>> btl_tcp_if_exclude = ^ib0
>>>>
>>>> then hiccups will start when Open MPI is updated, and i expect some
>>>> complaints.
>>>> of course we can reply that the docs should have been read and their
>>>> advice followed, so one cannot complain just because they have been
>>>> lucky so far.
>>>> or we can do things a bit differently so we do not run into this case
>>>>
>>>> /* if btl/self is excluded, the app will not start and it is trivial
>>>> to append to the error message a note asking to ensure btl/self was
>>>> not excluded.
>>>> in this case, i do not think we have a mechanism to issue a warning
>>>> message (e.g. "ensure lo is excluded") when hiccups occur. */
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> On Thu, Sep 22, 2016 at 9:54 AM, George Bosilca <bosi...@icl.utk.edu>
>>>> wrote:
>>>> > On Wednesday, September 21, 2016, Gilles Gouaillardet
>>>> > <gilles.gouaillar...@gmail.com> wrote:
>>>> >>
>>>> >> George,
>>>> >>
>>>> >> let's consider the case where "lo" is *not* excluded via the
>>>> >> btl_tcp_if_exclude MCA param
>>>> >> (if i understand correctly, the following is also true if "lo" is
>>>> >> included via the btl_tcp_if_include MCA param)
>>>> >>
>>>> >> currently, and because of/thanks to the test that is done "deep inside"
>>>> >> 1) on a disconnected laptop, mpirun --mca btl tcp,self ... fails with
>>>> >> 2 tasks or more because tasks cannot reach each other
>>>> >> 2) on a (connected) cluster, "lo" is never used and mpirun --mca btl
>>>> >> tcp,self ... does not hang when tasks are running on two nodes or more
>>>> >>
>>>> >> with your proposal :
>>>> >> 3) on a disconnected laptop, mpirun --mca btl tcp,self ... works with
>>>> >> any number of tasks, because "lo" is used by btl/tcp
>>>> >> 4) on a (connected) cluster, "lo" is used and mpirun --mca btl
>>>> >> tcp,self ... will very likely hang when tasks are running on two nodes
>>>> >> or more
>>>> >>
>>>> >> am i right so far ?
>>>> >
>>>> >
>>>> > No, you are missing the fact that thanks to our if_exclude (which contains
>>>> > by default 127.0.0.0/8) we will never use lo (not even with my patch).
>>>> > Thus, local interfaces will remain out of reach for most users, with the
>>>> > exception of those that manually force the inclusion of lo via if_include.
>>>> >
>>>> > On a cluster where a user explicitly enables lo, there will be some hiccups
>>>> > during startup. However, as Paul states, we explicitly discourage people
>>>> > from doing that in the README. Second, the connection over lo will
>>>> > eventually time out, lo will be dropped, and all pending communications
>>>> > will be redirected through another TCP interface.
>>>> >
>>>> > Cheers,
>>>> > George.
>>>> >
>>>> >
>>>> >>
>>>> >> my concern is 4)
>>>> >> as Paul pointed out, we can consider this is not an issue since this
>>>> >> is a user/admin mistake, and we do not care whether this is an honest
>>>> >> one or not. that being said, this is not very friendly since something
>>>> >> that is working fine today will (likely) start hanging when your patch
>>>> >> is merged.
>>>> >>
>>>> >> my suggestion differs since it is basically 2) and 3), which can be
>>>> >> seen as the best of both worlds
>>>> >>
>>>> >> makes sense ?
>>>> >>
>>>> >> as a side note, there were some discussions about automatically adding
>>>> >> the self btl,
>>>> >> and even offering a user friendly alternative to --mca btl xxx
>>>> >> (for example --networks shm,infiniband. today Open MPI does not
>>>> >> provide any alternative to btl/self. also infiniband can be used via
>>>> >> btl/openib, mtl/mxm or libfabric, which makes it painful to
>>>> >> blacklist). i cannot remember the outcome of the discussion (if any).
>>>> >>
>>>> >> Cheers,
>>>> >>
>>>> >> Gilles
>>>> >>
>>>> >> On Thu, Sep 22, 2016 at 4:57 AM, George Bosilca <bosi...@icl.utk.edu>
>>>> >> wrote:
>>>> >> > Gilles,
>>>> >> >
>>>> >> > I don't understand how your proposal is any different than what we have
>>>> >> > today. I quote "If [locality flag is set], then we could keep a hard coded
>>>> >> > test so 127.x.y.z addresses (and IPv6 equivalent) are never used (even if
>>>> >> > included or not excluded) for inter node communication". We already have a
>>>> >> > hardcoded test to prevent 127.x.y.z addresses from being used. In fact we
>>>> >> > have 2 tests, one because this address range is part of our default
>>>> >> > if_exclude, and then a second test (that only does something useful in case
>>>> >> > you manually added lo* to if_include) deep inside the IP matching logic.
>>>> >> >
>>>> >> >   George.
>>>> >> >
>>>> >> >
>>>> >> > On Wed, Sep 21, 2016 at 12:36 PM, Gilles Gouaillardet
>>>> >> > <gilles.gouaillar...@gmail.com> wrote:
>>>> >> >>
>>>> >> >> George,
>>>> >> >>
>>>> >> >> i got that, and i consider my suggestion as an improvement to your
>>>> >> >> proposal.
>>>> >> >>
>>>> >> >> if i want to exclude ib0, i might want to
>>>> >> >> mpirun --mca btl_tcp_if_exclude ib0 ...
>>>> >> >>
>>>> >> >> to me, this is an honest mistake, but with your proposal, i would be
>>>> >> >> screwed when running on more than one node because i should have
>>>> >> >> mpirun --mca btl_tcp_if_exclude ib0,lo ...
>>>> >> >>
>>>> >> >> and if this parameter is set by the admin in the system-wide config,
>>>> >> >> then this configuration must be adapted by the admin, and that could
>>>> >> >> generate some confusion.
>>>> >> >>
>>>> >> >> my suggestion simply adds a "safety net" to your proposal
>>>> >> >>
>>>> >> >> for the sake of completeness, i do not really care whether there should
>>>> >> >> be a safety net or not if localhost is explicitly included via the
>>>> >> >> btl_tcp_if_include MCA parameter
>>>> >> >>
>>>> >> >> a different and safe/friendly proposal is to add a new
>>>> >> >> btl_tcp_if_exclude_localhost MCA param, which is true by default, so
>>>> >> >> you would simply force it to false if you want to MPI_Comm_spawn or
>>>> >> >> use the tcp btl on your disconnected laptop.
>>>> >> >>
>>>> >> >> as a side note, this reminds me that btl/openib is used by default
>>>> >> >> for intra node communication between two tasks from different jobs
>>>> >> >> (neither sm nor vader can be used yet, and btl/openib has a higher
>>>> >> >> exclusivity than btl/tcp). my first impression is that i am not so
>>>> >> >> comfortable with that, and we could add yet another MCA parameter so
>>>> >> >> btl/openib disqualifies itself for intra node communications.
>>>> >> >>
>>>> >> >>
>>>> >> >> Cheers,
>>>> >> >>
>>>> >> >> Gilles
>>>> >> >>
>>>> >> >> On Thu, Sep 22, 2016 at 12:56 AM, George Bosilca <bosi...@icl.utk.edu>
>>>> >> >> wrote:
>>>> >> >> > My proposal is not about adding new ways of deciding what is local and
>>>> >> >> > what is not. I proposed to use the corresponding MCA parameters to allow
>>>> >> >> > the user to decide. More specifically, I want to be able to change the
>>>> >> >> > exclude and include MCA parameters to enable TCP over local addresses.
>>>> >> >> >
>>>> >> >> > George
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > On Sep 21, 2016 4:32 PM, "Gilles Gouaillardet"
>>>> >> >> > <gilles.gouaillar...@gmail.com> wrote:
>>>> >> >> >>
>>>> >> >> >> George,
>>>> >> >> >>
>>>> >> >> >> Is proc locality already set at that time ?
>>>> >> >> >>
>>>> >> >> >> If yes, then we could keep a hard coded test so 127.x.y.z addresses
>>>> >> >> >> (and IPv6 equivalent) are never used (even if included or not excluded)
>>>> >> >> >> for inter node communication
>>>> >> >> >>
>>>> >> >> >> Cheers,
>>>> >> >> >>
>>>> >> >> >> Gilles
>>>> >> >> >>
>>>> >> >> >> "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>>>> >> >> >> >On Sep 21, 2016, at 10:56 AM, George Bosilca <bosi...@icl.utk.edu>
>>>> >> >> >> > wrote:
>>>> >> >> >> >>
>>>> >> >> >> >> No, because 127.x.x.x is by default part of the exclude, so it will
>>>> >> >> >> >> never get into the modex. The problem today is that even if you
>>>> >> >> >> >> manually remove it from the exclude and add it to the include, it
>>>> >> >> >> >> will not work, because of the hardcoded checks. Once we remove those
>>>> >> >> >> >> checks, things will work the way we expect: interfaces are removed
>>>> >> >> >> >> because they don't match the provided addresses.
>>>> >> >> >> >
>>>> >> >> >> >Gotcha.
>>>> >> >> >> >
>>>> >> >> >> >> I would have agreed with you if the current code was making a better
>>>> >> >> >> >> decision of what is local and what is not. But it is not; it simply
>>>> >> >> >> >> removes all 127.x.x.x interfaces (opal/util/net.c:222). Thus, the only
>>>> >> >> >> >> thing the current code does is prevent a power-user from using the
>>>> >> >> >> >> loopback (despite it being explicitly enabled via the corresponding
>>>> >> >> >> >> MCA parameters).
>>>> >> >> >> >
>>>> >> >> >> >Fair enough.
>>>> >> >> >> >
>>>> >> >> >> >Should we have a keyword that can be used in the
>>>> >> >> >> > btl_tcp_if_include/exclude (e.g., "local") that removes all local-only
>>>> >> >> >> > interfaces?  I.e., all 127.0.0.0/8 interfaces *and* all other
>>>> >> >> >> > local-only interfaces (e.g., bridging interfaces to local VMs and the
>>>> >> >> >> > like)?
>>>> >> >> >> >
>>>> >> >> >> >We could then replace the default "127.0.0.0/8" value in
>>>> >> >> >> > btl_tcp_if_exclude with this token, and therefore actually exclude the
>>>> >> >> >> > VM-only interfaces (which have caused some users problems in the past).
>>>> >> >> >> >
>>>> >> >> >> >--
>>>> >> >> >> >Jeff Squyres
>>>> >> >> >> >jsquy...@cisco.com
>>>> >> >> >> >For corporate legal information go to:
>>>> >> >> >> > http://www.cisco.com/web/about/doing_business/legal/cri/
>>>> >> >> >> >
>>>> >> >> >> >_______________________________________________
>>>> >> >> >> >devel mailing list
>>>> >> >> >> >devel@lists.open-mpi.org
>>>> >> >> >> >https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

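The practical reach of a 127.0.0.0/8 exclude, compared with a narrower subnet, can be checked with Python's standard ipaddress module. This is a minimal sketch for illustration only; the helper name in_subnet is hypothetical and the addresses are arbitrary examples, not values taken from Open MPI.

```python
import ipaddress


def in_subnet(address, subnet):
    """Return True if the address falls inside the given CIDR subnet."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(subnet)


# Compare how a /8 and a /24 loopback subnet classify a few sample addresses.
for addr in ("127.0.0.1", "127.0.1.1", "192.168.1.10"):
    print(addr,
          "in 127.0.0.0/8:", in_subnet(addr, "127.0.0.0/8"),
          "in 127.0.0.0/24:", in_subnet(addr, "127.0.0.0/24"))
```

An address such as 127.0.1.1 is a loopback address covered by the /8 subnet but not by a /24 one, so the prefix length matters when the exclude value is written as a subnet rather than as an interface name.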