I don’t disagree with anything you said - however, this problem has been 
reported against our library for more than a decade (it goes back to the old 
Trac days) and has yet to be resolved. In the meantime, we have a user who is 
“down” and needs a solution. Whether it is a “cheap shot” or not is irrelevant 
to them.

I’ll leave it to you deeper MPI wonks to solve the problem correctly :-) When 
you have done so, I will happily remove the coll/sync component and tell the 
user “all has been resolved”.


> On Aug 20, 2016, at 11:44 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
> 
> Ralph,
> 
> Bringing back the coll/sync component is a cheap shot aimed at hiding a real 
> issue behind a smokescreen. As Nathan described in his email, Open MPI’s lack 
> of flow control on eager messages is the real culprit here, and a loop around 
> any one-to-many collective (bcast and scatter*) only exacerbates it. However, 
> a loop around a small MPI_Send will also end in memory exhaustion, and that 
> case cannot easily be circumvented by adding synchronizations deep inside the 
> library.
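> 
> To make that point concrete, here is a minimal sketch (not taken from any user 
> code; the rank roles, loop length, and sleep are arbitrary) in which a plain 
> point-to-point loop with no collectives at all exhausts memory on a lagging 
> receiver, because every eager send that arrives before its matching receive is 
> queued as an unexpected message:
> 
>     /* run with at least 2 ranks: rank 0 floods, rank 1 lags behind */
>     #include <mpi.h>
>     #include <unistd.h>
> 
>     int main(int argc, char **argv) {
>         int rank, buf = 0;
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         for (long i = 0; i < 100000000; i++) {
>             if (rank == 0) {
>                 /* tiny payload goes out eagerly; nothing throttles the sender */
>                 MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>             } else if (rank == 1) {
>                 usleep(10); /* receiver falls behind; its unexpected queue grows */
>                 MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
>                          MPI_STATUS_IGNORE);
>             }
>         }
>         MPI_Finalize();
>         return 0;
>     }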
> 
>   George.
> 
> 
> On Sat, Aug 20, 2016 at 12:30 AM, r...@open-mpi.org wrote:
> I cannot provide the user report, as it is a proprietary problem. However, it 
> consists of a large loop of calls to MPI_Bcast that crashes due to unexpected 
> messages. We have been looking at instituting flow control, but that would 
> have far too widespread an impact. The coll/sync component would be a simple 
> solution.
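> 
> The shape of the code is roughly the following - a sketch based on the 
> description above, not the proprietary code itself, with made-up sizes and a 
> sleep standing in for the slow rank’s extra work:
> 
>     #include <mpi.h>
>     #include <unistd.h>
> 
>     int main(int argc, char **argv) {
>         int rank, buf[16] = {0};
>         MPI_Init(&argc, &argv);
>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>         for (long i = 0; i < 50000000; i++) {
>             /* small payload stays under the eager limit on typical transports */
>             MPI_Bcast(buf, 16, MPI_INT, 0, MPI_COMM_WORLD);
>             if (rank == 1)
>                 usleep(5); /* one slow rank accumulates unexpected bcast fragments */
>         }
>         MPI_Finalize();
>         return 0;
>     }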
> 
> I honestly don’t believe the issue I was resolving was due to a bug - it was 
> a simple problem of one proc running slowly and creating an overload of 
> unexpected messages that eventually consumed too much memory. Rather, I think 
> you solved a different problem - by the time you arrived at LANL, the app I 
> was working with had already been modified to no longer create the problem 
> (the algorithm was essentially refactored to avoid the massive loop over 
> allreduce).
> 
> I have no issue supporting it as it takes near-zero effort to maintain, and 
> this is a fairly common problem with legacy codes that don’t want to refactor 
> their algorithms.
> 
> 
> > On Aug 19, 2016, at 8:48 PM, Nathan Hjelm <hje...@me.com> wrote:
> >
> >> On Aug 19, 2016, at 4:24 PM, r...@open-mpi.org wrote:
> >>
> >> Hi folks
> >>
> >> I had a question arise regarding a problem being seen by an OMPI user - it 
> >> has to do with the old bugaboo I originally dealt with back in my LANL 
> >> days. The problem is with an app that repeatedly hammers on a collective 
> >> and gets overwhelmed by unexpected messages when one of the procs falls 
> >> behind.
> >
> > I did some investigation on roadrunner several years ago and determined 
> > that the user-code issue coll/sync was attempting to fix was due to a bug 
> > in ob1/cksum (really can’t remember). coll/sync was simply masking a 
> > live-lock problem. I committed a workaround for the bug in r26575 
> > (https://github.com/open-mpi/ompi/commit/59e529cf1dfe986e40d14ec4d2a2e5ef0cea5e35) 
> > and tested it with the user code. After this change the user code ran fine 
> > without coll/sync. Since LANL no longer had any users of coll/sync, we 
> > stopped supporting it.
> >
> >> I solved this back then by introducing the “sync” component in 
> >> ompi/mca/coll, which injected a barrier operation every N collectives. You 
> >> could even “tune” it by injecting the barrier only for specific collectives.
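> >> 
> >> In application terms, the effect was roughly equivalent to the user writing 
> >> the following (a sketch of the idea rather than the component’s code; the 
> >> interval of 1000, the payload, and the loop length are arbitrary):
> >> 
> >>     #include <mpi.h>
> >> 
> >>     int main(int argc, char **argv) {
> >>         int buf[16] = {0};
> >>         MPI_Init(&argc, &argv);
> >>         for (long i = 0; i < 50000000; i++) {
> >>             MPI_Bcast(buf, 16, MPI_INT, 0, MPI_COMM_WORLD);
> >>             /* an occasional barrier keeps fast ranks from racing arbitrarily
> >>                far ahead and flooding a slow rank with eager fragments */
> >>             if ((i + 1) % 1000 == 0)
> >>                 MPI_Barrier(MPI_COMM_WORLD);
> >>         }
> >>         MPI_Finalize();
> >>         return 0;
> >>     }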
> >>
> >> However, I can no longer find that component in the code base - I find it 
> >> in the 1.6 series, but someone removed it during the 1.7 series.
> >>
> >> Can someone tell me why this was done??? Is there any reason not to bring 
> >> it back? It solves a very real, not uncommon, problem.
> >> Ralph
> >
> > This was discussed during one (or several) telecons years ago. We agreed to 
> > kill it and bring it back if there was 1) a use case and 2) someone willing 
> > to support it. See 
> > https://github.com/open-mpi/ompi/commit/5451ee46bd6fcdec002b333474dec919475d2d62.
> >
> > Can you link the user email?
> >
> > -Nathan

_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
