As I said earlier, modifying these legacy apps is not a desirable solution. The 
coll/sync component was developed specifically to alleviate these problems in 
an acceptable, albeit not optimal, manner. Performance, in this case, is 
secondary to simply getting the app to run.


> On Aug 20, 2016, at 7:38 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> 
> Ralph,
> 
> in the meantime, and if not done already, your user can simply redefine 
> MPI_Bcast in the app.
> 
> 
> #include <mpi.h>
> 
> /* intercept MPI_Bcast via the profiling interface and force a
>    synchronization before every broadcast */
> int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root,
>               MPI_Comm comm) {
>     PMPI_Barrier(comm);
>     return PMPI_Bcast(buffer, count, datatype, root, comm);
> }
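> 
> if a barrier before every single broadcast turns out to be too expensive, a 
> lighter variant (untested sketch; the interval of 100 is arbitrary and should 
> be tuned for the app) only synchronizes every Nth call:
> 
> /* synchronize only every Nth broadcast to limit the added barrier cost;
>    note the static counter is not thread safe - good enough for a sketch */
> #define BCAST_SYNC_INTERVAL 100
> 
> int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root,
>               MPI_Comm comm) {
>     static int ncalls = 0;
>     if (++ncalls % BCAST_SYNC_INTERVAL == 0) {
>         PMPI_Barrier(comm);
>     }
>     return PMPI_Bcast(buffer, count, datatype, root, comm);
> }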
> 
> the root causes are
> - no flow control in Open MPI for eager messages (as explained by George), and
> - some processes being much slower than others.
> 
> so even if Open MPI provides a fix or workaround, the end user will still be 
> left with a significant load imbalance, which is far from optimal from a 
> performance point of view.
> 
> 
> Cheers,
> 
> Gilles
> 
> On Sunday, August 21, 2016, r...@open-mpi.org wrote:
> I don’t disagree with anything you said - however, this problem has been 
> reported in our library for more than a decade (goes way back into the old 
> Trac days), and has yet to be resolved. Meantime, we have a user that is 
> “down” and needs a solution. Whether it is a “cheap shot” or not is 
> irrelevant to them.
> 
> I’ll leave it to you deeper MPI wonks to solve the problem correctly :-) When 
> you have done so, I will happily remove the coll/sync component and tell the 
> user “all has been resolved”.
> 
> 
>> On Aug 20, 2016, at 11:44 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>> 
>> Ralph,
>> 
>> Bringing back coll/sync is a cheap shot at hiding a real issue behind a 
>> smoke screen. As Nathan described in his email, Open MPI's lack of flow 
>> control for eager messages is the real culprit here, and the loop around any 
>> one-to-many collective (bcast and scatter*) only exacerbates the issue. 
>> However, a loop around small MPI_Send calls will also end in memory 
>> exhaustion, and that case cannot easily be circumvented by adding 
>> synchronizations deep inside the library.
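>> 
>> To make that concrete: something as trivial as the following (minimal, 
>> untested sketch; the message is assumed to be below the eager limit, and the 
>> loop count and sleep are exaggerated for effect) will eventually exhaust 
>> memory on the slow receiver, because each eager send completes locally while 
>> the payload sits in the receiver's unexpected-message queue until the 
>> matching receive is finally posted:
>> 
>> #include <mpi.h>
>> #include <unistd.h>
>> 
>> int main(int argc, char **argv) {
>>     int rank, i, buf = 0;
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     for (i = 0; i < 100000000; i++) {
>>         if (rank == 0) {
>>             /* small message: sent eagerly, MPI_Send returns immediately */
>>             MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>>         } else if (rank == 1) {
>>             usleep(1000);   /* the slow receiver falls further behind */
>>             MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
>>                      MPI_STATUS_IGNORE);
>>         }
>>     }
>>     MPI_Finalize();
>>     return 0;
>> }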
>> 
>>   George.
>> 
>> 
>> On Sat, Aug 20, 2016 at 12:30 AM, r...@open-mpi.org wrote:
>> I cannot provide the user report as it is proprietary. However, the problem 
>> consists of a large loop of calls to MPI_Bcast that crashes due to 
>> unexpected messages. We have been looking at instituting flow control, but 
>> that has far too widespread an impact. The coll/sync component would be a 
>> simple solution.
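>> 
>> Schematically, the failing pattern boils down to something like the 
>> following (a simplified, untested sketch, not the user's actual code): small 
>> broadcasts in a tight loop, with one rank lagging behind while its 
>> unexpected-message queue grows without bound.
>> 
>> #include <mpi.h>
>> #include <unistd.h>
>> 
>> int main(int argc, char **argv) {
>>     int rank, i;
>>     double buf[8] = {0};
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     for (i = 0; i < 10000000; i++) {
>>         /* small (eager) broadcast: the root's sends complete locally,
>>            so the root races ahead of any slow rank */
>>         MPI_Bcast(buf, 8, MPI_DOUBLE, 0, MPI_COMM_WORLD);
>>         if (rank == 1) {
>>             usleep(1000);   /* one artificially slow rank */
>>         }
>>     }
>>     MPI_Finalize();
>>     return 0;
>> }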
>> 
>> I honestly don’t believe the issue I was resolving was due to a bug - it was 
>> a simple problem of one proc running slow and creating an overload of 
>> unexpected messages that eventually consumed too much memory. Rather, I 
>> think you solved a different problem - by the time you arrived at LANL, the 
>> app I was working with had already been modified so that it no longer 
>> created the problem (essentially refactoring the algorithm to avoid the 
>> massive loop over allreduce).
>> 
>> I have no issue supporting it as it takes near-zero effort to maintain, and 
>> this is a fairly common problem with legacy codes that don’t want to 
>> refactor their algorithms.
>> 
>> 
>> > On Aug 19, 2016, at 8:48 PM, Nathan Hjelm <hje...@me.com> wrote:
>> >
>> >> On Aug 19, 2016, at 4:24 PM, r...@open-mpi.org wrote:
>> >>
>> >> Hi folks
>> >>
>> >> I had a question arise regarding a problem being seen by an OMPI user - it 
>> >> has to do with the old bugaboo I originally dealt with back in my LANL 
>> >> days. The problem is an app that repeatedly hammers on a collective and 
>> >> gets overwhelmed by unexpected messages when one of the procs falls 
>> >> behind.
>> >
>> > I did some investigation on Roadrunner several years ago and determined 
>> > that the user-code issue coll/sync was attempting to fix was due to a bug 
>> > in ob1/cksum (I really can't remember which). coll/sync was simply masking 
>> > a live-lock problem. I committed a workaround for the bug in r26575 
>> > (https://github.com/open-mpi/ompi/commit/59e529cf1dfe986e40d14ec4d2a2e5ef0cea5e35) 
>> > and tested it with the user code. After this change the user code ran 
>> > fine without coll/sync. Since LANL no longer had any users of coll/sync, 
>> > we stopped supporting it.
>> >
>> >> I solved this back then by introducing the “sync” component in 
>> >> ompi/mca/coll, which injected a barrier operation every N collectives. 
>> >> You could even “tune” it by doing the injection for only specific 
>> >> collectives.
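>> >> 
>> >> If memory serves, in the 1.6 series the component was selected and tuned 
>> >> through MCA parameters roughly along these lines (the exact parameter 
>> >> names and defaults should be double-checked against the 1.6 sources):
>> >> 
>> >>     mpirun --mca coll_sync_priority 100 \
>> >>            --mca coll_sync_barrier_before 1000 \
>> >>            -np 64 ./legacy_app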
>> >>
>> >> However, I can no longer find that component in the code base - I find it 
>> >> in the 1.6 series, but someone removed it during the 1.7 series.
>> >>
>> >> Can someone tell me why this was done??? Is there any reason not to bring 
>> >> it back? It solves a very real, not uncommon, problem.
>> >> Ralph
>> >
>> > This was discussed during one (or several) telecons years ago. We agreed 
>> > to kill it and bring it back if there was 1) a use case, and 2) someone 
>> > willing to support it. See 
>> > https://github.com/open-mpi/ompi/commit/5451ee46bd6fcdec002b333474dec919475d2d62 .
>> >
>> > Can you link the user email?
>> >
>> > -Nathan