What Ralph said. You just blow memory on a queue that is not recovered in the 
current implementation.

Also, moving to Allreduce will resolve the issue, because every call then 
effectively acts as a barrier as well. I have found that with some benchmarks 
and collective implementations it can even be faster than Reduce. That is why 
it might be worth trying.

-Nathan
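
To make the queue growth concrete, here is a minimal toy model in plain 
Python. It is a sketch under stated assumptions, not Open MPI's internals: a 
single non-root rank "returns" from each reduce immediately and leaves one 
pending contribution behind, while a slow root completes one earlier reduce 
every few iterations. The names peak_backlog, sync_every and root_drain_every 
are hypothetical and exist only for this illustration.

```python
from collections import deque


def peak_backlog(iterations, sync_every=None, root_drain_every=4):
    """Return the largest number of pending contributions seen at the root.

    Toy model only (assumption, not Open MPI code):
      - the non-root rank enqueues one contribution per iteration and moves on;
      - the root retires one queued contribution every `root_drain_every`
        iterations, i.e. it is slower than the sender;
      - sync_every=None means no synchronization at all;
      - sync_every=k models a barrier every k iterations, at which point the
        sender waits until the root has caught up (the queue is emptied);
      - sync_every=1 models a collective that synchronizes on every call.
    """
    pending = deque()
    peak = 0
    for i in range(1, iterations + 1):
        pending.append(i)                 # sender leaves the reduce early
        peak = max(peak, len(pending))
        if i % root_drain_every == 0:
            pending.popleft()             # slow root finishes an older reduce
        if sync_every is not None and i % sync_every == 0:
            pending.clear()               # barrier: root has caught up
    return peak


if __name__ == "__main__":
    print("no barrier:        ", peak_backlog(1000))
    print("barrier every 100: ", peak_backlog(1000, sync_every=100))
    print("sync on every call:", peak_backlog(1000, sync_every=1))
```

In this model the unsynchronized backlog keeps growing with the iteration 
count, while both synchronized variants stay bounded no matter how long the 
loop runs. The real limits depend on message sizes, the collective algorithm 
and the transport, none of which the model tries to capture.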

> On Apr 15, 2019, at 2:33 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
> 
> Thank you, Nathan. Could you elaborate a bit on what happens internally? 
> From your answer, it seems the program will still produce the correct output 
> at the end, but it'll use more resources. 
> 
> On Mon, Apr 15, 2019 at 9:00 AM Nathan Hjelm via devel 
> <devel@lists.open-mpi.org> wrote:
> If you do that, it may run out of resources and deadlock or crash. I recommend 
> either 1) adding a barrier every 100 iterations, 2) using allreduce, or 3) 
> enabling coll/sync (which essentially does 1). Honestly, 2 is probably the 
> easiest option and, depending on how large your run is, may not be any slower 
> than 1 or 3.
> 
> -Nathan
> 
> > On Apr 15, 2019, at 9:53 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
> > 
> > Hi Devs,
> > 
> > When doing MPI_Reduce in a loop (collecting on Rank 0), is it the correct 
> > understanding that ranks other than root (0 in this case) will pass the 
> > collective as soon as their data is written to MPI buffers without waiting 
> > for all of them to be received at the root?
> > 
> > If that's the case, then what would happen (semantically) if we execute 
> > MPI_Reduce in a loop without a barrier, allowing non-root ranks to hit the 
> > collective multiple times while the root is still processing an earlier 
> > reduce? For example, the root can be in the first reduce invocation while 
> > another rank is in the second reduce invocation.
> > 
> > Thank you,
> > Saliya
> > 
> > -- 
> > Saliya Ekanayake, Ph.D
> > Postdoctoral Scholar
> > Performance and Algorithms Research (PAR) Group
> > Lawrence Berkeley National Laboratory
> > Phone: 510-486-5772
> > 
> > _______________________________________________
> > devel mailing list
> > devel@lists.open-mpi.org
> > https://lists.open-mpi.org/mailman/listinfo/devel
> 
> 

