Thank you, Nathan. This makes more sense now.

On Tue, Apr 16, 2019 at 6:48 AM Nathan Hjelm <hje...@me.com> wrote:

> What Ralph said. You just blow memory on a queue that is not recovered in
> the current implementation.
>
> Also, moving to Allreduce will resolve the issue as now every call is
> effectively also a barrier. I have found with some benchmarks and
> collective implementations it can be faster than reduce anyway. That is why
> it might be worth trying.
>
> -Nathan
>
> > On Apr 15, 2019, at 2:33 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
> >
> > Thank you, Nathan. Could you elaborate a bit on what happens internally?
> > From your answer it seems the program will still produce the correct
> > output at the end but it'll use more resources.
> >
> > On Mon, Apr 15, 2019 at 9:00 AM Nathan Hjelm via devel <devel@lists.open-mpi.org> wrote:
> > If you do that it may run out of resources and deadlock or crash. I
> > recommend either 1) adding a barrier every 100 iterations, 2) using
> > allreduce, or 3) enabling coll/sync (which essentially does 1). Honestly, 2
> > is probably the easiest option and depending on how large you run may not
> > be any slower than 1 or 3.
> >
> > -Nathan
> >
> > > On Apr 15, 2019, at 9:53 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
> > >
> > > Hi Devs,
> > >
> > > When doing MPI_Reduce in a loop (collecting on Rank 0), is it the
> > > correct understanding that ranks other than root (0 in this case) will pass
> > > the collective as soon as their data is written to MPI buffers without
> > > waiting for all of them to be received at the root?
> > >
> > > If that's the case then what would happen (semantically) if we execute
> > > MPI_Reduce in a loop without a barrier allowing non-root ranks to hit the
> > > collective multiple times while the root will be processing an earlier
> > > reduce? For example, the root can be in the first reduce invocation, while
> > > another rank is in the second reduce invocation.
> > >
> > > Thank you,
> > > Saliya

--
Saliya Ekanayake, Ph.D
Postdoctoral Scholar
Performance and Algorithms Research (PAR) Group
Lawrence Berkeley National Laboratory
Phone: 510-486-5772
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel