I see. But on the v1.8 branch, in r31869, Ralph reverted the commit that moved del_procs after the barrier:

"Revert r31851 until we can resolve how to close these leaks without causing the usnic BTL to fail during disconnect of intercommunicators. Refs #4643"

Also, we need an rte barrier after del_procs, because otherwise rank A could call pml_finalize() before rank B finishes disconnecting from rank A.
I think the order in finalize should be:

1. mpi_barrier(world)
2. del_procs()
3. rte_barrier()
4. pml_finalize()

-----Original Message-----
From: Nathan Hjelm [mailto:hje...@lanl.gov]
Sent: Monday, July 21, 2014 8:01 PM
To: Open MPI Developers
Cc: Yossi Etigin
Subject: Re: [OMPI devel] barrier before calling del_procs

I should add that it is an rte barrier and not an MPI barrier for technical reasons.

-Nathan

On Mon, Jul 21, 2014 at 09:42:53AM -0700, Ralph Castain wrote:
> We already have an rte barrier before del procs
>
> Sent from my iPhone
>
> On Jul 21, 2014, at 8:21 AM, Yossi Etigin <yos...@mellanox.com> wrote:
> > Hi,
> >
> > We get occasional hangs with MTL/MXM during finalize, because a global
> > synchronization is needed before calling del_procs.
> > E.g. rank A may call del_procs() and disconnect from rank B, while rank B
> > is still working.
> > What do you think about adding an MPI barrier on COMM_WORLD before
> > calling del_procs()?
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/07/15204.php