This approach is a completely legit way of destroying the pending operations of 
a module tied to a communicator.

A communicator is not destroyed when MPI_Comm_free is called but when all the 
pending operations using the communicator are done (this is the only moment 
when one can safely go and release resources associated with a communicator). 
Thus, as long as you have a pending operation generated by a module tied to the 
communicator, the refcount of the communicator cannot reach zero, and the 
communicator destruction will never be triggered. Attaching an attribute gives 
you an early destruction hook: its delete callback is invoked as soon as 
MPI_Comm_free is called, which lets the module release its pending operations 
and thus play nicely with the Open MPI refcount system.
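
To make that concrete, here is a minimal sketch of the mechanism in plain MPI
terms (the module structure and function names below are only illustrative,
not the actual hcoll code):

    #include <mpi.h>
    #include <stdlib.h>

    /* Hypothetical per-communicator module state; a real module would track
     * its outstanding isend/irecv requests here. */
    typedef struct {
        int pending_ops;
    } my_module_t;

    /* Delete callback: invoked as soon as MPI_Comm_free is called, while the
     * communicator is still valid, so pending operations can be completed
     * (or cancelled) on 'comm' before the refcount is allowed to drop. */
    static int my_module_delete_fn(MPI_Comm comm, int keyval,
                                   void *attr_val, void *extra_state)
    {
        my_module_t *mod = (my_module_t *)attr_val;
        /* ... progress / complete mod->pending_ops using 'comm' here ... */
        free(mod);
        return MPI_SUCCESS;
    }

    /* Attach the module to a communicator through an attribute. */
    static int attach_module(MPI_Comm comm)
    {
        int keyval;
        my_module_t *mod = calloc(1, sizeof(*mod));

        MPI_Comm_create_keyval(MPI_COMM_NULL_COPY_FN, my_module_delete_fn,
                               &keyval, NULL);
        return MPI_Comm_set_attr(comm, keyval, mod);
    }

The delete callback runs while the communicator is still fully usable, so the
module can drain its outstanding requests before the refcount machinery takes
over.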

  George.

On Jan 9, 2014, at 18:05, Joshua Ladd <josh...@mellanox.com> wrote:

> See inline...
> 
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
> (jsquyres)
> Sent: Thursday, January 09, 2014 11:53 AM
> To: Open MPI Developers
> Cc: Devendar Bureddy; valentin (valentin.pet...@itseez.com) 
> (valentin.pet...@itseez.com); Mike Dubman
> Subject: Re: [OMPI devel] hcoll destruction via MPI attribute
> 
> On Jan 9, 2014, at 11:00 AM, Joshua Ladd <josh...@mellanox.com> wrote:
> 
>> Hcoll uses the PML as an "OOB" to bootstrap itself. When a communicator is 
>> destroyed, by the time we destroy the hcoll module the communicator context 
>> is no longer valid, and any pending operations that rely on its existence 
>> will fail. In particular, we have a non-blocking synchronization barrier 
>> that may be in progress when the communicator is destroyed.
> 
> Can you explain this a little more?  Do you mean you have a pending 
> MPI_Ibarrier running on that communicator?  (i.e., the ibarrier has started 
> but not completed)  Or you have some started-but-not-completed 
> MPI_Isends/MPI_Irecvs?
> 
> (using the PML/coll equivalents of these of course -- not the top-level MPI_* 
> foo functions)
> 
> Or are you saying that you need the destruction of the hcoll module on a 
> given communicator to be synchronous between all processes in that 
> communicator?
> 
> [Josh] We have a recursive doubling algorithm in progress, implemented with 
> PML send/recvs, more accurately, with "RTE_isend/RTE_irecv" functions, 
> which, in the case of OMPI, are PML calls.
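> 
> For reference, the communication pattern is roughly the following (a plain-MPI
> sketch assuming a power-of-two number of ranks; the real code is non-blocking
> and driven through the RTE callbacks, so this only shows the shape of the
> algorithm, not the hcoll implementation):
> 
>     static void recursive_doubling_barrier(MPI_Comm comm)
>     {
>         int rank, size, sbuf = 0, rbuf;
>         MPI_Comm_rank(comm, &rank);
>         MPI_Comm_size(comm, &size);
>         for (int mask = 1; mask < size; mask <<= 1) {
>             int peer = rank ^ mask;   /* partner at distance 'mask' */
>             MPI_Request reqs[2];
>             MPI_Irecv(&rbuf, 1, MPI_INT, peer, 0, comm, &reqs[0]);
>             MPI_Isend(&sbuf, 1, MPI_INT, peer, 0, comm, &reqs[1]);
>             MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
>         }
>     }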
> 
>> Registering the delete callback allows us to finish these operations because 
>> the context is still valid inside this callback. The commented-out code 
>> is the "prototype" protocol that attempted to handle this scenario in an 
>> entirely different (and more complex) way. It is not needed now. We don't 
>> want to introduce solutions that are OMPI-specific, because we need to be 
>> able to integrate hcoll into other runtimes. We considered approaching the 
>> community about changing the comm destroy flow in OMPI to keep the context 
>> alive long enough to complete our synchronization barriers, but then the 
>> solution would be tied to a particular MPI implementation.
> 
> I'm not quite sure I understand -- the hcoll module (where this code is 
> located) is completely OMPI-specific.  I thought that libhcoll was your 
> independent-of-MPI-implementations portion of this code...?
> 
> [Josh] The hcoll module is the integration layer. HCOLL is completely 
> standalone. When you create a new communicator, you create a new hcoll module, 
> which in turn creates a new "hcoll context". We have defined what we call the 
> RTE interface, which is an API that runtimes need to implement in order to 
> use HCOLL. Basically, the runtime needs to provide HCOLL with handles to 
> non-blocking send and receive functions, implement some callbacks, and pass 
> in a group handle. HCOLL is completely MPI-agnostic.
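> 
> Schematically, the runtime hands HCOLL a small table of callbacks plus a group
> handle, along the lines of the following (the struct and field names here are
> purely illustrative, not the real HCOLL RTE API):
> 
>     #include <stddef.h>   /* size_t */
> 
>     /* Illustrative only: what an RTE-style callback table could look like. */
>     typedef struct rte_iface {
>         /* non-blocking point-to-point supplied by the runtime
>          * (in the OMPI integration these map onto PML calls) */
>         int  (*isend)(void *buf, size_t len, int dest, void *group, void **req);
>         int  (*irecv)(void *buf, size_t len, int src,  void *group, void **req);
>         int  (*test)(void **req, int *completed);
>         void *group;   /* opaque group/communicator handle from the runtime */
>     } rte_iface_t;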
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
