Tobias,

MPI_Cancel is a tricky beast and should be handled with extreme care. From my 
perspective, your problem is not related to a specific implementation but to 
your usage of MPI_Cancel. 

You state that MPI_Wait is not supposed to hang, but I don't see anything in the 
MPI standard that allows you to claim this. If you are referring to the first 
paragraph of Section 3.8 (regarding MPI_Cancel), then I have to disagree with 
you. You have to pay close attention to the wording of the standard to see the 
trick.

> Either the cancellation succeeds, or the communication succeeds, but not both.

This is the definition of a successful cancellation, and it is the basis for 
every other action that happens on the request. As MPI_Cancel is defined only 
as a local operation, an MPI library that sends the matching information for 
the persistent request in MPI_Start will have a hard time canceling the request.

Now imagine a run where the receiver manages to cancel its request because it 
has not been matched yet (and this can be done locally). Since the sender 
already sent the matching information in MPI_Start, by the time it reaches 
MPI_Cancel it cannot cancel the request locally, so the cancel fails. The 
sender will therefore block in MPI_Wait, while the receiver happily waits in 
MPI_Finalize.

  george.

On Feb 7, 2011, at 04:54 , Tobias Hilbrich wrote:

> Hi all,
> 
> I am one of the ZIH developers working on VampirTrace, and I just discovered 
> possibly erroneous behavior in Open MPI (v1.4.3). When I cancel an active 
> persistent request created with MPI_Ssend_init, the process hangs in the 
> subsequent MPI_Wait call, even though according to the MPI standard this 
> should never happen. 
> 
> The pseudocode is as follows:
>       if (rank == 0)
>               MPI_Ssend_init (&buf, 1, MPI_INT, 1, 666, MPI_COMM_WORLD, &r);
>       if (rank == 1)
>               MPI_Recv_init (&buf, 1, MPI_INT, 0, 666, MPI_COMM_WORLD, &r);
>       
>       //Start
>       MPI_Start (&r);
>       
>       //Cancel
>       MPI_Cancel (&r);
>       
>       //Wait
>       MPI_Wait (&r, &status);
>       
>       //Free
>       MPI_Request_free (&r);
> 
> The full source code (a minimal reproducer), along with a dump of ompi_info, 
> is attached.
> 
> Either I am missing some passage of the standard stating that it is forbidden 
> to cancel a synchronous send, or there is an error in Open MPI's 
> implementation. If this is already fixed, sorry for the spam.
> (Note: changing the Ssend to Send or Bsend removes the hang.)
> 
> -Tobias
>  
> <ssend_init_cancel.c>
> <ssend_init_cancel.ompi_info>
> 
> --
> Dipl.-Inf. Tobias Hilbrich
> Wissenschaftlicher Mitarbeiter
> 
> Technische Universitaet Dresden
> Zentrum fuer Informationsdienste und Hochleistungsrechnen (ZIH)
> (Center for Information Services and High Performance Computing (ZIH))
> Interdisziplinäre Anwenderunterstützung und Koordination
> (Interdisciplinary Application Development and Coordination)
> 01062 Dresden
> Tel.: +49 (351) 463-32041
> Fax: +49 (351) 463-37773
> E-Mail: tobias.hilbr...@zih.tu-dresden.de
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

