Tobias,

MPI_Cancel is a tricky beast and should be handled with extreme care. From my perspective, your problem is not related to a specific implementation, but to your usage of MPI_Cancel.

You state that MPI_Wait is not supposed to hang, but I don't see anything in the MPI standard that allows you to state this. If you are referring to the first paragraph of section 3.8 (regarding MPI_Cancel), then I have to disagree with you. You have to pay attention to the wording of the standard to see the trick.

> Either the cancellation succeeds, or the communication succeeds, but not both.

This is the definition of a successful cancellation, and it is the basis of every other action that happens on the request. As MPI_Cancel is only defined as a local operation, an MPI library that sends the matching information for the persistent request in MPI_Start will have a hard time canceling the request.

Now, imagine a run where the receiver manages to cancel its request because it has not yet been matched (and this can be done locally). As the sender sent the matching information in MPI_Start, when it reaches MPI_Cancel it cannot cancel the request locally, so the cancel will fail. The sender will therefore be blocked in MPI_Wait, while the receiver will happily wait in MPI_Finalize.

  george.

On Feb 7, 2011, at 04:54 , Tobias Hilbrich wrote:

> Hi all,
>
> I am with the ZIH developers working on VampirTrace and just discovered a
> possibly erroneous behavior of OpenMPI (v1.4.3). I am canceling an active
> persistent request created with MPI_Ssend_init; in a successive MPI_Wait call
> the process hangs, even though according to the MPI standard this should
> never happen.
>
> The pseudo code is as follows:
> if (rank == 0)
>     MPI_Ssend_init (&buf, 1, MPI_INT, 1, 666, MPI_COMM_WORLD, &r);
> if (rank == 1)
>     MPI_Recv_init (&buf, 1, MPI_INT, 0, 666, MPI_COMM_WORLD, &r);
>
> //Start
> MPI_Start (&r);
>
> //Cancel
> MPI_Cancel (&r);
>
> //Wait
> MPI_Wait (&r, &status);
>
> //Free
> MPI_Request_free (&r);
>
> The full (minimal reproducer) source code along with a dump of ompi_info is
> attached.
>
> Either I am missing some passage of the standard mentioning that it is
> forbidden to cancel a synchronous send or there appears to be an error in
> OpenMPI's implementation. If it is already fixed, sorry for the spam.
> (Note: changing the Ssend to Send or Bsend removes the hang)
>
> -Tobias
>
> <ssend_init_cancel.c>
> <ssend_init_cancel.ompi_info>
>
> --
> Dipl.-Inf. Tobias Hilbrich
> Wissenschaftlicher Mitarbeiter
>
> Technische Universitaet Dresden
> Zentrum fuer Informationsdienste und Hochleistungsrechnen (ZIH)
> (Center for Information Services and High Performance Computing (ZIH))
> Interdisziplinäre Anwenderunterstützung und Koordination
> (Interdisciplinary Application Development and Coordination)
> 01062 Dresden
> Tel.: +49 (351) 463-32041
> Fax: +49 (351) 463-37773
> E-Mail: tobias.hilbr...@zih.tu-dresden.de
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
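
As a footnote to the scenario described in the reply above (the receiver cancels locally, the sender's cancel fails, the sender then blocks in MPI_Wait), here is a toy model in plain C. It is only a sketch of the reasoning. The names toy_request, toy_cancel, toy_wait_would_return and toy_sender_hangs are hypothetical and are not part of MPI or Open MPI. The rule that a cancel succeeds only while no matching information has left the process is an assumption taken from the reply, not a statement about how any library is implemented.

```c
#include <stdbool.h>

/* Toy model only: hypothetical types, not an MPI or Open MPI API. */
typedef enum { REQ_ACTIVE, REQ_CANCELLED, REQ_COMPLETE } req_state;

typedef struct {
    req_state state;
    bool match_info_sent; /* matching information left this process at start */
    bool matched;         /* a peer operation has been paired with this one  */
} toy_request;

/* Cancel is purely local in this model: it succeeds only if the request is
 * still active, unmatched, and nothing about it has been sent to the peer. */
static bool toy_cancel(toy_request *r)
{
    if (r->state == REQ_ACTIVE && !r->match_info_sent && !r->matched) {
        r->state = REQ_CANCELLED;
        return true;
    }
    return false;
}

/* A wait can return only once the request is cancelled or complete. */
static bool toy_wait_would_return(const toy_request *r)
{
    return r->state != REQ_ACTIVE;
}

/* Replays the run from the reply: the receiver cancels first, then the
 * sender tries to cancel. Returns true if the sender's wait never returns. */
static bool toy_sender_hangs(bool match_info_sent_at_start)
{
    toy_request send = { REQ_ACTIVE, match_info_sent_at_start, false };
    toy_request recv = { REQ_ACTIVE, false, false };

    bool recv_cancelled = toy_cancel(&recv); /* unmatched, so this succeeds */
    bool send_cancelled = toy_cancel(&send); /* fails if info already sent  */

    /* A synchronous send completes only when a live receive matches it. */
    if (!recv_cancelled && !send_cancelled) {
        send.matched = recv.matched = true;
        send.state = recv.state = REQ_COMPLETE;
    }
    return !toy_wait_would_return(&send);
}
```

In this model, toy_sender_hangs(true) corresponds to the case discussed in the reply: the send is neither cancelled nor matched, so its wait never returns. toy_sender_hangs(false) models a hypothetical library that can still withdraw the send locally, in which case both cancels succeed and nobody blocks.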