I forgot to mention that you should check whether your request was actually cancelled by calling MPI_TEST_CANCELLED on the status returned when the request completes after the MPI_Cancel, as MPI_Cancel itself doesn't return an error if the cancellation fails.
george.

On Feb 7, 2011, at 14:52 , George Bosilca wrote:

> Tobias,
>
> MPI_Cancel is a tricky beast, and should be handled with extreme care. From
> my perspective, your problem is not related to a specific implementation, but
> to your usage of MPI_Cancel.
>
> You state that MPI_Wait is not supposed to hang, but I don't see anything in
> the MPI standard allowing you to state this. If you are referring to the
> first paragraph of 3.8 (regarding MPI_Cancel), then I have to disagree with
> you. You have to pay attention to the wording of the standard to see the
> trick.
>
>> Either the cancellation succeeds, or the communication succeeds, but not
>> both.
>
> This is the definition of a successful cancellation, which is the basis of
> every other action that happens on the request. As MPI_Cancel is only
> defined as a local operation, an MPI library that sends the matching info for
> the persistent request in MPI_Start will have a hard time canceling the
> request.
>
> Now, imagine a run where the receiver manages to cancel his request as it has
> not been matched (and this can be done locally). As the sender sent the
> matching information in MPI_Start, when it reaches the MPI_Cancel it cannot
> cancel the request locally, so the cancel will fail. The sender will
> therefore be blocked in the MPI_Wait, while the receiver will happily wait in
> the MPI_Finalize.
>
> george.
>
> On Feb 7, 2011, at 04:54 , Tobias Hilbrich wrote:
>
>> Hi all,
>>
>> I am with the ZIH developers working on VampirTrace and just discovered a
>> possibly erroneous behavior of OpenMPI (v1.4.3). I am canceling an active
>> persistent request created with MPI_Ssend_init; in a successive MPI_Wait
>> call the process hangs, even though according to the MPI standard this
>> should never happen.
>>
>> The pseudo code is as follows:
>> if (rank == 0)
>>     MPI_Ssend_init (&buf, 1, MPI_INT, 1, 666, MPI_COMM_WORLD, &r);
>> if (rank == 1)
>>     MPI_Recv_init (&buf, 1, MPI_INT, 0, 666, MPI_COMM_WORLD, &r);
>>
>> //Start
>> MPI_Start (&r);
>>
>> //Cancel
>> MPI_Cancel (&r);
>>
>> //Wait
>> MPI_Wait (&r, &status);
>>
>> //Free
>> MPI_Request_free (&r);
>>
>> The full (minimal reproducer) source code along with a dump of ompi_info is
>> attached.
>>
>> Either I am missing some passage of the standard mentioning that it is
>> forbidden to cancel a synchronous send, or there appears to be an error in
>> OpenMPI's implementation. If it is already fixed, sorry for the spam.
>> (Note: changing the Ssend to Send or Bsend removes the hang)
>>
>> -Tobias
>>
>> <ssend_init_cancel.c>
>> <ssend_init_cancel.ompi_info>
>>
>> --
>> Dipl.-Inf. Tobias Hilbrich
>> Wissenschaftlicher Mitarbeiter
>>
>> Technische Universitaet Dresden
>> Zentrum fuer Informationsdienste und Hochleistungsrechnen (ZIH)
>> (Center for Information Services and High Performance Computing (ZIH))
>> Interdisziplinäre Anwenderunterstützung und Koordination
>> (Interdisciplinary Application Development and Coordination)
>> 01062 Dresden
>> Tel.: +49 (351) 463-32041
>> Fax: +49 (351) 463-37773
>> E-Mail: tobias.hilbr...@zih.tu-dresden.de
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel