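I forgot to mention that you should check whether your request was actually 
cancelled by calling MPI_Test_cancelled on the status returned by the 
completion call (e.g. MPI_Wait) that follows the MPI_Cancel, as MPI_Cancel 
itself doesn't return an error when the cancellation fails.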

  george.

On Feb 7, 2011, at 14:52 , George Bosilca wrote:

> Tobias,
> 
> MPI_Cancel is a tricky beast, and should be handled with extreme care. From 
> my perspective, your problem is not related to a specific implementation, but 
> to your usage of MPI_Cancel. 
> 
> You state that the MPI_Wait is not supposed to hang, but I don't see anything 
> in the MPI standard allowing you to state this. If you are referring to the 
> first paragraph of Section 3.8 (regarding MPI_Cancel), then I have to 
> disagree with you. You have to pay attention to the wording of the standard 
> to see the trick.
> 
>> Either the cancellation succeeds, or the communication succeeds, but not 
>> both.
> 
> This is the definition of a successful cancellation, which is the basis of 
> every other action that happens on the request. As MPI_Cancel is only 
> defined as a local operation, an MPI library that sends the matching info for 
> the persistent request in MPI_Start will have a hard time canceling the 
> request.
> 
> Now, imagine a run where the receiver manages to cancel its request because 
> it has not been matched (and this can be done locally). As the sender sent 
> the matching information in MPI_Start, when it reaches the MPI_Cancel it 
> cannot cancel the request locally, so the cancel will fail. The sender will 
> therefore be blocked in the MPI_Wait, while the receiver will happily wait in 
> MPI_Finalize.
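
The scenario above can be replayed with a toy model in plain C. This is not 
MPI code and does not reflect Open MPI internals; every type and function 
name below is hypothetical. It only encodes the two assumptions stated in the 
paragraph: a cancel is purely local, and the sender has already sent its 
matching information when the request was started.

```c
#include <stdbool.h>

/* Hypothetical model of a request; none of these names exist in MPI. */
typedef struct {
    bool matching_info_sent; /* sender announced the message at start time */
    bool matched;            /* a peer operation has matched this request */
    bool cancelled;          /* a local cancel succeeded */
} model_request;

/* A purely local cancel can succeed only if nothing has left the process
 * and nothing has matched the request yet. */
static bool model_cancel(model_request *r)
{
    if (r->matching_info_sent || r->matched)
        return false;
    r->cancelled = true;
    return true;
}

/* In this model, a wait returns only if the request was cancelled or the
 * communication completed (here: was matched). */
static bool model_wait_returns(const model_request *r)
{
    return r->cancelled || r->matched;
}

/* Replays the run described above. Returns true if the sender would be
 * stuck in its wait while the receiver has already moved on. */
static bool sender_blocks_in_scenario(void)
{
    model_request send = { true,  false, false }; /* info sent at start */
    model_request recv = { false, false, false }; /* not matched yet */

    bool recv_cancelled = model_cancel(&recv);    /* succeeds locally */
    bool send_cancelled = model_cancel(&send);    /* fails: info already sent */

    /* The receive was cancelled, so the send is never matched. */
    return recv_cancelled && !send_cancelled && !model_wait_returns(&send);
}
```

In this model `sender_blocks_in_scenario()` returns true: the receive is 
cancelled, the send is neither cancelled nor matched, and the sender's wait 
has nothing that would let it return.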
> 
>  george.
> 
> On Feb 7, 2011, at 04:54 , Tobias Hilbrich wrote:
> 
>> Hi all,
>> 
>> I am with the ZIH developers working on VampirTrace and just discovered a 
>> possibly erroneous behavior of OpenMPI (v1.4.3). I am canceling an active 
>> persistent request created with MPI_Ssend_init; in the subsequent MPI_Wait 
>> call the process hangs, even though according to the MPI standard this 
>> should never happen. 
>> 
>> The pseudo code is as follows:
>>      if (rank == 0)
>>              MPI_Ssend_init (&buf, 1, MPI_INT, 1, 666, MPI_COMM_WORLD, &r);
>>      if (rank == 1)
>>              MPI_Recv_init (&buf, 1, MPI_INT, 0, 666, MPI_COMM_WORLD, &r);
>>      
>>      //Start
>>      MPI_Start (&r);
>>      
>>      //Cancel
>>      MPI_Cancel (&r);
>>      
>>      //Wait
>>      MPI_Wait (&r, &status);
>>      
>>      //Free
>>      MPI_Request_free (&r);
>> 
>> The full (minimal reproducer) source code along with a dump of ompi_info is 
>> attached.
>> 
>> Either I am missing some passage of the standard mentioning that it is 
>> forbidden to cancel a synchronous send, or there appears to be an error in 
>> OpenMPI's implementation. If it is already fixed, sorry for the spam.
>> (Note: changing the Ssend to Send or Bsend removes the hang)
>> 
>> -Tobias
>> 
>> <ssend_init_cancel.c>
>> <ssend_init_cancel.ompi_info>
>> 
>> --
>> Dipl.-Inf. Tobias Hilbrich
>> Wissenschaftlicher Mitarbeiter
>> 
>> Technische Universitaet Dresden
>> Zentrum fuer Informationsdienste und Hochleistungsrechnen (ZIH)
>> (Center for Information Services and High Performance Computing (ZIH))
>> Interdisziplinäre Anwenderunterstützung und Koordination
>> (Interdisciplinary Application Development and Coordination)
>> 01062 Dresden
>> Tel.: +49 (351) 463-32041
>> Fax: +49 (351) 463-37773
>> E-Mail: tobias.hilbr...@zih.tu-dresden.de
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> 

