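From the MPI standard's perspective, MPI_Cancel is not required to succeed; it may also fail gracefully. In that case the operation completes normally, and the application learns which outcome occurred by calling MPI_Test_cancelled on the completed request's status. However, the PSM MTL diverges from the standard: if a request cannot be canceled, it returns an error. Here is a patch to fix this issue.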
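A minimal, self-contained C model of the contract described above may help. Every type and function in it (transport_result_t, model_request_t, model_cancel, model_test_cancelled) is hypothetical and invented for illustration; none of it is Open MPI, PSM, or MPI API. The cancel call reports success whether or not the underlying transport could cancel, and the outcome is read back from the request afterwards, which is the job MPI_Test_cancelled does in real MPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: these names are invented for illustration and
 * are not part of Open MPI, PSM, or the MPI API. */

/* What the modeled transport reports when asked to cancel a request. */
typedef enum {
    TRANSPORT_CANCELLED,       /* request was removed before it completed */
    TRANSPORT_NOT_CANCELLABLE  /* request is already matched or in flight */
} transport_result_t;

typedef struct {
    bool complete;   /* set once the request has finished or been cancelled */
    bool cancelled;  /* plays the role of the status "cancelled" flag */
} model_request_t;

/* Standard-conforming cancel: always reports success. A cancellation that
 * does not take effect is not an error; the request simply completes
 * normally later. */
int model_cancel(model_request_t *req, transport_result_t r)
{
    assert(req != NULL);
    if (r == TRANSPORT_CANCELLED) {
        req->cancelled = true;
        req->complete = true;   /* cancelled requests are still completed */
    }
    return 0;                   /* success in both cases */
}

/* Plays the role of MPI_Test_cancelled: the outcome is read from the
 * request, never inferred from model_cancel's return value. */
bool model_test_cancelled(const model_request_t *req)
{
    assert(req != NULL);
    return req->cancelled;
}
```

In a real MPI program the corresponding sequence is MPI_Isend, MPI_Cancel, then MPI_Wait (or MPI_Test) to complete the request, and finally MPI_Test_cancelled on the resulting MPI_Status.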
diff --git a/ompi/mca/mtl/psm/mtl_psm_cancel.c b/ompi/mca/mtl/psm/mtl_psm_cancel
index 6da3386..277c761 100644
--- a/ompi/mca/mtl/psm/mtl_psm_cancel.c
+++ b/ompi/mca/mtl/psm/mtl_psm_cancel.c
@@ -37,10 +37,8 @@ int ompi_mtl_psm_cancel(struct mca_mtl_base_module_t* mtl,
         if(PSM_OK == err) {
             mtl_request->ompi_req->req_status._cancelled = true;
             mtl_psm_request->super.completion_callback(&mtl_psm_request->super);
-            return OMPI_SUCCESS;
-        } else {
-            return OMPI_ERROR;
         }
+        return OMPI_SUCCESS;
     } else if(PSM_MQ_INCOMPLETE == err) {
         return OMPI_SUCCESS;
     }

George.

On Thu, Jan 15, 2015 at 1:30 PM, Adrian Reber <adr...@lisas.de> wrote:
> Doing
>
> MPI_Isend()
>
> followed by a
>
> MPI_Cancel()
>
> fails on my PSM based system with 1.8.4 like this:
>
> n040108:0.1.Cannot cancel send requests (req=0x2b6279787f80)
> n040108:0.0.Cannot cancel send requests (req=0x2b3a3dc92f80)
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing
> the job to be terminated. The first process to do so was:
>
> Process name: [[58364,1],1]
> Exit code: 255
> --------------------------------------------------------------------------
>
> Is this something PSM actually cannot do or an Open MPI error?
>
> Adrian
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/01/16783.php