If this changes the behavior of MPI_Abort so that it aborts only the processes
in the specified communicator, how does it not affect the default user
experience (today it aborts everything)?

If we accept that MPI_Abort will abort only the processes in the specified
communicator, what happens to the other processes in the same MPI_COMM_WORLD
(those not in the communicator passed to MPI_Abort)? What about all the other
connected processes (connectivity as defined in Section 10.5.4 of the MPI
standard)? Do they see this as a fault?
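
To make the question concrete, here is a toy model of the two possible
semantics. It is plain Python with no MPI calls, and every name in it
(abort_outcome, the "job" and "group" scopes) is hypothetical, not part of
Open MPI or ORTE. It only shows which ranks would be left running in each
case, assuming a 6-process MPI_COMM_WORLD and a communicator holding the even
ranks.

```python
# Toy model only: plain Python, no MPI calls. All names are hypothetical.

def abort_outcome(world, comm_group, scope):
    """Return (terminated, survivors) as sorted lists of ranks.

    world      -- ranks in MPI_COMM_WORLD
    comm_group -- ranks in the communicator passed to MPI_Abort
    scope      -- "job":   the whole job goes down (what users see today)
                  "group": only the communicator's group is terminated
    """
    world = set(world)
    comm_group = set(comm_group)
    if not comm_group <= world:
        raise ValueError("communicator group must be a subset of the world")
    if scope == "job":
        terminated = world
    elif scope == "group":
        terminated = comm_group
    else:
        raise ValueError("unknown scope: %r" % (scope,))
    return sorted(terminated), sorted(world - terminated)


world = range(6)
sub = [0, 2, 4]  # assumed example: a communicator built from the even ranks

print(abort_outcome(world, sub, "job"))    # ([0, 1, 2, 3, 4, 5], [])
print(abort_outcome(world, sub, "group"))  # ([0, 2, 4], [1, 3, 5])
```

Under the "group" scope, ranks 1, 3 and 5 are the processes my question is
about: the model says they keep running, but not whether they are notified or
treat the disappearance of ranks 0, 2 and 4 as a fault.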

  george.

On Jun 9, 2011, at 16:32 , Josh Hursey wrote:

> WHAT: Fix missing code in MPI_Abort
> 
> WHY: MPI_Abort is missing logic to ask for termination of the process
> group defined by the communicator
> 
> WHERE: Mostly orte/mca/errmgr
> 
> WHEN: Open MPI trunk
> 
> TIMEOUT: Tuesday, June 14, 2011 (after teleconf)
> 
> Details:
> -------------------------------------------
> A bitbucket branch is available here (last sync to r24757 of trunk)
> https://bitbucket.org/jjhursey/ompi-abort/
> 
> In the MPI Standard (v2.2) Section 8.7 after the introduction of
> MPI_Abort, it states:
> "This routine makes a best attempt to abort all tasks in the group of comm."
> 
> Open MPI currently only calls orte_errmgr.abort() to abort the calling
> process itself. The code to ask for the abort of the other processes
> in the group defined by the communicator is commented out. Since one
> process calling abort currently causes all processes in the job to
> abort, it has not been a big deal. However as the group starts
> exploring better resilience in the OMPI layer (with further support
> from the ORTE layer) this aspect of MPI_Abort will become more
> necessary to get right.
> 
> This branch adds back the logic necessary for a single process calling
> MPI_Abort to request, from ORTE errmgr, that a defined subgroup of
> processes be aborted. Once the request is sent to the HNP, the local
> process then calls abort on itself. The HNP requests that the defined
> subgroup of processes be terminated using the existing plm mechanisms
> for doing so.
> 
> This change has no effect on the current default user-experienced
> behavior of MPI_Abort.
> 
> -- 
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

