On Sep 6, 2007, at 09:27, Terry D. Dontje wrote:

Gleb Natapov wrote:

On Thu, Sep 06, 2007 at 06:50:43AM -0600, Ralph H Castain wrote:


WHAT:   Decide upon how to handle MPI applications where one or more
       processes exit without calling MPI_Finalize

WHY:    Some applications can abort via an exit call instead of
       calling MPI_Abort when a library (or something else) calls
       exit. This situation is outside a user's control, so they
       cannot fix it.

WHERE:  Refer to ticket #1144 - code changes are TBD

WHEN:   Up to the group



[snip]


Does the general community feel we should do anything here, or is this a "bug" that should be fixed by the entity calling "exit"? I should note that it actually is bad behavior (IMHO) for any library to call "exit" - but then, we do that in some situations too, so perhaps we shouldn't cast stones!

Any suggested solutions or comments on whether or not we should do anything
would be appreciated.



IMO (a) should be implemented.



I don't think (b) should be implemented. However, one could register an atexit handler that calls MPI_Finalize. In that case, the exiting process
would be stuck until every other process reaches its own exit or MPI_Finalize.

That being said I think (a) probably makes more sense and adheres to the
MPI standard.
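
For illustration, the atexit idea mentioned above could look roughly like the following. This is only a sketch, not what Open MPI does: the names finalize_at_exit and install_finalize_at_exit are made up for this example, and the USE_MPI switch with its stub functions is an assumption added so the code also builds on a machine without an MPI installation. With a real MPI library, MPI_Finalize is collective, so the exiting process would wait in the handler until its peers finalize too.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef USE_MPI
#include <mpi.h>
#else
/* Stand-ins (assumption) so the sketch compiles without MPI installed.
 * They only track state; they do no communication. */
static int stub_initialized = 0;
static int stub_finalized = 0;
static int stub_finalize_calls = 0;

static int MPI_Init(int *argc, char ***argv)
{
    (void)argc; (void)argv;
    stub_initialized = 1;
    return 0;
}
static int MPI_Initialized(int *flag) { *flag = stub_initialized; return 0; }
static int MPI_Finalized(int *flag)   { *flag = stub_finalized;   return 0; }
static int MPI_Finalize(void)
{
    stub_finalized = 1;
    stub_finalize_calls++;
    return 0;
}
#endif

/* Hypothetical handler: finalize MPI only if it was initialized and has
 * not been finalized yet, so a normal MPI_Finalize followed by exit()
 * does not finalize twice. */
static void finalize_at_exit(void)
{
    int initialized = 0;
    int finalized = 0;

    MPI_Initialized(&initialized);
    MPI_Finalized(&finalized);
    if (initialized && !finalized) {
        MPI_Finalize();
    }
}

/* Hypothetical helper: register the handler; returns 0 on success,
 * as atexit() does. */
static int install_finalize_at_exit(void)
{
    return atexit(finalize_at_exit);
}
```

An application would call install_finalize_at_exit() once, right after MPI_Init. A later exit() from a third-party library would then run the handler before the process terminates.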

I agree (b) is not a good idea. However, I am not very pleased with (a) either. It totally prevents any process fault-tolerance (FT) mechanism if we go that way. If we plan to add failure detection and failure management to the RTE (to keep Finalize from hanging), we should add the ability to plug in FT-specific error handlers. The default error handler should do exactly what Ralph proposes, but nowhere outside this handler should the RTE code assume that the application is aborting when a failure occurs. An FT application might not abort at all and recover instead.
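
To make the pluggable-handler idea concrete, here is a rough sketch in C. None of these names exist in the Open MPI RTE; rte_action_t, rte_set_failure_handler, rte_report_failure and ft_recover_handler are all hypothetical and chosen only for illustration. The point is that the decision to abort lives in the default handler alone, and the rest of the runtime just asks the installed handler what to do.

```c
#include <stddef.h>

/* What the runtime should do after a process failure is reported. */
typedef enum {
    RTE_ACTION_ABORT_JOB,   /* tear down the whole job */
    RTE_ACTION_CONTINUE     /* keep the surviving processes running */
} rte_action_t;

/* Hypothetical handler signature: rank of the failed process plus an
 * opaque pointer supplied when the handler was installed. */
typedef rte_action_t (*rte_failure_handler_t)(int failed_rank, void *ctx);

/* Default policy: any failure aborts the job. This is the only place
 * that assumes the application is going down. */
static rte_action_t default_failure_handler(int failed_rank, void *ctx)
{
    (void)failed_rank;
    (void)ctx;
    return RTE_ACTION_ABORT_JOB;
}

static rte_failure_handler_t current_handler = default_failure_handler;
static void *current_ctx = NULL;

/* Install an FT-specific handler; passing NULL restores the default. */
static void rte_set_failure_handler(rte_failure_handler_t handler, void *ctx)
{
    current_handler = handler ? handler : default_failure_handler;
    current_ctx = ctx;
}

/* Called by the runtime's failure detector. The caller acts on the
 * returned value instead of assuming an abort. */
static rte_action_t rte_report_failure(int failed_rank)
{
    return current_handler(failed_rank, current_ctx);
}

/* Example FT handler: count the lost process and ask to continue. */
static rte_action_t ft_recover_handler(int failed_rank, void *ctx)
{
    int *lost_count = (int *)ctx;

    (void)failed_rank;
    if (lost_count != NULL) {
        (*lost_count)++;
    }
    return RTE_ACTION_CONTINUE;
}
```

A non-FT application never installs anything and gets the abort behavior. An FT application installs its own handler once at startup and is then free to recover.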

Aurelien


--td

_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

