On Sept. 6, 2007, at 09:27, Terry D. Dontje wrote:
Gleb Natapov wrote:
On Thu, Sep 06, 2007 at 06:50:43AM -0600, Ralph H Castain wrote:
WHAT: Decide upon how to handle MPI applications where one or more
processes exit without calling MPI_Finalize
WHY: Some applications can abort via an exit call instead of
calling MPI_Abort when a library (or something else) calls
exit. This situation is outside a user's control, so they
cannot fix it.
WHERE: Refer to ticket #1144 - code changes are TBD
WHEN: Up to the group
[snip]
Does the general community feel we should do anything here, or is this a
"bug" that should be fixed by the entity calling "exit"? I should note that
it actually is bad behavior (IMHO) for any library to call "exit" - but
then, we do that in some situations too, so perhaps we shouldn't cast
stones!
Any suggested solutions or comments on whether or not we should do anything
would be appreciated.
IMO (a) should be implemented.
I don't think (b) should be implemented. However, one could register an
atexit handler that calls MPI_Finalize. The exiting process would then
block until every other process reaches its own exit or MPI_Finalize call.
That being said, I think (a) probably makes more sense and adheres to the
MPI standard.
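
For illustration, the atexit idea mentioned above could look roughly like
the sketch below. It is not code from ticket #1144. The MPI library state is
modeled with hypothetical stand-ins (stub_init, stub_finalize, and the two
flags) so the sketch builds without an MPI installation; a real program
would presumably query MPI_Initialized() and MPI_Finalized() and call
MPI_Finalize() in their place.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for MPI library state. In a real program these
 * would be MPI_Init(), MPI_Initialized(), MPI_Finalized(), MPI_Finalize(). */
static int mpi_initialized = 0;
static int mpi_finalized = 0;
static int finalize_calls = 0;

static void stub_init(void)
{
    mpi_initialized = 1;
}

static void stub_finalize(void)
{
    /* Finalizing before init or finalizing twice would be a usage error. */
    assert(mpi_initialized && !mpi_finalized);
    mpi_finalized = 1;
    finalize_calls++;
}

/* atexit handler: finalize only if init happened and finalize did not.
 * With real MPI this is the point where the exiting process would block
 * until its peers also reach finalize. */
static void finalize_at_exit(void)
{
    if (mpi_initialized && !mpi_finalized) {
        stub_finalize();
    }
}

/* Register the handler at most once. Returns 0 on success, as atexit() does. */
static int install_finalize_hook(void)
{
    static int installed = 0;
    if (installed) {
        return 0;
    }
    installed = 1;
    return atexit(finalize_at_exit);
}
```

The guard in finalize_at_exit matters because the handler also runs on a
normal exit, after the application has already finalized on its own.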
I agree (b) is not a good idea. However, I am not very pleased with (a)
either. It completely rules out any process fault-tolerance (FT) mechanism
if we go that way. If we plan to add failure detection and failure
management to the RTE (to keep MPI_Finalize from hanging), we should also
add the ability to plug in FT-specific error handlers. The default error
handler should do exactly what Ralph proposes, but nowhere outside that
handler should the RTE code assume that the application is aborting when a
failure occurs. An FT application might choose not to abort and instead
recover.
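
As a rough sketch of the pluggable-handler idea, assuming a single
process-wide handler slot: every name below (failure_action_t,
set_failure_handler, report_process_failure, and so on) is hypothetical and
is not part of the Open MPI RTE. The point is only that the abort decision
lives in the default handler, while the reporting path stays neutral.

```c
#include <stddef.h>

/* What the runtime should do after a handler has seen a failure. */
typedef enum { ACTION_ABORT_JOB, ACTION_CONTINUE } failure_action_t;

/* Hypothetical handler signature: receives the rank that failed. */
typedef failure_action_t (*failure_handler_t)(int failed_rank);

/* Default policy: treat any process failure as a job abort. */
static failure_action_t default_failure_handler(int failed_rank)
{
    (void)failed_rank;
    return ACTION_ABORT_JOB;
}

static failure_handler_t current_handler = default_failure_handler;

/* Install a handler; NULL restores the default. Returns the previous one. */
static failure_handler_t set_failure_handler(failure_handler_t handler)
{
    failure_handler_t previous = current_handler;
    current_handler = (handler != NULL) ? handler : default_failure_handler;
    return previous;
}

/* Runtime-side entry point: it makes no abort assumption of its own and
 * simply returns whatever the installed handler decides. */
static failure_action_t report_process_failure(int failed_rank)
{
    return current_handler(failed_rank);
}

/* Example FT handler: record the failed rank and keep the job running. */
static int recovered_rank = -1;

static failure_action_t ft_recover_handler(int failed_rank)
{
    recovered_rank = failed_rank;
    return ACTION_CONTINUE;
}
```

An FT application would call set_failure_handler(ft_recover_handler) early
on; everything else keeps the default abort behavior.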
Aurelien
--td
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel