I could see that if fewer than N processes exit with a non-zero exit code,
ORTE may choose not to abort the job. However, if all N processes have
exited or aborted, I expect everything to clean up and mpirun to exit. It
does not do that at the moment, which I think is causing most of the hangs
in the MTT trunk runs that did not occur prior to this week.
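To make the expected behavior concrete, here is a minimal Python sketch of the decision described above. It is a hypothetical model, not ORTE's actual code. decide(), Action and the abort_on_nonzero flag are illustrative names; the flag stands in for the orte_abort_non_zero_exit MCA param. The point it captures is that once all N processes have terminated, the answer is always to clean up and exit, whatever the policy on non-zero exits.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"    # some processes are still running; keep the job alive
    ABORT = "abort"  # kill the remaining processes and report failure
    EXIT = "exit"    # every process has terminated; mpirun should exit


def decide(exit_codes, n_procs, abort_on_nonzero):
    """Hypothetical model of the launcher's decision (not ORTE code).

    exit_codes: exit statuses of the processes that have terminated so far
    n_procs: total number of processes in the job (N)
    abort_on_nonzero: stands in for the orte_abort_non_zero_exit setting
    """
    if len(exit_codes) >= n_procs:
        # All N processes have exited or aborted: nothing is left to wait
        # for, so clean up and exit regardless of their exit statuses.
        return Action.EXIT
    if abort_on_nonzero and any(code != 0 for code in exit_codes):
        # Policy says a single non-zero exit aborts the whole job.
        return Action.ABORT
    # Fewer than N processes have terminated, and either none failed or
    # the policy tolerates failures: keep waiting.
    return Action.WAIT
```

For example, decide([0, 1], 4, False) returns Action.WAIT and decide([0, 1], 4, True) returns Action.ABORT. decide([1, 1, 1, 1], 4, False) returns Action.EXIT, which is the case that currently hangs.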
--td
On 4/13/2012 5:18 PM, Ralph Castain wrote:
This has come up again because some of the MTT tests depend on a specific
behavior when a process exits with a non-zero status - in this case, they
expect ORTE to abort the job. At some point, the default had been switched to
NOT abort the job if a process exited with a non-zero status.
So I'll throw this out to the community: if any process exits with a non-zero
status, should ORTE abort the job?
I don't personally care, but we ought to decide on something. In the meantime,
I will set the default so we DO abort, thus allowing the MTT runs to complete
correctly.
FWIW: the MCA param orte_abort_non_zero_exit can always be set to control this
behavior.
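The parameter can be set in either of Open MPI's usual ways: on the mpirun command line with "--mca <name> <value>", or through an OMPI_MCA_<name> environment variable. The Python helpers below are a small sketch that only builds the command line or environment. The helper names and "./a.out" are hypothetical, and treating 1/0 as enable/disable assumes the parameter is a boolean.

```python
import os


def mpirun_command(nprocs, program, abort_on_nonzero):
    """Build an mpirun argv that sets orte_abort_non_zero_exit.

    Assumes the parameter is boolean: "1" enables aborting on a
    non-zero exit, "0" disables it.
    """
    return [
        "mpirun",
        "--mca", "orte_abort_non_zero_exit", "1" if abort_on_nonzero else "0",
        "-np", str(nprocs),
        program,
    ]


def mca_env(abort_on_nonzero, base=None):
    """Alternative: set the same parameter through the environment.

    Open MPI picks up MCA parameters from OMPI_MCA_<name> variables.
    """
    env = dict(os.environ if base is None else base)
    env["OMPI_MCA_orte_abort_non_zero_exit"] = "1" if abort_on_nonzero else "0"
    return env
```

Usage would be something like subprocess.run(mpirun_command(4, "./a.out", True)), or subprocess.run(["mpirun", "-np", "4", "./a.out"], env=mca_env(True)).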
Ralph
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com