Ralph,
I don't know if there is any standard ordering of non-zero exit status
codes. If so, another option would be to return the largest
(smallest) value, when that is the most serious exit status.
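
To make that concrete, here is a rough Python sketch of the idea. It is not Open MPI code: the function name, the `larger_is_worse` switch, and the rank-to-status dict are all made up for illustration, and it assumes the statuses really can be ordered by severity.

```python
def most_serious_status(statuses, larger_is_worse=True):
    """Pick one job-level exit status from the per-rank statuses.

    statuses: dict mapping rank -> exit status (hypothetical input format).
    Returns 0 when every rank exited with 0.
    """
    nonzero = [s for s in statuses.values() if s != 0]
    if not nonzero:
        return 0
    # Assumption: severity is ordered by numeric value, in one
    # direction or the other.
    return max(nonzero) if larger_is_worse else min(nonzero)


print(most_serious_status({0: 0, 1: 3, 2: 1}))
print(most_serious_status({0: 0, 1: 3, 2: 1}, larger_is_worse=False))
```

Unlike a rank-based rule, the result here does not depend on which rank happened to fail.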
Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov
On Apr 29, 2010, at 3:52 PM, Ralph Castain wrote:
I ran into something this week that I think may require
consideration by the MPI Forum. Specifically, Rolf found a problem
in their MTT runs where the tests expect mpirun to return a non-zero
exit status because one or more application processes did so, even
though all application procs terminate normally.
I jury-rigged a simple algo that has mpirun return the exit status
of the lowest rank that returned non-zero in the case where the job
terminated normally. We still return the exit code of the first
process to abnormally terminate (i.e., the process that is first
reported to the HNP - may not be the first process that aborted).
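
For the normal-termination case, the rule amounts to something like the following Python sketch. This is an illustration only, not the actual mpirun code: the function name and the rank-to-status dict are hypothetical, and the abnormal-termination path is not modeled.

```python
def lowest_rank_nonzero_status(statuses):
    """Return the exit status of the lowest rank that exited non-zero.

    statuses: dict mapping rank -> exit status (hypothetical input format),
    for a job in which every process terminated normally.
    Returns 0 when no rank reported a non-zero status.
    """
    for rank in sorted(statuses):
        if statuses[rank] != 0:
            return statuses[rank]
    return 0


# Ranks 1 and 3 both fail; rank 1 is lower, so its status wins.
print(lowest_rank_nonzero_status({0: 0, 1: 2, 2: 0, 3: 7}))
```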
However, it raises the question - what is the actual behavior supposed
to be in the case where all procs terminate normally, but some may
return (possibly different) non-zero codes?
I asked a few MPI users, and got a different answer from every one
of them. The only consistent response I got was that the MPI standard
doesn't say what should happen (can someone confirm that?).
Here is a sampling of the responses:
1. return the exit status of the lowest rank that returned non-zero
(which I implemented for now to silence Rolf's MTT problem)
2. return the exit status of the highest rank that returned non-zero
3. print out a histogram of exit statuses
- ranks 0-9: 0
- ranks 10-21,110: 1
- ranks 22-35,40-51: 2
...
4. print out ALL the exit statuses
5. ignore it - mpirun's exit code should only reflect OMPI
internals. It is the app developer's responsibility to properly deal
with non-zero exit conditions (e.g., by calling MPI_Abort).
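
As an illustration of option 3, a histogram in that style could be built roughly as in the Python sketch below. The function names and the rank-to-status dict are assumptions made for this example, not anything mpirun provides, and the output format simply mimics the sample lines above.

```python
from collections import defaultdict


def format_ranges(ranks):
    """Collapse a non-empty list of ranks into a string like '10-21,110'."""
    ranks = sorted(ranks)
    parts = []
    start = prev = ranks[0]
    for r in ranks[1:]:
        if r == prev + 1:
            prev = r  # still inside a consecutive run
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = r
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def status_histogram(statuses):
    """Group ranks by exit status and return one line per status.

    statuses: dict mapping rank -> exit status (hypothetical input format).
    """
    by_status = defaultdict(list)
    for rank, status in statuses.items():
        by_status[status].append(rank)
    return [
        f"ranks {format_ranges(ranks)}: {status}"
        for status, ranks in sorted(by_status.items())
    ]


example = {r: 0 for r in range(10)}
example.update({r: 1 for r in list(range(10, 22)) + [110]})
example.update({r: 2 for r in list(range(22, 36)) + list(range(40, 52))})
for line in status_histogram(example):
    print(line)
```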
When I circled back around with these alternatives, I got the
expected answer: everyone felt that all of them were good, and
wanted a cmd line option to select the behavior for their job. They
also noted that --xml should cause any of them to output in a
defined xml format.
As I told Rolf, I honestly don't care what we do in this case. All I
ask for is a clearly defined behavior so I don't get yanked in
multiple directions, constantly circling around from one solution to
the next.
So if the MPI standard doesn't specify this behavior, could someone
involved in the Forum -please- get it to address this??
In the interim, what do -we- think it should do?
Thanks
Ralph
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel