Ah - yet another set of options! :-) Good suggestion, though...
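
For concreteness, here is a minimal Python sketch of the selection policies floated in the thread quoted below: lowest rank, highest rank, largest or smallest value, and the histogram. It is illustrative only and is not what mpirun actually does. The names `select_exit_status`, `histogram`, and the policy strings are hypothetical, and it assumes every process terminated normally and that we already hold a rank -> exit status map.

```python
from collections import defaultdict


def select_exit_status(statuses, policy="lowest_rank"):
    """Pick one exit status for the launcher from per-rank statuses.

    statuses: dict mapping rank (int) -> exit status (int). All ranks are
    assumed to have terminated normally (no abnormal termination handling).
    The policy names are hypothetical labels for the options in the thread.
    """
    nonzero = {rank: code for rank, code in statuses.items() if code != 0}
    if not nonzero:
        return 0
    if policy == "lowest_rank":
        return nonzero[min(nonzero)]
    if policy == "highest_rank":
        return nonzero[max(nonzero)]
    if policy == "largest_value":
        return max(nonzero.values())
    if policy == "smallest_value":
        return min(nonzero.values())
    raise ValueError(f"unknown policy: {policy}")


def histogram(statuses):
    """Group ranks by exit status, compressing consecutive ranks into ranges."""
    by_code = defaultdict(list)
    for rank in sorted(statuses):
        by_code[statuses[rank]].append(rank)

    lines = []
    for code in sorted(by_code):
        ranks = by_code[code]
        spans = []
        start = prev = ranks[0]
        for rank in ranks[1:]:
            if rank != prev + 1:
                # Gap in the rank sequence: close the current range.
                spans.append((start, prev))
                start = rank
            prev = rank
        spans.append((start, prev))
        text = ",".join(str(a) if a == b else f"{a}-{b}" for a, b in spans)
        lines.append(f"ranks {text}: {code}")
    return lines
```

With `statuses = {0: 0, 1: 3, 2: 0, 3: 7, 4: 1}`, the "lowest_rank" policy picks rank 1's status and the "largest_value" policy picks the numerically biggest one, so the same job can yield different launcher exit codes depending on the policy. That is exactly why a defined behavior is needed.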
On Apr 29, 2010, at 5:07 PM, Larry Baker wrote:
> Ralph,
>
> I don't know if there is any standard ordering of non-zero exit status codes.
> If so, another option would be to return the largest (smallest) value,
> when that is the most serious exit status.
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
>
> On Apr 29, 2010, at 3:52 PM, Ralph Castain wrote:
>
>> I ran into something this week that I think may require consideration by the
>> MPI Forum. Specifically, Rolf found a problem in their MTT runs where the
>> tests expect mpirun to return a non-zero exit status because one or more
>> application processes did so, even though all application procs terminate
>> normally.
>>
>> I jury-rigged a simple algo that has mpirun return the exit status of the
>> lowest rank that returned non-zero in the case where the job terminated
>> normally. We still return the exit code of the first process to abnormally
>> terminate (i.e., the process that is first reported to the HNP - may not be
>> the first process that aborted).
>>
>> However, it begs the question - what is the actual behavior supposed to be
>> in the case where all procs terminate normally, but some may return
>> (possibly different) non-zero codes?
>>
>> I asked a few MPI users, and got a different answer from every one of them.
>> Only consistent response I got was that the MPI standard doesn't say what
>> should happen (can someone confirm that?).
>>
>> Here is a sampling of the responses:
>>
>> 1. return the exit status of the lowest rank that returned non-zero (which I
>> implemented for now to silence Rolf's MTT problem)
>>
>> 2. return the exit status of the highest rank that returned non-zero
>>
>> 3. print out a histogram of exit statuses
>>    - ranks 0-9: 0
>>    - ranks 10-21,110: 1
>>    - ranks 22-35,40-51: 2
>>    ...
>>
>> 4. print out ALL the exit statuses
>>
>> 5. ignore it - mpirun's exit code should only reflect OMPI internals. It is
>> the app developer's responsibility to properly deal with non-zero exit
>> conditions (e.g., by calling MPI_Abort).
>>
>> When I circled back around with these alternatives, I got the expected
>> answer: everyone felt that all of them were good, and wanted a cmd line
>> option to select the behavior for their job. They also noted that --xml
>> should cause any of them to output in a defined xml format.
>>
>> As I told Rolf, I honestly don't care what we do in this case. All I ask for
>> is a clearly defined behavior so I don't get yanked in multiple directions,
>> constantly circling around from one solution to the next.
>>
>> So if the MPI standard doesn't specify this behavior, could someone involved
>> in the Forum -please- get it to address this??
>>
>> In the interim, what do -we- think it should do?
>>
>> Thanks
>> Ralph
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>