Ah - yet another set of options! :-)

Good suggestion, though...

On Apr 29, 2010, at 5:07 PM, Larry Baker wrote:

> Ralph,
> 
> I don't know if there is any standard ordering of non-zero exit status codes. 
> If so, another option would be to return the largest (smallest) value, 
> when that is the most serious exit status.
> 
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
> 
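A minimal Python sketch of Larry's suggestion follows. It assumes a site-defined convention in which a larger (or, optionally, smaller) non-zero status means a more serious failure. As Larry notes, no standard ordering is known. The function name most_serious_status is hypothetical and is not part of Open MPI.

```python
from typing import Sequence


def most_serious_status(statuses: Sequence[int],
                        larger_is_worse: bool = True) -> int:
    """Return the most serious non-zero exit status, or 0 if all ranks succeeded.

    Hypothetical helper. Assumes a site-defined convention in which severity
    grows with the numeric value (larger_is_worse=True) or shrinks with it
    (larger_is_worse=False); no standard ordering of exit codes is implied.
    """
    # Drop successful ranks so that 0 can never be picked as "most serious".
    nonzero = [s for s in statuses if s != 0]
    if not nonzero:
        return 0
    return max(nonzero) if larger_is_worse else min(nonzero)
```

Zero is filtered out before comparing. Without that step, the "smaller is worse" variant would report success whenever any rank succeeded.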
> On Apr 29, 2010, at 3:52 PM, Ralph Castain wrote:
> 
>> I ran into something this week that I think may require consideration by the 
>> MPI Forum. Specifically, Rolf found a problem in their MTT runs where the 
>> tests expect mpirun to return a non-zero exit status because one or more 
>> application processes did so, even though all application procs terminated 
>> normally.
>> 
>> I jury-rigged a simple algo that has mpirun return the exit status of the 
>> lowest rank that returned non-zero in the case where the job terminated 
>> normally. We still return the exit code of the first process to abnormally 
>> terminate (i.e., the process that is first reported to the HNP, which may not 
>> be the first process that aborted).
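The interim rule Ralph describes can be sketched as follows. This is an illustration in Python, not Open MPI's actual C code. The name select_exit_status and its parameters are hypothetical. "First abnormal" here means the first abnormal termination reported to the HNP.

```python
from typing import Optional, Sequence


def select_exit_status(statuses: Sequence[int],
                       first_reported_abnormal: Optional[int] = None) -> int:
    """Choose mpirun's exit status (hypothetical sketch, not Open MPI code).

    statuses[r] is the exit status of rank r. first_reported_abnormal is the
    exit code of the first abnormally terminated process reported to the HNP,
    or None if every process terminated normally.
    """
    if first_reported_abnormal is not None:
        # Abnormal termination: return the first *reported* failure's code,
        # which is not necessarily the first process that actually aborted.
        return first_reported_abnormal
    # Normal termination: the lowest rank with a non-zero status wins.
    for status in statuses:
        if status != 0:
            return status
    return 0
```

For example, statuses [0, 0, 3, 1] with no abnormal termination give 3, because rank 2 is the lowest non-zero rank. Scanning from the highest rank instead (option 2 in the list that follows) would give 1.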
>> 
>> However, it raises the question - what is the actual behavior supposed to be 
>> in the case where all procs terminate normally, but some may return 
>> (possibly different) non-zero codes?
>> 
>> I asked a few MPI users and got a different answer from every one of them. 
>> The only consistent response I got was that the MPI standard doesn't say what 
>> should happen (can someone confirm that?).
>> 
>> Here is a sampling of the responses:
>> 
>> 1. return the exit status of the lowest rank that returned non-zero (which I 
>> implemented for now to silence Rolf's MTT problem)
>> 
>> 2. return the exit status of the highest rank that returned non-zero
>> 
>> 3. print out a histogram of exit statuses
>>  - ranks 0-9: 0
>>  - ranks 10-21,110: 1
>>  - ranks 22-35,40-51: 2
>>  ...
>> 
>> 4. print out ALL the exit statuses
>> 
>> 5. ignore it - mpirun's exit code should only reflect OMPI internals. It is 
>> the app developer's responsibility to properly deal with non-zero exit 
>> conditions (e.g., by calling MPI_Abort).
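Option 3's histogram amounts to grouping ranks by exit status and compressing each group into runs of consecutive ranks. Below is a minimal Python sketch. It assumes the per-rank statuses have already been collected into a list indexed by rank. The helper names are hypothetical, and the output only imitates the example format above. It is not an existing mpirun feature.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rank_ranges(ranks: List[int]) -> str:
    """Compress a non-empty, ascending list of ranks into 'a-b,c' form."""
    def fmt(lo: int, hi: int) -> str:
        return str(lo) if lo == hi else f"{lo}-{hi}"

    parts: List[str] = []
    start = prev = ranks[0]
    for r in ranks[1:]:
        if r != prev + 1:            # gap found: close the current run
            parts.append(fmt(start, prev))
            start = r
        prev = r
    parts.append(fmt(start, prev))   # close the final run
    return ",".join(parts)


def exit_status_histogram(statuses: Sequence[int]) -> List[str]:
    """One line per distinct exit status, listing the ranks that returned it."""
    by_status: Dict[int, List[int]] = defaultdict(list)
    for rank, status in enumerate(statuses):
        by_status[status].append(rank)  # ranks arrive in ascending order
    return [f"ranks {rank_ranges(ranks)}: {status}"
            for status, ranks in sorted(by_status.items())]
```

For instance, statuses [0, 0, 0, 1, 1, 0, 2] yield "ranks 0-2,5: 0", "ranks 3-4: 1" and "ranks 6: 2". Option 4 is the same data without the run compression.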
>> 
>> When I circled back around with these alternatives, I got the expected 
>> answer: everyone felt that all of them were good, and wanted a cmd line 
>> option to select the behavior for their job. They also noted that --xml 
>> should cause any of them to output in a defined xml format.
>> 
>> As I told Rolf, I honestly don't care what we do in this case. All I ask for 
>> is a clearly defined behavior so I don't get yanked in multiple directions, 
>> constantly circling around from one solution to the next.
>> 
>> So if the MPI standard doesn't specify this behavior, could someone involved 
>> in the Forum -please- get it to address this??
>> 
>> In the interim, what do -we- think it should do?
>> 
>> Thanks
>> Ralph
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 

