On Apr 30, 2010, at 6:15 AM, Jeff Squyres wrote:

> On Apr 30, 2010, at 5:59 AM, N.M. Maclaren wrote:
> 
>> MPI quite rightly does not specify this, because the matter is very system-
>> dependent, and it is not possible to return the exit code (or display it)
>> in all environments.  Sorry, but that is reality.
> 
> Correct -- MPI intentionally does not say what happens after MPI_FINALIZE.  
> MPI intentionally doesn't even specify much about how to start an MPI job 
> (just like Fortran, actually).

Frankly, I disagree - I think the standard can and should say something 
explicit about this situation. It doesn't have to say how we implement it, but 
it should clearly explain to users what they should expect to see at the end of 
an MPI job.

Guess the real issue is: is the standard written for the general community, or 
solely for MPI implementers? If the latter, then saying nothing is fine. If the 
former, then it needs to clearly say something about this question.

> 
>> The last paragraph of the specification of MPI_Finalize makes it clear
>> that it is the USER'S responsibility to return an exit code to the system
>> for process 0, and that what happens for other ones is undefined.  Or
>> fairly clear - it could be stated in so many words, rather than being
>> implicit in the requirement on implementors.
> 
> I don't think that's quite feasible, because the user doesn't directly 
> control what mpirun returns.  So (many) implementations *have* to choose 
> something from their job start agent (mpirun or mpiexec or whatever).
> 
> I think OMPI's behavior of returning 0 from mpirun if and only if all 
> processes call MPI_FINALIZE successfully *and* return 0 is good.  Return 
> arbitrary nonzero if some process aborts (calling MPI_ABORT, not calling 
> MPI_INIT, not calling MPI_FINALIZE, or otherwise).  Return any of the 
> individual MPI processes' non-zero exit status if all call MPI_FINALIZE but 
> some (or all) don't return an exit status of 0 (I don't have a strong opinion 
> about which one to return -- e.g., the *first* one to return a non-zero exit 
> value, the *highest* or *lowest* non-zero exit status, ...etc.).

If that's the case, then I think the standard needs clearer language. My 
admittedly non-scientific poll indicates that users seem to think there is some 
expected behavior, and were surprised by the question.

So while the developer community may think it is okay as things stand, it was 
clear from my limited conversations that users all think something else is 
supposed to happen.

Just my $0.0002. As I said at the start of this thread, I don't care what 
solution we adopt for OMPI. 

However, I -do- insist that there be a formal specification of OMPI's behavior 
- not the current "whatever you want" approach. Otherwise, I will continue to 
be hit with these ad hoc requests that it behave the way someone thinks it 
should, with no recourse to some defined behavior accepted by this community.
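
To make "defined behavior" concrete, here is a minimal sketch of the rule Jeff 
describes above, written as a small Python function. It is illustrative only 
and is not OMPI code: the function name, the (finalized, exit_status) record, 
the value 1 for the "arbitrary nonzero" case, and the choice of the *first* 
non-zero status are all assumptions made for the example.

```python
# Hypothetical sketch of a launcher exit-status rule; not taken from OMPI.
ABORT_STATUS = 1  # assumed value for the "arbitrary nonzero" abort case


def launcher_exit_status(procs):
    """Pick the status a job launcher (e.g. mpirun) would return.

    procs is a list of (finalized, exit_status) pairs, one per MPI process,
    in the order the processes terminated. `finalized` is True only if the
    process called MPI_INIT and MPI_FINALIZE successfully and did not abort.
    """
    # Any process that aborted or never finalized: arbitrary nonzero status.
    if any(not finalized for finalized, _ in procs):
        return ABORT_STATUS

    # Everyone finalized: return the first non-zero exit status, if any.
    # (Highest or lowest non-zero would be equally valid choices.)
    for _, status in procs:
        if status != 0:
            return status

    # All processes finalized and returned 0.
    return 0
```

For example, launcher_exit_status([(True, 0), (True, 3), (True, 7)]) gives 3 
under the "first non-zero" choice, and any job containing a (False, ...) entry 
gives ABORT_STATUS. Whichever variant we pick, the point is that it gets 
written down.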

> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

