I think Ralph's point is that OMPI is providing the run-time environment for 
the application, and it would probably behoove us to support both kinds of 
behaviors since there are obviously people in both camps out there.  

It's pretty easy to add a non-default MCA param / orterun CLI option to say 
"abort the job if any of them exit with a non-zero status."

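For illustration only, here is a minimal sketch of what such an opt-in policy could look like. It is plain Python, not orterun code; the names `run_job` and `abort_on_nonzero` are hypothetical, and no real MCA parameter or CLI option is implied. The default (off) leaves the other processes alone, so a process that exits non-zero does not take down the rest of the job.

```python
import subprocess
import time


def run_job(commands, abort_on_nonzero=False, poll_interval=0.05):
    """Launch every command and return the exit codes in launch order.

    abort_on_nonzero is a hypothetical opt-in switch (off by default):
    when set, the first non-zero exit terminates every remaining process.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    while True:
        # Poll every process so each exit code is up to date.
        codes = [p.poll() for p in procs]
        if all(c is not None for c in codes):
            return codes
        if abort_on_nonzero and any(c not in (None, 0) for c in codes):
            # One process failed: stop the ones that are still running.
            for p, c in zip(procs, codes):
                if c is None:
                    p.terminate()
            return [p.wait() for p in procs]
        time.sleep(poll_interval)
```

Keeping the switch off by default would address Nick's concern: a runtime that returns non-zero for a mere warning would not kill the job unless the user asked for that behavior.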

On Apr 14, 2011, at 9:43 AM, Ken Lloyd wrote:

> Point well made, Nick. In other words, irrespective of OS or language, are we 
> citing the need for "application correcting code" from Open MPI (relocate 
> and/or retry), similar to ECC in memory? 
> 
> Ken
> 
> On Thu, 2011-04-14 at 14:31 +0100, N.M. Maclaren wrote:
>> On Apr 14 2011, Ralph Castain wrote:
>> >> 
>> >>> ...  It's hopeless, and whatever you do will be wrong for many
>> >>> people.  ...
>> >> 
>> >> I think that sums it up pretty well.  :-)
>> >> 
>> >> It does seem a little strange that the scenario you describe somewhat 
>> >> implies that one process is calling MPI_Finalize loooong before the 
>> >> others do. Specifically, the user is concerned with tying up resources 
>> >> after one process has called Finalize -- which implies that the others 
>> >> may continue on for a while. It's not invalid, of course, but it is a 
>> >> little unusual.
>> >
>> > I'm finding it more common than we thought. Note that I didn't say that 
>> > one process called MPI_Finalize before the others. In this case, they 
>> > call it fairly close together, but the individual processes continue 
>> > running for quite some time, or until they determine that something is 
>> > wrong and exit with non-zero status.
>> 
>> Nobody is denying that it is common.  Now, what happens when you encounter
>> a language or compiler that uses return codes for mere warnings (e.g.
>> ignored IEEE 754 flags, as stated to be desirable by LIA-1)?  Bang!
>> 
>> Remember that C is not the universe and many languages use MPI via the
>> C interface, but do not let C control their model.
>> 
>> Regards,
>> Nick Maclaren.
>> 
>> _______________________________________________
>> devel mailing list
>> 
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> =====================
> Kenneth A. Lloyd
> CEO - Director of Systems Science
> Watt Systems Technologies Inc.
> www.wattsys.com
> kenneth.ll...@wattsys.com 
> 
> This e-mail is covered by the Electronic Communications Privacy Act, 18 
> U.S.C. 2510-2521 and is intended only for the addressee named above. It may 
> contain privileged or confidential information. If you are not the addressee 
> you must not copy, distribute, disclose or use any of the information in it. 
> If you have received it in error please delete it and immediately notify the 
> sender.
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

