On Sep 3, 2010, at 8:14 AM, Jeff Squyres wrote:

> On Sep 1, 2010, at 4:47 PM, Steve Wise wrote:
> 
>> I was wondering what the logic is behind allowing an MPI job to continue in 
>> the presence of a fatal qp error?
> 
> It's a feature...?

The idea was that in the near future we would be able to recover from this kind 
of error (reopen the QP, etc.).
But the feature has never been implemented for ompi. 
(BTW, I'm not sure that is true anymore, since SUN/ORACLE pushed some code that 
is supposed to handle such cases...)

So maybe it is worth handling it like the device-fatal case: abort everything.

Pasha

