Re: [OMPI devel] if btl->add_procs() fails...?

Terry Dontje Sat, 2 Aug 2008 12:46:26 -0400

Jeff Squyres wrote:

On Aug 1, 2008, at 11:39 PM, Brian Barrett wrote:
My thought is that if add_procs fails, then that BTL should be removed (as if init failed) and things should continue on. If that BTL was the only way to reach another process, we'll catch that later and abort.
There are always going to be errors that can't be detected until the device is actually used, so I think that add_procs errors should be treated the same as init errors. The error_cb is a red herring, as that's supposed to be used in situations where an error can't directly be returned to the upper layers (like the progress function). In this case, we can directly return an error, so we should do so (and I believe we do, it's the BML/PML that's the problem).
So if add_procs() fails, do you think that the BML/PML should finalize the module? That looks like an easy change to make.
Second, if there are no other successfully-add_proc()'ed modules from that component, should the BTL's progress function be removed from the list of progress functions? The real question is: if a module add_procs() fails, do we mandate that it still must be safe to call the component's progress function? I think you're saying "yes", but just wanted to be sure. I don't know offhand how a component's progress function is added to the list (can't check ATM), so I'd have to dig into that a bit.
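
The policy proposed above (drop a module whose add_procs() fails, keep going, and only fail later if some peer ends up unreachable) might be sketched roughly as follows. This is an illustration only, not the actual BML/PML code: every name here (sketch_module_t, sketch_add_procs_all, the sample callbacks, SKETCH_MAX_PROCS) is hypothetical, and the real BTL interface has different signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PROCS 64  /* arbitrary limit, for this sketch only */

/* Hypothetical stand-in for a BTL module; NOT the real Open MPI struct. */
typedef struct sketch_module {
    const char *name;
    /* Returns 0 on success and marks reachable[p] for each peer it can reach. */
    int (*add_procs)(struct sketch_module *m, size_t nprocs, bool *reachable);
    void (*finalize)(struct sketch_module *m);
    bool active;
} sketch_module_t;

static int sketch_finalize_calls = 0;

/* Sample callbacks, used only to exercise the sketch. */
static int sketch_ok_add_procs(sketch_module_t *m, size_t nprocs, bool *reachable)
{
    (void)m;
    for (size_t p = 0; p < nprocs; ++p) {
        reachable[p] = true;
    }
    return 0;
}

static int sketch_fail_add_procs(sketch_module_t *m, size_t nprocs, bool *reachable)
{
    (void)m; (void)nprocs; (void)reachable;
    return -1;  /* simulate an error detected only when the device is used */
}

static void sketch_finalize(sketch_module_t *m)
{
    (void)m;
    ++sketch_finalize_calls;
}

/*
 * Call add_procs on every module. A module that fails is finalized and
 * dropped, as if its init had failed. Returns 0 if every peer is reachable
 * through at least one surviving module, -1 otherwise (the "catch that
 * later and abort" case).
 */
int sketch_add_procs_all(sketch_module_t *mods, size_t nmods,
                         size_t nprocs, bool *covered)
{
    assert(nprocs <= SKETCH_MAX_PROCS);

    for (size_t p = 0; p < nprocs; ++p) {
        covered[p] = false;
    }

    for (size_t i = 0; i < nmods; ++i) {
        sketch_module_t *m = &mods[i];
        bool reach[SKETCH_MAX_PROCS] = { false };

        if (m->add_procs(m, nprocs, reach) != 0) {
            /* Treat like an init failure: finalize, drop, and continue. */
            if (m->finalize != NULL) {
                m->finalize(m);
            }
            m->active = false;
            continue;
        }

        m->active = true;
        for (size_t p = 0; p < nprocs; ++p) {
            if (reach[p]) {
                covered[p] = true;
            }
        }
    }

    for (size_t p = 0; p < nprocs; ++p) {
        if (!covered[p]) {
            return -1;  /* no surviving module can reach this peer */
        }
    }
    return 0;
}
```

The sketch does not address the second question, whether the component's progress function must remain safe to call (or be unregistered) once all of its modules have been dropped.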

I am curious how all of the above affects client/server or spawned jobs. If you finalize a BTL then do a connect to a process that would use that BTL would it reinitialize itself?


--td
