Jeff Squyres wrote:
On Aug 1, 2008, at 11:39 PM, Brian Barrett wrote:

My thought is that if add_procs fails, then that BTL should be removed (as if init failed) and things should continue on. If that BTL was the only way to reach another process, we'll catch that later and abort.

There are always going to be errors that can't be detected until the device is actually used, so I think that add_procs errors should be treated the same as init errors. The error_cb is a red herring, as that's supposed to be used in situations where an error can't directly be returned to the upper layers (like the progress function). In this case, we can directly return an error, so we should do so (and I believe we do; it's the BML/PML that's the problem).
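
To make the proposal above concrete, here is a rough sketch in C of "drop the module whose add_procs fails, keep going, and complain later only if some peer ends up unreachable". It is an illustration under stated assumptions, not the actual BML/PML code: every name in it (toy_module, toy_add_procs_all, and so on) is hypothetical and does not correspond to the real Open MPI structures or signatures.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 8

/* Hypothetical stand-in for a BTL module; NOT the real Open MPI struct. */
typedef struct toy_module {
    const char *name;
    /* Returns 0 on success and marks which peers this module can reach. */
    int (*add_procs)(struct toy_module *m, int nprocs, bool *reachable);
    void (*finalize)(struct toy_module *m);
    bool finalized;
} toy_module;

/* Example callback: succeeds and can reach every peer. */
int toy_add_procs_ok(toy_module *m, int nprocs, bool *reachable)
{
    (void)m;
    for (int p = 0; p < nprocs; p++) {
        reachable[p] = true;
    }
    return 0;
}

/* Example callback: fails, as a device error detected at first use might. */
int toy_add_procs_fail(toy_module *m, int nprocs, bool *reachable)
{
    (void)m; (void)nprocs; (void)reachable;
    return -1;
}

void toy_finalize(toy_module *m)
{
    m->finalized = true;
}

/*
 * Call add_procs on every module. A module that fails is finalized and
 * dropped, exactly as if its init had failed. The surviving modules are
 * compacted to the front of mods[]. Returns the number of survivors, or
 * -1 if at least one peer is not reachable through any surviving module.
 */
int toy_add_procs_all(toy_module **mods, int nmods, int nprocs)
{
    bool covered[MAX_PROCS] = { false };
    int kept = 0;

    assert(nprocs <= MAX_PROCS);

    for (int i = 0; i < nmods; i++) {
        bool reach[MAX_PROCS] = { false };

        if (mods[i]->add_procs(mods[i], nprocs, reach) != 0) {
            mods[i]->finalize(mods[i]);   /* treat like an init failure */
            continue;
        }
        for (int p = 0; p < nprocs; p++) {
            covered[p] = covered[p] || reach[p];
        }
        mods[kept++] = mods[i];
    }

    /* The "catch that later" step: abort only if a peer is unreachable. */
    for (int p = 0; p < nprocs; p++) {
        if (!covered[p]) {
            return -1;
        }
    }
    return kept;
}
```

With one failing and one working module, the call returns 1 and only the failing module is finalized; with only the failing module, it returns -1, which is the point where the upper layer would abort.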

So if add_procs() fails, do you think that the BML/PML should finalize the module? That looks like an easy change to make.

Second, if there are no other successfully-add_proc()'ed modules from that component, should the BTL's progress function be removed from the list of progress functions? The real question is: if a module add_procs() fails, do we mandate that it still must be safe to call the component's progress function? I think you're saying "yes", but just wanted to be sure. I don't know offhand how a component's progress function is added to the list (can't check ATM), so I'd have to dig into that a bit.
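
One possible shape for the progress-function question, sketched in C purely for discussion: keep a per-component count of modules that survived add_procs, and take the component's progress function off the list only when that count reaches zero. This is an assumption about how it could be done, not a description of how progress functions are actually registered; all names (toy_progress_list, toy_component, toy_module_dropped, ...) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROGRESS 16

typedef int (*toy_progress_fn)(void);

/* Hypothetical list of registered progress functions. */
typedef struct {
    toy_progress_fn fns[MAX_PROGRESS];
    int count;
} toy_progress_list;

/* Hypothetical component: one progress function shared by its modules. */
typedef struct {
    toy_progress_fn progress;
    int live_modules;   /* modules that were add_procs()'ed successfully */
} toy_component;

/* Example progress function that reports one completed event per call. */
int toy_dummy_progress(void)
{
    return 1;
}

int toy_progress_register(toy_progress_list *l, toy_progress_fn fn)
{
    assert(l != NULL);
    if (l->count >= MAX_PROGRESS) {
        return -1;
    }
    l->fns[l->count++] = fn;
    return 0;
}

int toy_progress_unregister(toy_progress_list *l, toy_progress_fn fn)
{
    assert(l != NULL);
    for (int i = 0; i < l->count; i++) {
        if (l->fns[i] == fn) {
            /* Fill the hole with the last entry; order is not preserved. */
            l->count--;
            l->fns[i] = l->fns[l->count];
            l->fns[l->count] = NULL;
            return 0;
        }
    }
    return -1;
}

/*
 * Call after one module of component c has failed add_procs and been
 * finalized. The progress function is removed only when no module of
 * the component is left, so it is never called on a dead component.
 */
void toy_module_dropped(toy_progress_list *l, toy_component *c)
{
    if (c->live_modules > 0) {
        c->live_modules--;
    }
    if (c->live_modules == 0) {
        toy_progress_unregister(l, c->progress);
    }
}

/* Run every registered progress function and sum what they report. */
int toy_progress_all(const toy_progress_list *l)
{
    int events = 0;
    for (int i = 0; i < l->count; i++) {
        events += l->fns[i]();
    }
    return events;
}
```

Under this scheme the answer to "must progress still be safe to call?" would be yes only while at least one module of the component is alive; once the last one is dropped, the function is simply no longer on the list.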

I am curious how all of the above affects client/server or spawned jobs. If you finalize a BTL and then do a connect to a process that would use that BTL, would it reinitialize itself?

--td
