Jeff Squyres wrote:
On Aug 1, 2008, at 11:39 PM, Brian Barrett wrote:
My thought is that if add_procs fails, then that BTL should be
removed (as if init failed) and things should continue on. If that
BTL was the only way to reach another process, we'll catch that later
and abort.
There are always going to be errors that can't be detected until the
device is actually used, so I think that add_procs errors should be
treated the same as init errors. The error_cb is a red herring, as
that's supposed to be used in situations where an error can't
directly be returned to the upper layers (like the progress
function). In this case, we can directly return an error, so we
should do so (and I believe we do; it's the BML/PML that's the problem).
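
For illustration, the policy proposed above (drop a module whose add_procs fails, keep going, and only fail later if some peer is left with no path) might look roughly like the following. This is a minimal sketch, not the actual BML/PML code: every type and function name here (sketch_module_t, sketch_add_procs_all, the demo callbacks, SKETCH_MAX_PROCS) is hypothetical and stands in for the real BTL interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real Open MPI BTL/BML types. */
#define SKETCH_MAX_PROCS 8

typedef struct sketch_module {
    const char *name;
    /* Returns 0 on success and sets reachable[p] for each peer it can reach. */
    int (*add_procs)(struct sketch_module *self, size_t nprocs, bool *reachable);
    /* Optional cleanup hook, called when the module is dropped. */
    void (*finalize)(struct sketch_module *self);
    bool active;
} sketch_module_t;

/* Demo callback: a module that reaches every peer. */
static int sketch_add_ok(sketch_module_t *self, size_t nprocs, bool *reachable)
{
    (void)self;
    for (size_t p = 0; p < nprocs; ++p) {
        reachable[p] = true;
    }
    return 0;
}

/* Demo callback: a module whose add_procs always fails. */
static int sketch_add_fail(sketch_module_t *self, size_t nprocs, bool *reachable)
{
    (void)self;
    (void)nprocs;
    (void)reachable;
    return -1;
}

/*
 * Call add_procs on every module. A module that fails is finalized and
 * marked inactive, the same way a module whose init failed would be.
 * Returns 0 if every peer is reachable through at least one surviving
 * module, -1 otherwise (the "catch that later and abort" case).
 */
static int sketch_add_procs_all(sketch_module_t *mods, size_t nmods, size_t nprocs)
{
    bool reachable_any[SKETCH_MAX_PROCS] = { false };

    assert(nprocs <= SKETCH_MAX_PROCS);

    for (size_t m = 0; m < nmods; ++m) {
        bool reachable[SKETCH_MAX_PROCS] = { false };

        if (mods[m].add_procs(&mods[m], nprocs, reachable) != 0) {
            if (mods[m].finalize != NULL) {
                mods[m].finalize(&mods[m]);
            }
            mods[m].active = false;
            continue; /* not fatal by itself; other modules may cover the peers */
        }

        mods[m].active = true;
        for (size_t p = 0; p < nprocs; ++p) {
            if (reachable[p]) {
                reachable_any[p] = true;
            }
        }
    }

    for (size_t p = 0; p < nprocs; ++p) {
        if (!reachable_any[p]) {
            return -1; /* no surviving module reaches this peer */
        }
    }
    return 0;
}
```

With one failing and one working module, sketch_add_procs_all() returns 0 and only the working module stays active; with only the failing module, it returns -1 because no peer is reachable.
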
So if add_procs() fails, do you think that the BML/PML should finalize
the module? That looks like an easy change to make.
Second, if there are no other successfully-add_proc()'ed modules from
that component, should the BTL's progress function be removed from the
list of progress functions? The real question is: if a module's
add_procs() fails, do we mandate that it still must be safe to call
the component's progress function? I think you're saying "yes", but
just wanted to be sure. I don't know offhand how a component's
progress function is added to the list (can't check ATM), so I'd have
to dig into that a bit.
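
To make the progress-function question concrete, here is one way the "remove it when no modules survive" option could be sketched. This is only an assumption-laden illustration: the registry, sketch_progress_register(), sketch_progress_unregister(), sketch_prune_progress(), and the demo callbacks are all hypothetical and do not reflect how components actually register their progress functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical progress registry; NOT the real Open MPI progress engine. */
#define SKETCH_MAX_PROGRESS 16

typedef int (*sketch_progress_fn)(void);

static sketch_progress_fn sketch_progress_list[SKETCH_MAX_PROGRESS];
static size_t sketch_progress_count = 0;

/* Append a progress function; returns -1 if the list is full. */
static int sketch_progress_register(sketch_progress_fn fn)
{
    assert(fn != NULL);
    if (sketch_progress_count == SKETCH_MAX_PROGRESS) {
        return -1;
    }
    sketch_progress_list[sketch_progress_count++] = fn;
    return 0;
}

/* Remove a progress function (order is not preserved); -1 if not found. */
static int sketch_progress_unregister(sketch_progress_fn fn)
{
    for (size_t i = 0; i < sketch_progress_count; ++i) {
        if (sketch_progress_list[i] == fn) {
            sketch_progress_list[i] = sketch_progress_list[--sketch_progress_count];
            return 0;
        }
    }
    return -1;
}

/*
 * After add_procs has run for all of a component's modules: if none of
 * them survived, stop polling that component's progress function.
 */
static void sketch_prune_progress(sketch_progress_fn fn, size_t live_modules)
{
    if (live_modules == 0) {
        (void)sketch_progress_unregister(fn);
    }
}

/* Poll every registered progress function; returns the summed event count. */
static int sketch_poll_all(void)
{
    int events = 0;
    for (size_t i = 0; i < sketch_progress_count; ++i) {
        events += sketch_progress_list[i]();
    }
    return events;
}

/* Demo progress functions for two hypothetical components. */
static int sketch_demo_progress_a(void) { return 1; }
static int sketch_demo_progress_b(void) { return 2; }
```

The alternative raised in the question, keeping the function registered and requiring it to be safe to call with zero live modules, would simply skip the sketch_prune_progress() step.
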
I am curious how all of the above affects client/server or spawned
jobs. If you finalize a BTL and then connect to a process that would
use that BTL, would it reinitialize itself?
--td