On Aug 2, 2008, at 11:46, Terry Dontje <terry.don...@sun.com> wrote:
Jeff Squyres wrote:
On Aug 1, 2008, at 11:39 PM, Brian Barrett wrote:
My thought is that if add_procs fails, then that BTL should be
removed (as if init failed) and things should continue on. If
that BTL was the only way to reach another process, we'll catch
that later and abort.
There are always going to be errors that can't be detected until
the device is actually used, so I think that add_procs errors
should be treated the same as init errors. The error_cb is a red
herring, as that's supposed to be used in situations where an
error can't directly be returned to the upper layers (like the
progress function). In this case, we can directly return an
error, so we should do so (and I believe we do, it's the BML/PML
that's the problem).
So if add_procs() fails, do you think that the BML/PML should
finalize the module? That looks like an easy change to make.
Second, if there are no other successfully-add_proc()'ed modules
from that component, should the BTL's progress function be removed
from the list of progress functions? The real question is: if a
module add_procs() fails, do we mandate that it still must be safe
to call the component's progress function? I think you're saying
"yes", but just wanted to be sure. I don't know offhand how a
component's progress function is added to the list (can't check
ATM), so I'd have to dig into that a bit.
I am curious how all of the above affects client/server or spawned
jobs. If you finalize a BTL and then connect to a process that
would use that BTL, would it reinitialize itself?
To deal with all the dynamics issues, I wouldn't finalize the BTL.
The BML should handle the progress stuff, just as if the add_procs
succeeded but returned no active peers. But I'd guess that's part of
the bit that doesn't work today. I would further suspect that a BTL
will need to have a working progress function in the face of
add_procs failures to cope with all the dynamics options. I'm
travelling this weekend, so I can't verify any of this at the moment.
Brian
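
For readers following the thread, the policy discussed above can be
illustrated with a small toy model. This is only a sketch under stated
assumptions, not Open MPI code: ToyBTL, toy_bml_add_procs and the
"tcp"/"openib" module names are hypothetical stand-ins, and the real BTL
and BML interfaces are C function tables with different signatures. The
sketch shows a failed add_procs being treated as "no active peers", the
progress function staying registered regardless, and the abort happening
only if some peer ends up with no usable BTL.

```python
class ToyBTL:
    """Hypothetical stand-in for a BTL module; not Open MPI's real API."""

    def __init__(self, name, reachable, fail_add_procs=False):
        self.name = name
        self.reachable = set(reachable)
        self.fail_add_procs = fail_add_procs
        self.progress_calls = 0

    def add_procs(self, procs):
        # Return the subset of procs this module can reach, or fail outright.
        if self.fail_add_procs:
            raise RuntimeError(f"{self.name}: add_procs failed")
        return {p for p in procs if p in self.reachable}

    def progress(self):
        # Must stay safe to call even after add_procs has failed.
        self.progress_calls += 1
        return 0


def toy_bml_add_procs(btls, procs):
    """Assumed BML-side policy: a failed add_procs means no active peers."""
    routes = {p: [] for p in procs}
    progress_list = []
    for btl in btls:
        # The progress function is registered whether or not add_procs works.
        progress_list.append(btl.progress)
        try:
            peers = btl.add_procs(procs)
        except RuntimeError:
            peers = set()  # drop the module from routing, but do not finalize it
        for p in peers:
            routes[p].append(btl.name)
    # Only now check whether losing a module left some peer unreachable.
    unreachable = sorted(p for p, names in routes.items() if not names)
    if unreachable:
        raise RuntimeError(f"unreachable peers: {unreachable}")
    return routes, progress_list


procs = ["rank0", "rank1"]
tcp = ToyBTL("tcp", reachable=procs)
ib = ToyBTL("openib", reachable=procs, fail_add_procs=True)
routes, progress_list = toy_bml_add_procs([tcp, ib], procs)
for fn in progress_list:
    fn()  # both modules are still progressed, including the failed one
```

Because the failed module is neither finalized nor removed from the
progress list in this model, a later connect or spawn could call its
add_procs again with new peers, which is the dynamics case raised in the
thread.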