Well, I actually don't know much about opal_event_loop or how it is intended to work. My guess is that:

(a) your remote orted is acting as the seed and your local process (the one in Eclipse) is running as a client to that seed - at least, that was the case when I last talked to Nathan;

(b) when the seed orted dies, it is the oob in your local client that actually detects the socket closure and decides that, since it is the seed that has lost contact, the local application must abort;

(c) the errmgr.abort function does exactly what it was supposed to do - it provides an immediate way of killing the local process.

I'd be a little hesitant to recommend overloading the errmgr.abort function, as you really do want the local processes to die when they lose the connection to the seed (at least until we develop a recovery capability for the seed orted, which is some way off). Also, given the way you are running, I'm not sure you can have a different errmgr for your process while leaving the other one in place for everyone else.

Probably the best solution for now would be for us to insert yet another MCA parameter into the errmgr that would, if set, have errmgr.abort do something other than exit. The question then is: what would you want it to do? We need to have it tell the rest of the system to stop trying to send messages, etc. Right now, I don't think the infrastructure exists to do that short of killing orte.

We could try to have errmgr.abort do an orte_finalize - I suspect that would kill the orte system without impacting your host program. You would then have to re-initialize, so we'd have to find some way to let you know that we had finalized. I can't swear this will work, though - we might well generate a segfault, since this is happening deep down inside the system. We could try it, though.

Would any of that be of help? Do you have any suggestions on how we might let you know that we had finalized?

Ralph

Brian Barrett wrote:
On Apr 19, 2006, at 4:15 PM, Greg Watson wrote:
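To make the proposal above a little more concrete, here is a minimal sketch in C of an abort routine whose behavior is selected by a setting, with a callback to tell the host program that the runtime has been finalized. Every name in it is hypothetical: none of these are real ORTE or OPAL symbols, an environment variable stands in for the proposed MCA parameter, and a flag stands in for the orte_finalize call. It illustrates only the control flow, not whether finalizing from deep inside the system is actually safe.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; this is not the ORTE errmgr API. */

typedef enum {
    SKETCH_ABORT_EXIT,     /* current behavior: kill the local process */
    SKETCH_ABORT_FINALIZE  /* proposed: shut the runtime down, keep the host alive */
} sketch_abort_policy_t;

/* Callback the host registers to learn that the runtime was finalized. */
typedef void (*sketch_finalized_cb_t)(int error_code, void *user_data);

static sketch_abort_policy_t g_policy = SKETCH_ABORT_EXIT;
static sketch_finalized_cb_t g_cb = NULL;
static void *g_cb_data = NULL;
static int g_runtime_up = 1;  /* stand-in for "the runtime is initialized" */

/* Stand-in for reading the proposed MCA parameter (assumed env var name). */
void sketch_read_policy_from_env(void)
{
    const char *v = getenv("SKETCH_ERRMGR_ABORT_POLICY");
    g_policy = (v != NULL && strcmp(v, "finalize") == 0)
                   ? SKETCH_ABORT_FINALIZE
                   : SKETCH_ABORT_EXIT;
}

void sketch_set_policy(sketch_abort_policy_t policy)
{
    g_policy = policy;
}

void sketch_register_finalized_cb(sketch_finalized_cb_t cb, void *user_data)
{
    g_cb = cb;
    g_cb_data = user_data;
}

int sketch_runtime_is_up(void)
{
    return g_runtime_up;
}

/* Example callback: stores the error code in the int that user_data points to. */
void sketch_record_error(int error_code, void *user_data)
{
    *(int *)user_data = error_code;
}

/* Abort entry point: exits by default, otherwise finalizes and notifies. */
void sketch_errmgr_abort(int error_code)
{
    if (g_policy == SKETCH_ABORT_EXIT) {
        exit(error_code);
    }
    g_runtime_up = 0;  /* placeholder for the real finalize step */
    if (g_cb != NULL) {
        g_cb(error_code, g_cb_data);
    }
}
```

In this sketch the host would select the finalize policy, register a callback before starting the runtime, and re-initialize once the callback has fired. A callback is only one possible way of signalling that finalization happened; a status query such as sketch_runtime_is_up() is another.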
