Jeff Squyres (jsquyres) wrote: Nothing preventing it at all. The problem lies in what you do when you receive it.

Take the example of a launch that used orted daemons. We could pass the "stop" or "continue" message to the orted, which could signal its child processes (i.e., the application processes on that node) with the appropriate signal. That would stop/continue the child processes just fine, but what about communications that are still in progress? Bad news.

So instead you could pass the application process a "stop" message. The process could then "quiet" the MPI-based messaging system, reply back to the orted that all is now quiet, and then the orted could send the appropriate OS-level signal so the process would truly "stop". "Continue" is much easier, of course: there is no "quieting" to be done, so the orted could just issue a "continue" signal to its children.

Great, except we still haven't "stopped" the run-time! What happens if the registry is in the middle of a notification process (e.g., we hit a stage gate and all the notification messages are being sent, or someone is in the middle of a put that causes a set of subscriptions to fire and send out messages, which may in turn cause additional action on the remote host)? What about messages being routed through the orteds (once we get the routing system in place)?

Well, we could now go through a similar process to first "quiet" the run-time itself. We would have to ensure that every subsystem completed its ongoing operation and then "stopped". We would of course have to tell all the remote processes to "stop" first so that new requests would stop coming in, or else this process would never complete. Note that this means the remote processes would have to receive and "log" any notifications that come in from the registry after we tell the process to "stop", but could not act on those notices until we "continue" the process. So now we have the MPI and run-time layers "quiet".
We send a message to the remote orteds indicating they should go ahead and send their local application processes an OS-level signal to "stop" so that the OS knows not to spend cycles on them. Unfortunately, we cannot do the same for the orteds themselves, which means the orteds remain "awake" and operating, though they can just "spin". All sounds fine. Now all we have to deal with are: all the race conditions inherent in what I just described; how to handle asynchronous notifications that arrive after we've already been told to stop; the scenarios where we don't have orted daemons on every node; how to stop and restart major MPI collectives in mid-operation; etc., etc.

Not saying it cannot be done, just indicating that there were reasons why it wasn't initially done other than "we just didn't get around to it". :-)