Actually, there were some implementation issues that might prevent this
from working and were the reason we didn't implement it right away. We
don't actually transmit the SIGTERM - we capture it in mpirun and then
propagate our own "die" command to the remote processes and daemons.
Fortunately, "die" is very easy to implement. Unfortunately, "stop" and
"continue" are much harder to implement from inside of a process. We'll
have to look at it, but this may not really be feasible.

Ralph

Jeff Squyres (jsquyres) wrote:

The main reason that it doesn't work is because we didn't do anything to
make it work. :-)

Specifically, mpirun is not intercepting SIGSTOP and passing it on to the
remote nodes. There is nothing in the design or architecture that would
prevent this, but we just don't do it [yet].

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Pak Lui
Sent: Thursday, June 01, 2006 5:02 PM
To: de...@open-mpi.org
Subject: [OMPI devel] SIGSTOP and SIGCONT on orted

Hi,

I have a question on signals. Normally when I do a SIGTERM (control-C) on
mpirun, the signal seems to get handled in a way that it broadcasts to the
orted and processes on the execution hosts. However, when I send a SIGSTOP
to mpirun, mpirun seems to have stopped, but the processes of the user
executable continue to run. I guess I could hook up the debugger to mpirun
and orted to see why they are handled differently, but I guess I am anxious
to hear about it here.

I am trying to see the behavior of SIGSTOP and SIGCONT for the
suspension/resumption feature in N1GE. It'll try to use these signals to
stop and continue both mpirun and orted (and its processes), but the
signals (SIGSTOP and SIGCONT) don't seem to get propagated to the remote
orted.

I can see there are some issues for implementing this feature on N1GE
because the 'qrsh' interface does not send the signal to orted on the
remote node, but only to 'mpirun'. I am trying to see how to work around
this.
--
Thanks,
- Pak Lui
pak....@sun.com
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
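
For readers following the thread, the relay idea discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how mpirun or orted are implemented: the names `RELAY`, `relay`, and `install_relay` are hypothetical, and the sketch only reaches local child processes, whereas the thread describes mpirun sending its own commands to remote daemons. One POSIX detail matters here: SIGSTOP cannot be caught, blocked, or ignored, so a launcher can only react to a catchable signal such as SIGTSTP and then deliver SIGSTOP to the processes it manages.

```python
import signal
import subprocess

# Hypothetical mapping: signal received by the launcher -> signal sent on.
# SIGSTOP itself can never be caught, so the launcher listens for SIGTSTP
# and delivers SIGSTOP to its children instead.
RELAY = {
    signal.SIGTERM: signal.SIGTERM,
    signal.SIGTSTP: signal.SIGSTOP,
    signal.SIGCONT: signal.SIGCONT,
}


def relay(signum, children):
    """Send the mapped signal to every child that has not exited yet.

    `children` is a list of subprocess.Popen objects.
    """
    outgoing = RELAY[signum]
    for child in children:
        if child.poll() is None:  # None means the child is still alive
            child.send_signal(outgoing)


def install_relay(children):
    """Install handlers so the launcher forwards the signals in RELAY.

    Must be called from the main thread, as Python requires for
    signal.signal().
    """
    for signum in RELAY:
        signal.signal(signum, lambda s, frame: relay(s, children))
```

A launcher would start its children with `subprocess.Popen(...)`, call `install_relay(children)`, and then wait on them. Suspending the launcher itself after relaying a stop request is deliberately left out of the sketch.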