I don't have any issue with this so long as (a) it is -only- active when someone sets a specific MCA param requesting it, and (b) that flag is -not- set by default.
On Jan 4, 2010, at 11:50 AM, Iain Bason wrote:

> WHAT: Enhance the orte_forward_job_control MCA flag by:
>
>       1. Forwarding signals to descendants of launched processes; and
>       2. Forwarding signals received before process launch time.
>
>       (The orte_forward_job_control flag arranges for SIGTSTP and SIGCONT to
>       be forwarded. This allows a resource manager like Sun Grid Engine to
>       suspend a job by sending a SIGTSTP signal to mpirun.)
>
> WHY: Some programs do "mpirun prog.sh", and prog.sh starts multiple
>      processes. Among these programs is weather prediction code from
>      the UK Met Office. This code is used at multiple sites around
>      the world. Since other MPI implementations* forward job control
>      signals this way, we risk having OMPI excluded unless we
>      implement this feature.
>
>      [*I have personally verified that Intel MPI does it. I have
>      heard that Scali does it. I don't know about the others.]
>
> HOW: To allow signals to be sent to descendants of launched processes,
>      use the setpgrp() system call to create a new process group for
>      each launched process. Then send the signal to the process group
>      rather than to the process.
>
>      To allow signals received before process launch time to be
>      delivered when the processes are launched, add a job state flag
>      to indicate whether the job is suspended. Check this flag at
>      launch time, and send a signal immediately after launching.
>
> WHERE: http://bitbucket.org/igb/ompi-job-control/
>
> WHEN: We would like to integrate this into the 1.5 branch.
>
> TIMEOUT: COB Tuesday, January 19, 2010.
>
> Q&A:
>
> 1. Will this work for Windows?
>
>    I don't know what would be required to make this work for
>    Windows. The current implementation is for Unix only.
>
> 2. Will this work for interactive ssh/rsh PLM?
>
>    It will not work any better or worse than the current
>    implementation. One can suspend a job by typing Ctl-Z at a
>    terminal, but the mpirun process itself never gets suspended.
>    That means that in order to wake the job up one has to open a
>    different terminal to send a SIGCONT to the mpirun process. It
>    would be desirable to fix this problem, but as this feature is
>    intended for use with resource managers like SGE it isn't
>    essential to make it work smoothly in an interactive shell.
>
> 3. Will the creation of new process groups prohibit SGE from killing
>    a job properly?
>
>    No. SGE has a mechanism to ensure that all a job's processes are
>    killed, regardless of whether they create new process groups.
>
> 4. What about other resource managers?
>
>    Using this flag with another resource manager might cause
>    problems. However, the flag may not be necessary with other
>    resource managers. (If the RM can send SIGSTOP to all the
>    processes on all the nodes running a job, then mpirun doesn't
>    need to forward job control signals.)
>
>    According to the SLURM documentation, plugins are available
>    (e.g., linuxproc) that would allow reliable termination of all a
>    job's processes, regardless of whether they create new process
>    groups.
>    [https://computing.llnl.gov/linux/slurm/proctrack_plugins.html]
>
> 5. Will the creation of new process groups prevent mpirun from
>    shutting down the job successfully (e.g., when it receives a
>    SIGTERM)?
>
>    No. I have tested jobs both with and without calls to
>    MPI_Comm_Spawn, and all are properly terminated.
>
> 6. Can we avoid creating new process groups by just signaling the
>    launched process plus any process that calls MPI_Init?
>
>    No. The shell script might launch other background processes
>    that the user wants to suspend. (The Met Office code does this.)
>
> 7. Can we avoid creating new process groups by having mpirun and
>    orted send SIGTSTP to their own process groups, and ignore the
>    signal that they send to themselves?
>
>    No. First, mpirun might be in the same process group as other
>    mpirun processes. Those mpiruns could get into an infinite loop
>    forwarding SIGTSTPs to one another.
>    Second, although the default
>    action on receipt of SIGTSTP is to suspend the process, that only
>    happens if the process is not in an orphaned process group. SGE
>    starts processes in orphaned process groups.
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
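
For anyone who wants to experiment with the mechanism described under HOW, here is a minimal sketch in Python. It is not the code from the bitbucket repository and not how ORTE is structured; the names `launch_in_new_group`, `forward_signal`, and `Job` are hypothetical, and the single `suspended` attribute is only a stand-in for the proposed job state flag. It assumes a Unix system and uses `os.setpgid(0, 0)`, which has the same effect as the `setpgrp()` call named in the proposal.

```python
import os
import signal


def launch_in_new_group(argv):
    """Fork and exec argv as leader of a new process group; return its pid."""
    pid = os.fork()
    if pid == 0:
        try:
            # Child: become leader of a fresh process group. Anything the
            # child starts later (e.g. background jobs of a shell script)
            # inherits this group id.
            os.setpgid(0, 0)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if setpgid or exec failed
    try:
        # Parent: set the group as well, so the group exists even if a
        # signal is forwarded before the child has run its own setpgid().
        os.setpgid(pid, pid)
    except OSError:
        pass  # child has already exec'd or exited; its own call took effect
    return pid


def forward_signal(pgid, signum):
    """Send signum to every process in the group, not just the leader."""
    os.killpg(pgid, signum)


class Job:
    """Hypothetical stand-in for a job record with a 'suspended' state flag."""

    def __init__(self):
        self.suspended = False
        self.pgids = []

    def on_signal(self, signum):
        # Remember the latest job control request, then forward it to
        # whatever has been launched so far (possibly nothing yet).
        if signum == signal.SIGTSTP:
            self.suspended = True
        elif signum == signal.SIGCONT:
            self.suspended = False
        for pgid in self.pgids:
            forward_signal(pgid, signum)

    def launch(self, argv):
        pid = launch_in_new_group(argv)
        self.pgids.append(pid)
        if self.suspended:
            # A SIGTSTP arrived before launch: deliver it immediately.
            forward_signal(pid, signal.SIGTSTP)
        return pid
```

A launcher would call `job.on_signal(...)` from its handler for SIGTSTP and SIGCONT, and `job.launch(["sh", "prog.sh"])` when it starts each process. In this sketch the launcher stays alive as the parent of each new group, in the same session, so the groups it creates are not orphaned and SIGTSTP takes its default effect on them.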