I don't have any issue with this so long as (a) it is -only- active when 
someone sets a specific MCA param requesting it, and (b) that flag is -not- set 
by default.


On Jan 4, 2010, at 11:50 AM, Iain Bason wrote:

> WHAT: Enhance the orte_forward_job_control MCA flag by:
> 
>  1. Forwarding signals to descendants of launched processes; and
>  2. Forwarding signals received before process launch time.
> 
> (The orte_forward_job_control flag arranges for SIGTSTP and SIGCONT to
> be forwarded.  This allows a resource manager like Sun Grid Engine to
> suspend a job by sending a SIGTSTP signal to mpirun.)
> 
> WHY: Some programs do "mpirun prog.sh", and prog.sh starts multiple
>     processes.  Among these programs is weather prediction code from
>     the UK Met Office.  This code is used at multiple sites around
>     the world.  Since other MPI implementations* forward job control
>     signals this way, we risk having OMPI excluded unless we
>     implement this feature.
> 
>     [*I have personally verified that Intel MPI does it.  I have
>     heard that Scali does it.  I don't know about the others.]
> 
> HOW: To allow signals to be sent to descendants of launched processes,
>     use the setpgrp() system call to create a new process group for
>     each launched process.  Then send the signal to the process group
>     rather than to the process.
> 
>     To allow signals received before process launch time to be
>     delivered when the processes are launched, add a job state flag
>     to indicate whether the job is suspended.  Check this flag at
>     launch time, and send a signal immediately after launching.
> 
> WHERE: http://bitbucket.org/igb/ompi-job-control/
> 
> WHEN: We would like to integrate this into the 1.5 branch.
> 
> TIMEOUT: COB Tuesday, January 19, 2010.
> 
> Q&A:
> 
>  1. Will this work for Windows?
> 
>     I don't know what would be required to make this work for
>     Windows.  The current implementation is for Unix only.
> 
>  2. Will this work for interactive ssh/rsh PLM?
> 
>     It will not work any better or worse than the current
>     implementation.  One can suspend a job by typing Ctrl-Z at a
>     terminal, but the mpirun process itself never gets suspended.
>     That means that in order to wake the job up one has to open a
>     different terminal to send a SIGCONT to the mpirun process.  It
>     would be desirable to fix this problem, but as this feature is
>     intended for use with resource managers like SGE it isn't
>     essential to make it work smoothly in an interactive shell.
> 
>  3. Will the creation of new process groups prohibit SGE from killing
>     a job properly?
> 
>     No.  SGE has a mechanism to ensure that all of a job's processes
>     are killed, regardless of whether they create new process groups.
> 
>  4. What about other resource managers?
> 
>     Using this flag with another resource manager might cause
>     problems.  However, the flag may not be necessary with other
>     resource managers.  (If the RM can send SIGSTOP to all the
>     processes on all the nodes running a job, then mpirun doesn't
>     need to forward job control signals.)
> 
>     According to the SLURM documentation, plugins are available
>     (e.g., linuxproc) that would allow reliable termination of all a
>     job's processes, regardless of whether they create new process
>     groups.
>     [https://computing.llnl.gov/linux/slurm/proctrack_plugins.html]
> 
>  5. Will the creation of new process groups prevent mpirun from
>     shutting down the job successfully (e.g., when it receives a
>     SIGTERM)?
> 
>     No.  I have tested jobs both with and without calls to
>     MPI_Comm_spawn, and all are properly terminated.
> 
>  6. Can we avoid creating new process groups by just signaling the
>     launched process plus any process that calls MPI_Init?
> 
>     No.  The shell script might launch other background processes
>     that the user wants to suspend.  (The Met Office code does this.)
> 
>  7. Can we avoid creating new process groups by having mpirun and
>     orted send SIGTSTP to their own process groups, and ignore the
>     signal that they send to themselves?
> 
>     No.  First, mpirun might be in the same process group as other
>     mpirun processes.  Those mpiruns could get into an infinite loop
>     forwarding SIGTSTPs to one another.  Second, although the default
>     action on receipt of SIGTSTP is to suspend the process, that only
>     happens if the process is not in an orphaned process group.  SGE
>     starts processes in orphaned process groups.
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
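
For anyone following along, the two mechanisms in the HOW section of the
proposal can be sketched roughly as follows. This is only an illustration
in Python, not the ORTE code: the Job class and its launch() and forward()
methods are hypothetical names made up for this sketch, and it assumes a
Unix system. Each launched process is put into its own process group,
signals go to the group rather than the process, and a "suspended" flag
is checked at launch time.

```python
import os
import signal
import subprocess


class Job:
    """Hypothetical stand-in for the launcher's per-job state."""

    def __init__(self):
        self.suspended = False  # the "job is suspended" state flag
        self.procs = []         # launched processes, each a group leader

    def launch(self, argv, **popen_kwargs):
        # The child calls setpgrp() before exec, so it leads a new process
        # group whose ID equals its PID; its descendants inherit that group.
        proc = subprocess.Popen(argv, preexec_fn=os.setpgrp, **popen_kwargs)
        self.procs.append(proc)
        if self.suspended:
            # A suspend request arrived before launch: deliver it now.
            os.killpg(proc.pid, signal.SIGTSTP)
        return proc

    def forward(self, signum):
        # Record suspend/resume so that later launches see the current state.
        if signum == signal.SIGTSTP:
            self.suspended = True
        elif signum == signal.SIGCONT:
            self.suspended = False
        for proc in self.procs:
            try:
                # Signal the whole group, not just the launched process.
                os.killpg(proc.pid, signum)
            except ProcessLookupError:
                pass  # no process is left in that group
```

With something like this, a launcher's SIGTSTP handler would call
job.forward(signal.SIGTSTP), and a script such as prog.sh plus whatever it
starts in the background would all receive the signal. Whether SIGTSTP
actually stops the processes still depends on the orphaned-process-group
caveat raised in Q&A item 7.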
