Hello all

The question was raised at today's developer workshop about our current practice of putting the application processes in a separate process group from their parent ORTE daemon. This has the unfortunate side effect of making the processes "invisible" to any host resource manager (RM) when they are launched via mpirun - i.e., the RM launches the orteds, but never sees the local application procs that each orted fork/execs. Since those processes are then moved into a separate process group, the host RM has no way of killing them should the orted fail and the procs not terminate themselves.

The request was made that we modify the orted so it no longer changes the application procs' process group. This would leave the orted and the application procs in the same process group, so any signal the host RM delivers to that process group would be received by all of them.

However, in reviewing the code, I (re)discovered why this was originally done. The issue stems from when Sun joined the OMPI project - their MPI implementation allowed the user to pause their job by hitting mpirun with a SIGTSTP, and then start it again by hitting mpirun with a SIGCONT. These signals needed to be seen not just by the initial child processes started by the orted, but also by any subsequent child processes those processes might have started.

It is this latter point that led to the process group change. Since the "grandchild" processes were not started by the orted, the orted itself has no knowledge of their pids. Thus, the orted cannot send the SIGTSTP to the individual target pids. However, if the orted shares a process group with its children and signals that group, then the signal would also hit the orted - thus causing the orted to "pause". There would be no way for mpirun to "wake up" the orted after that point so it could subsequently "unpause" the application.

Hence the decision was made to move the application procs into their own process group. The orted can then signal that process group, thus ensuring that all procs (grandchildren etc.) receive the signal - without disabling the orted itself.

If we want to retain this pause/restart behavior, then I see no way to change the current method of putting the application procs into their own process group. So I guess this issue becomes a choice:

* either we disable pause/restart by signal, or
* someone comes up with an alternative way of "pausing" the processes, including any descendants, without disturbing the orted... or devises a scheme for waking the orted up after it has been "paused".
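
For anyone who hasn't looked at this part of the code, here is a rough sketch of the current scheme as I described it above. This is NOT the actual orted code (which is C) - it is just a minimal Python illustration, and the function names (launch_in_own_group, pause_group, resume_group) are made up for this example. It assumes a POSIX system.

```python
import os
import signal


def launch_in_own_group(argv):
    """Fork/exec argv as the leader of a new process group.

    Returns the child's pid, which is also the id of the new group.
    Anything the child later forks inherits that group by default.
    """
    pid = os.fork()
    if pid == 0:
        # Child: leave the launcher's process group before exec'ing.
        try:
            os.setpgid(0, 0)
            os.execvp(argv[0], argv)
        finally:
            # Only reached if setpgid or exec failed.
            os._exit(127)
    # Parent: set the group as well, so it exists before we return,
    # whichever side of the fork runs first.
    try:
        os.setpgid(pid, pid)
    except OSError:
        pass  # child has already exec'd, so it already moved itself
    return pid


def pause_group(pgid):
    """Stop every process in the group; the caller is not a member."""
    os.killpg(pgid, signal.SIGTSTP)


def resume_group(pgid):
    """Continue every process in the group."""
    os.killpg(pgid, signal.SIGCONT)
```

The launcher keeps running after pause_group() because it is not a member of the group it signals, which is exactly the property we would lose if the procs stayed in the orted's group. The flip side is the problem raised at the workshop: anything that signals only the launcher's group never reaches these procs.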

PMIx didn't exist back then, but perhaps we might be able to use it to help us here (e.g., a PMIx API to tell it to hit our orteds with a SIGCONT)?

Suggestions?
Ralph