Hello all

The question was raised at today's developer workshop about our current
practice of putting the application processes in a separate process group
from their parent ORTE daemon. This has the unfortunate side effect of
making the processes "invisible" to any host resource manager when they are
launched via mpirun - i.e., the RM launches the orted's, but never sees the
local application procs that the orted fork/exec's. Since those processes
are then moved into a separate process group, the host RM has no way of
killing them should the orted fail and the procs not terminate themselves.

The request was made that we modify the orted so it no longer changes the
application proc's process group. This will leave the orted and the
application procs in the same process group, and so any signals delivered
by the host RM to the orted will be received by all processes.

However, in reviewing the code, I (re)discovered why this was originally
done. The issue stems from when Sun joined the OMPI project - their MPI
implementation allowed the user to pause their job by hitting mpirun with a
SIGTSTP, and then start again by hitting mpirun with a SIGCONT. These
signals needed to be seen not just by the initial child processes started
by the orted, but also by any subsequent child processes those processes
might have started.

It is this latter point that led to the process group change. Since the
"grandchild" processes were not started by the orted, the orted itself has
no knowledge of their pid. Thus, the orted cannot send the SIGTSTP to the
individual target pid's. However, if the orted were instead to signal the
entire process group that contains its children, then that signal would
also hit the orted, since it is a member of the same group - thus causing
the orted to "pause". There would be no way for mpirun to "wake up" the
orted after that point so it could subsequently "unpause" the application.

Hence the decision was made to move the application procs into their own
process group. The orted can then signal the process group, thus ensuring
that all procs (grandchildren etc.) receive the signal - without disabling
the orted itself.
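
For anyone who wants to see the mechanism in isolation, here is a minimal
sketch of the idea using plain POSIX calls from Python. It is not the orted
code; the function names (launch_in_own_group, pause_group, resume_group)
are made up for illustration, and it assumes a POSIX system where the
launcher and the child share a session.

```python
import os
import signal


def launch_in_own_group(argv):
    """Fork/exec argv as the leader of a new process group; return its pid.

    Hypothetical helper for illustration only, not the orted's launch path.
    """
    pid = os.fork()
    if pid == 0:
        # Child: become the leader of a new process group before exec.
        # Anything this process later forks inherits that group.
        os.setpgid(0, 0)
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)
    # Parent: make the same call so the group exists before we signal it,
    # whichever side runs first.
    try:
        os.setpgid(pid, pid)
    except OSError:
        pass  # the child already did it, or has already exec'd
    return pid


def pause_group(pgid):
    """Deliver SIGTSTP to every member of the group, descendants included."""
    os.killpg(pgid, signal.SIGTSTP)


def resume_group(pgid):
    """Deliver SIGCONT to every member of the group."""
    os.killpg(pgid, signal.SIGCONT)
```

Because the launcher stays in its original process group, pause_group()
stops the child and anything the child has forked, while the launcher keeps
running and can later call resume_group(). The flip side is the problem
described at the top of this mail: a signal sent to the launcher's own
group no longer reaches those processes.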

If we want to retain this pause/restart behavior, then I see no way to
change the current method of putting the application procs into their own
process group. So I guess this issue becomes a choice:

* either we disable pause/restart by signal, or
* someone comes up with an alternative way of "pausing" the processes,
  including any descendants, without disturbing the orted, or devises a
  scheme for waking the orted up after it has been "paused". PMIx didn't
  exist back then, but perhaps we might be able to use it to help us here
  (e.g., a PMIx API to tell it to hit our orteds with a SIGCONT)?

Suggestions?
Ralph
