Hi,
I have a question about signals. Normally, when I send SIGINT (Ctrl-C)
to mpirun, the signal seems to be handled so that it is propagated
to the orted and the processes on the execution hosts. However, when I send
a SIGSTOP to mpirun, mpirun itself stops, but the processes of
the user executable continue to run. I could attach a
debugger to mpirun and orted to see why the two signals are handled
differently, but I am eager to hear about it here first.
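One thing I do know is that POSIX does not allow SIGSTOP (or SIGKILL) to be caught, so a process cannot install a handler that forwards it, while SIGTERM, SIGINT, SIGTSTP and SIGCONT can be caught. Whether that is the whole story for mpirun I cannot say. Below is a minimal Python sketch that only checks which signals accept a handler; it is not Open MPI code, and the function name can_install_handler is made up for illustration.

```python
import signal


def can_install_handler(signum):
    """Return True if this process may install a handler for signum.

    Hypothetical helper for illustration only. Must be called from the
    main thread, because Python only allows signal.signal() there.
    """
    try:
        previous = signal.signal(signum, lambda signo, frame: None)
    except (OSError, ValueError):
        # The kernel rejects handlers for SIGSTOP and SIGKILL (EINVAL).
        return False
    # Restore the earlier disposition. signal.signal() returns None when
    # the previous handler was not installed from Python, and None cannot
    # be passed back in, so skip the restore in that case.
    if previous is not None:
        signal.signal(signum, previous)
    return True


if __name__ == "__main__":
    for name in ("SIGTERM", "SIGINT", "SIGTSTP", "SIGCONT", "SIGSTOP"):
        print(name, can_install_handler(getattr(signal, name)))
```

If that is indeed the cause, one possible direction would be to send a catchable signal such as SIGTSTP instead, but I have not verified how mpirun treats it.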
I am trying to understand how SIGSTOP and SIGCONT behave, for the
suspension/resumption feature in N1GE. N1GE tries to use these signals to
stop and continue both mpirun and orted (and its processes), but the
signals (SIGSTOP and SIGCONT) don't seem to get propagated to the remote
orted.
I can see there are some issues with implementing this feature on N1GE,
because the 'qrsh' interface does not send the signal to orted on the
remote node, but only to 'mpirun'. I am trying to see how to work around
this.
--
Thanks,
- Pak Lui
pak....@sun.com