Just curious -- what's difficult about this?  SIGTSTP and SIGCONT can be
caught; is there something preventing us from sending "stop" and
"continue" messages (just like we send "die" messages)?
 
(If I had to guess, I think the user is asking because some other MPI
implementations implement this kind of behavior)
 
Thanks!
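
A minimal sketch of the idea in the question above: catch SIGTSTP and SIGCONT in the launcher and turn each one into a message for the remote daemons, the same way SIGTERM becomes "die". It is written in Python purely for illustration and is not Open MPI code; the `Launcher` class, its `broadcast` method, and the "stop"/"continue" message strings are assumptions. Note that SIGSTOP itself cannot be caught, so this only helps when the sender uses SIGTSTP.

```python
import signal

# Hypothetical message names; the thread only confirms that a "die"
# message exists.
MESSAGES = {signal.SIGTSTP: "stop", signal.SIGCONT: "continue"}


class Launcher:
    """Stand-in for mpirun that records what it would send to remote daemons."""

    def __init__(self):
        self.sent = []

    def broadcast(self, message):
        # A real launcher would push this over its own channel to each
        # remote daemon; here we only record it.
        self.sent.append(message)

    def handle(self, signum, frame):
        # Signal handler: translate the caught signal into a message.
        self.broadcast(MESSAGES[signum])

    def install(self):
        # Must be called from the main thread.
        for signum in MESSAGES:
            signal.signal(signum, self.handle)
```

Calling `Launcher().install()` registers both handlers. Once SIGTSTP is caught the launcher no longer stops by default, so a real implementation would probably also suspend itself after forwarding "stop"; the sketch leaves that out.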


________________________________

        From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
        Sent: Thursday, June 01, 2006 10:50 PM
        To: Open MPI Developers
        Subject: Re: [OMPI devel] SIGSTOP and SIGCONT on orted
        
        
        Actually, there were some implementation issues that might prevent
        this from working and were the reason we didn't implement it right
        away. We don't actually transmit the SIGTERM - we capture it in
        mpirun and then propagate our own "die" command to the remote
        processes and daemons. Fortunately, "die" is very easy to implement.
        
        Unfortunately, "stop" and "continue" are much harder to implement
        from inside of a process. We'll have to look at it, but this may
        not really be feasible.
        
        Ralph
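
For the receiving side, one possible shape is for the daemon to translate each command into a signal for the processes it launched, rather than asking each process to stop itself. This is an assumption: the thread does not settle how, or whether, this would be done. Here `apply_command`, the command table, and the `send` parameter are hypothetical names, and the code is Python for illustration only.

```python
import os
import signal

# Hypothetical mapping from launcher commands to the signal a daemon
# would deliver to its local child processes.
COMMAND_SIGNALS = {
    "stop": signal.SIGSTOP,
    "continue": signal.SIGCONT,
    "die": signal.SIGTERM,
}


def apply_command(command, pids, send=os.kill):
    """Deliver the signal for `command` to every pid in `pids`.

    `send` defaults to os.kill and can be replaced for testing.
    Returns the signal that was used.
    """
    signum = COMMAND_SIGNALS[command]
    for pid in pids:
        send(pid, signum)
    return signum
```

Because the daemon sends SIGSTOP from outside the target process, the application does not need a handler of its own, which sidesteps part of the "from inside of a process" difficulty. It does not address stopping the daemon itself.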
        
        
        
        Jeff Squyres (jsquyres) wrote: 

                The main reason that it doesn't work is because we didn't
                do anything to make it work.  :-)
                
                Specifically, mpirun is not intercepting SIGSTOP and passing
                it on to the remote nodes.  There is nothing in the design or
                architecture that would prevent this, but we just don't do it
                [yet].
                 
                
                  

                        -----Original Message-----
                        From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Pak Lui
                        Sent: Thursday, June 01, 2006 5:02 PM
                        To: de...@open-mpi.org
                        Subject: [OMPI devel] SIGSTOP and SIGCONT on orted
                        
                        Hi,
                        
                        I have a question on signals. Normally when I do a
                        SIGTERM (control-C) on mpirun, the signal seems to get
                        handled in a way that it broadcasts to the orted and
                        processes on the execution hosts. However, when I send
                        a SIGSTOP to mpirun, mpirun seems to have stopped, but
                        the processes of the user executable continue to run.
                        I guess I could hook up the debugger to mpirun and
                        orted to see why they are handled differently, but I
                        guess I am anxious to hear about it here.
                        
                        I am trying to see the behavior of SIGSTOP and SIGCONT
                        for the suspension/resumption feature in N1GE. It'll
                        try to use these signals to stop and continue both
                        mpirun and orted (and its processes), but the signals
                        (SIGSTOP and SIGCONT) don't seem to get propagated to
                        the remote orted.
                        
                        I can see there are some issues for implementing this
                        feature on N1GE because the 'qrsh' interface does not
                        send the signal to orted on the remote node, but only
                        to 'mpirun'. I am trying to see how to work around
                        this.
                        
                        -- 
                        
                        Thanks,
                        
                        - Pak Lui
                        pak....@sun.com
                        
                        _______________________________________________
                        devel mailing list
                        de...@open-mpi.org
                        http://www.open-mpi.org/mailman/listinfo.cgi/devel
                        
                            

                
                
                  
