Re: [OMPI devel] Forwarding SIGTSTP and SIGCONT

Jeff Squyres Thu, 11 Dec 2008 14:48:08 -0500

On Dec 8, 2008, at 11:11 AM, Ralph Castain wrote:

It sounds reasonable to me. I agree with Ralf W about having mpirun send a STOP to itself - that would seem to solve the problem about stopping everything.
It would seem, however, that you cannot similarly STOP the daemons or else you won't be able to CONT the job. I'm not sure how big a deal that is - I can't think of any issue it creates offhand.
Is there any issue in the MPI comm layers if you abruptly STOP a process while it's communicating? Especially since the STOP is going to be asynchronous. Do you need to quiet networks like IB first?

It might be better to allow the MPI procs to do "something" before actually stopping. This might prevent timeout-sensitive stuff from failing (although I don't know if Josh's CR code even handles these kinds of things...?). The obvious case that I can think of is if the MPI process is stopped in the middle of an openib CM action. None of the openib CPC's can currently handle a timeout nicely.
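
To make the "do something before actually stopping" idea concrete, here is a rough sketch, written in Python rather than Open MPI's C purely for illustration. It assumes the forwarded signal is SIGTSTP (which a process can catch) and that the process stops itself with SIGSTOP (which it cannot catch) only after a hook has run. The names install_stop_hooks, before_stop and after_cont are hypothetical, not Open MPI APIs, and what the hook would actually have to do for the openib CPCs is left open.

```python
import os
import signal


def install_stop_hooks(before_stop, after_cont):
    """Install SIGTSTP/SIGCONT handlers around two caller-supplied hooks.

    before_stop and after_cont are hypothetical callbacks; in the scenario
    discussed in this thread they would quiesce and resume communication.
    """

    def on_tstp(signum, frame):
        # Give the process a chance to reach a safe point first.
        before_stop()
        # SIGSTOP cannot be caught or ignored, so the process stops here.
        os.kill(os.getpid(), signal.SIGSTOP)

    def on_cont(signum, frame):
        # Runs after the process has been resumed by SIGCONT.
        after_cont()

    signal.signal(signal.SIGTSTP, on_tstp)
    signal.signal(signal.SIGCONT, on_cont)
```

A process would call install_stop_hooks(...) once at startup. Whoever forwards the signals would then send SIGTSTP, not SIGSTOP, to the MPI procs, since only the former can be intercepted.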


--
Jeff Squyres
Cisco Systems
