> On March 20, 2014, 4:12 p.m., Ben Mahler wrote: > > src/launcher/executor.cpp, lines 305-307 > > <https://reviews.apache.org/r/19239/diff/2/?file=520053#file520053line305> > > > > "off" init? > > > > We may want to mention that this could accidentally kill new processes, > > if the pids rotate around, right?
Thanks - will fix. Reg. pid reuse: that shouldn't happen - we traverse the process once more with killtree and we reap on the root (so that should not be possible to have been reused). We are only risking not killing orphan processes (as far as I think off). Does that make sense? > On March 20, 2014, 4:12 p.m., Ben Mahler wrote: > > src/launcher/executor.cpp, line 68 > > <https://reviews.apache.org/r/19239/diff/2/?file=520053#file520053line68> > > > > Can you pull out a constant instead of using Seconds(3) and move the > > comment related to the shutdown grace period to be beside the constant? > > > > We can put this in the slave constants for now. I'm a bit curious about > > the strategy here because as soon as we provide a kill escalation, people > > will want to tune the escalation time, possibly per framework or per > > executor. Sure - I was actually playing around with using the slave constants but somehow felt that it crossed the border between the slave and executor too much. I'll get it in. > On March 20, 2014, 4:12 p.m., Ben Mahler wrote: > > src/launcher/executor.cpp, line 223 > > <https://reviews.apache.org/r/19239/diff/2/?file=520053#file520053line223> > > > > How about we make the signaling explicit in the logging? > > > > E.g. > > > > "Sending SIGTERM to process tree..." > > > > And below: > > > > "Process <pid> did not terminate after <escalation_timeout>, sending > > SIGKILL to process tree at <pid>" Great point. Will get that in too. - Niklas ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/19239/#review38010 ----------------------------------------------------------- On March 14, 2014, 3:16 p.m., Niklas Nielsen wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/19239/ > ----------------------------------------------------------- > > (Updated March 14, 2014, 3:16 p.m.) > > > Review request for mesos, Ben Mahler and Ian Downes. > > > Bugs: MESOS-1031 > https://issues.apache.org/jira/browse/MESOS-1031 > > > Repository: mesos-git > > > Description > ------- > > This patch adds a two step signal escalation: 1) SIGTERM is sent to > the process tree 2) If the process tree root hasn't been reaped within > escalationTimeout (3 seconds), SIGKILL is sent to the process tree. > Currently, this may leave behind orphan processes if parent processes > terminates and leave non-responsive processes behind. This will be > captured when PID namespaces are in place, as the orphan processes > will end up hanging off the process tree root instead of init. > > > Diffs > ----- > > src/launcher/executor.cpp e30d77a > > Diff: https://reviews.apache.org/r/19239/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Niklas Nielsen > >
