Hi,

Leo Famulari <[email protected]> wrote:

> On Sun, May 14, 2017 at 11:36:17PM +0200, Ludovic Courtès wrote:
>> What does /var/log/shepherd.log show around the time where you hit
>> “halt”?
>>
>> I get something like this:
>>
>> --8<---------------cut here---------------start------------->8---
>> 18:06:26 Service mcron has been stopped.
>> 18:06:26 sending all processes the TERM signal
>
> For me, this is where it gets stuck:
>
> ------
> 2017-05-16 19:12:53 sending all processes the TERM signal
> 2017-05-16 19:12:58 waiting for process termination (processes left: (1 494))
> 2017-05-16 19:13:00 waiting for process termination (processes left: (1 494))
> 2017-05-16 19:13:02 waiting for process termination (processes left: (1 494))
> ------
>
> In my experience, it will wait here forever.
>
> And from `ps aux`:
>
> leo 494 0.0 0.1 27232 3676 ? Ss 19:12 0:00 tmux

The bug was 100% reproducible in a VM, and AFAICS it is fixed by
7f090203d5fb033eb1b64778b03afad5bb35f5f2.

The problem was that the tmux server process would be left as a zombie,
and the loop would then always see it: the parent of the tmux server
process is PID 1, and for some reason PID 1 either didn’t get SIGCHLD or
its handler didn’t run.

The test that this commit adds does exactly the same thing: launch tmux
and then invoke “halt”.

I tried to create a synthetic test not involving tmux, simply creating a
process that gets PID 1 as its parent, but it wouldn’t trigger the bug.
I’m unclear as to why tmux triggers it and not that other simple test.

Thanks,
Ludo’.
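
For illustration only, the zombie situation described above can be
sketched in a few lines. This is a minimal Python sketch, not the
Shepherd's actual code (which is written in Guile) and not the fix from
the commit. The helper names `still_listed` and `reap_children` are
hypothetical. The sketch shows that an exited child stays visible to
anything that enumerates processes until its parent calls waitpid, which
is what a SIGCHLD handler would normally do.

```python
import os
import time

def still_listed(pid):
    """Return True while the kernel still has an entry for pid.

    A zombie counts: it has exited but has not been waited for, so a
    loop that enumerates processes keeps seeing it.
    """
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    return True

def reap_children():
    """Collect every exited child without blocking; return their PIDs.

    This is the kind of work a SIGCHLD handler is expected to do.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

If a parent forks a child that exits immediately and never calls
`reap_children()`, `still_listed(child_pid)` keeps returning True, which
matches the "processes left: (1 494)" lines in the log above. Once the
parent reaps the child, the entry disappears.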
