labath added a comment.

hmm... I have a lot thoughts here..

- `setsid` is overkill. If you want to create process group, create a process 
group (`setpgid`), not a session.
- this solution does not seem very windows-friendly. In fact, I'd be surprised 
if it works there at all.
- going for `os.kill` will also not work with remote test suite runs. In fact, 
the way it is implemented now means we could start killing random processes on 
the host.
- are these really zombies (dead but not waited-for processes) or just live 
processes stuck in infinite loops and similar? Because if they are zombies 
(mmm... brainz), then I doubt sending signals to them will help -- they're 
already dead. The problem is that someone is not reaping them. I'm not sure 
what can be done about that, as normally init should take care of this...
- even if they are live processes, I am doubtful that this will make a 
difference. I don't think many of the processes we spawn create additional 
subprocesses. In fact, the only one I know of is `TestChangeProcessGroup.py`, 
which also creates a new process group... so this test would need special 
handling anyway.
- creating lots of process groups can actually cause us to leak _more_ 
processes in case the test run is interrupted with ^C or similar (SIGINT goes 
to the foreground process group)

Overall, I think we should try to understand the problem a bit better before we 
start shotgun debugging this. Which tests are creating these zombies? What's 
special about them? Do they always do that or only occasionally? etc.

PS: When we were running a mac bot (it seems only macs are vulnerable to zombie 
infestations), we configured it to reboot itself each midnight. Not the most 
elegant solution, but if the problem really is about init refusing to reap 
zombies, then it may be the only thing that we can do...



================
Comment at: lldb/packages/Python/lldbsuite/test/lldbtest.py:894
+            try:
+                os.kill(os.getpgid(p.pid), signal.SIGTERM)
+            except OSError:
----------------
Right this is pretty much equivalent to what we've had before. I guess you 
wanted to say `os.killpg(os.getpgid(pid), SIGTERM)`. 
`os.kill(-os.getpgid(pid))` might also work, but the first one seems more 
explicit.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83815/new/

https://reviews.llvm.org/D83815



_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits

Reply via email to