Hi Pierre,

On Fri, Oct 21, 2016 at 03:05:55PM +0000, Pierre Cheynier wrote:
> First let's clarify again: we are on a systemd-based OS (CentOS 7), so reload
> is done by sending SIGUSR2 to haproxy-systemd-wrapper. Theoretically, this
> has absolutely no relation with our current issue (if I understand well the
> way the old processes are managed).
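For reference, the signal path described above — a wrapper process relaying SIGUSR2 to the haproxy processes it manages — can be sketched in a much-simplified form. This is an illustration only, not the wrapper's actual code; the child here simply exits on the signal instead of performing a real graceful reload:

```python
import os
import signal
import time

def wrapper_relay_demo():
    child = os.fork()
    if child == 0:
        # "haproxy" child: treat SIGUSR2 as a graceful-stop request
        seen = []
        signal.signal(signal.SIGUSR2, lambda signo, frame: seen.append(signo))
        while not seen:
            time.sleep(0.05)
        os._exit(0)  # a real process would drain its connections first

    # "wrapper" parent: relay the reload signal to the child pid
    time.sleep(0.2)  # give the child time to install its handler
    os.kill(child, signal.SIGUSR2)
    _, status = os.waitpid(child, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

print(wrapper_relay_demo())  # True: the child saw the signal and exited cleanly
```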
Yes, it does have something to do with it: in this mode it is the
systemd-wrapper which delivers the signal to the old processes, while in the
normal mode the processes get the signal directly from the new process.
Another important point is that exactly *all* users having problems with
zombie processes are systemd users, without exception. This problem never
existed over the first 15 years when systems were using a sane init instead,
and it still does not exist on non-systemd OSes.

> This happens on servers with live traffic, but with a reasonable amount of
> connections. I'm also able to reproduce with no connections, but I have to
> be a bit more aggressive with the reload frequency (probably because the
> children are faster to die).

OK, that's interesting. And when this happens, do they stay there forever?

> For me the problem is not that we still have connections or not, it is that
> in this case some old processes are never "aware" that they should die, so
> they continue to listen for incoming requests, thanks to SO_REUSEPORT.
> Consequently, you end up with N processes listening with different configs.

Ah, this is getting very interesting. Maybe we should hack systemd-wrapper to
log the signals it receives, and the signals and pids it sends, to see what
is happening here. It may also be that the signal is properly sent but never
received (but why?).

> In the pstree I pasted in the previous message, there are 3 minutes between
> the first living instance and the last (and as you can see, we are quite
> aggressive with long connections):
>
> timeout client 2s
> timeout server 5s
> timeout connect 200ms
> timeout http-keep-alive 200ms
>
> Here is a Dockerfile that can be used to reproduce (where I use
> haproxy-systemd-wrapper; just run with the default settings, i.e. number of
> reloads = 300 and interval between each = 2 ms):
>
> https://github.com/pierrecdn/haproxy-reload-issue
>
> docker build -t haproxy-reload-issue .
> && docker run --rm -ti haproxy-reload-issue

That's very kind, thank you. However I don't have access to a docker machine,
but I know some people on the list do, so I hope we'll quickly find the cause
and hopefully be able to fix it (unless it's another smart invention from
systemd to further annoy running daemons).

Another important point: when you say you reload every 2 ms, are you certain
you have a way to ensure that everything has completely started before you
issue the signal to kill the old process? I'm asking because, thanks to the
principle that the wrapper must stay in the foreground (a smart design choice
from systemd), there's no way for a service manager to know whether all
processes are fully started or not. With a normal init, when the process
returns, all sub-processes have been created. So at 2 ms I could easily
imagine that we're delivering signals to a starting process, maybe even
before it has had time to register a signal handler, and that these signals
are lost before the sub-processes are started. Of course that's just a guess,
but I don't see a clean way to work around this, except of course by
switching back to a reliable service manager :-/

Regards,
Willy
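P.S.: the startup race guessed at above is easy to demonstrate. SIGUSR2's default disposition is to terminate the process, so a signal delivered before the handler is registered doesn't just go unnoticed — it kills the target. Here is a minimal sketch (an illustration, not haproxy code; `startup_delay` is a made-up parameter simulating a process that is slow to install its handler):

```python
import os
import signal
import time

def race_demo(startup_delay):
    child = os.fork()
    if child == 0:
        time.sleep(startup_delay)  # startup window: no handler installed yet
        signal.signal(signal.SIGUSR2, lambda signo, frame: None)
        time.sleep(1.0)
        os._exit(0)
    time.sleep(0.1)  # the signal lands 100 ms into the child's startup
    os.kill(child, signal.SIGUSR2)
    _, status = os.waitpid(child, 0)
    # True if the child was killed by the unhandled signal
    return os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGUSR2

print(race_demo(0.5))  # True: handler not yet installed, the child is killed
print(race_demo(0.0))  # False: handler ready in time, signal handled
```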