Hi,

Sorry, the answers are in the wrong order.

> Yes it has something to do with it because it's the systemd-wrapper which
> delivers the signal to the old processes in this mode, while in the normal
> mode the processes get the signal directly from the new process. Another
> important point is that exactly *all* users having problem with zombie
> processes are systemd users, with no exception. And this problem has never
> existed over the first 15 years where systems were using a sane init
> instead and still do not exist on non-systemd OSes.

Unfortunately, I remember we had the same issue (but less frequently) on
CentOS6, which is init-based. I tried to reproduce it, but didn't succeed...
So let's ignore that for now, it was maybe related to something else.

> OK that's interesting. And when this happens, they stay there forever ?

Yes, these processes are never stopped and are still bound to the socket.

> Ah this is getting very interesting. Maybe we should hack systemd-wrapper
> to log the signals it receives and the signals and pids it sends to see
> what is happening here. It may also be that the signal is properly sent
> but never received (but why ?).

Clearly. Apparently I sometimes have wrong information in the pidfile...
Have a look at the journald logs:

Oct 24 12:26:57 haproxys01e02-par haproxy-systemd-wrapper[44319]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44941
Oct 24 12:26:57 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 297/122657 (44951) : config : 'option forwardfor' ignored for frontend 'https-in' as it requires HTTP mode.
Oct 24 12:27:00 haproxys01e02-par haproxy-systemd-wrapper[44319]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44952
Oct 24 12:27:00 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 297/122700 (44978) : config : 'option forwardfor' ignored for frontend 'https-in' as it requires HTTP mode.
Oct 24 12:27:05 haproxys01e02-par haproxy-systemd-wrapper[44319]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44983
Oct 24 12:27:05 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 297/122705 (45131) : config : 'option forwardfor' ignored for frontend 'https-in' as it requires HTTP mode.
Oct 24 12:27:09 haproxys01e02-par haproxy-systemd-wrapper[44319]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 45132
Oct 24 12:27:09 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 297/122709 (45146) : config : 'option forwardfor' ignored for frontend 'https-in' as it requires HTTP mode.

Fortunately I have an error in my config, which lets me see the pid of the
first child :).

Here we can see that:
 * 44978 references (-sf) 44952 (child of 44951)
 * 45131 references 44983 = nobody that we've seen in the logs...
   (so 44978 and its child will stay alive forever !)
 * 45146 references 45132 (child of 45131)

> That's very kind, thank you. However I don't have access to a docker
> machine but I know some people on the list do so I hope we'll quickly
> find the cause and hopefully be able to fix it (unless it's another
> smart invention from systemd to further annoy running deamons).

> Another important point, when you say you restart every 2ms, are you
> certain you have a way to ensure that everything is completely started
> before you issue your signal to kill the old process ?
> (..)
> So at 2ms I could easily imagine that we're delivering signals to a
> starting process, maybe even before it has the time to register a signal
> handler, and that these signals are lost before the sub-processes are
> started.

Clearly not, my test is trivial. But as I observe the behaviour on a
platform that operates at a different time scale (a reload every 1 to 10
seconds on average), it was just a way to reproduce the issue and to be able
to investigate in the container, for example with gdb.

> Regards,
> Willy

Thanks !

Pierre
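
PS: the manual cross-check of the -sf pids against the pids seen in the
journal could probably be scripted. Below is only a rough sketch in Python,
not something taken from the wrapper or from haproxy. It assumes that the
daemonized child gets the pid right after the process that printed the
[WARNING] line, which is what the two consistent reloads above show but is
in no way guaranteed on a busy machine. The names check_reloads and max_gap
are made up for the illustration.

```python
import re

# Journald lines copied from the mail above.
LOG = """\
Oct 24 12:26:57 haproxys01e02-par haproxy-systemd-wrapper[44319]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44941
Oct 24 12:26:57 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 297/122657 (44951) : config : 'option forwardfor' ignored for frontend 'https-in' as it requires HTTP mode.
Oct 24 12:27:00 haproxys01e02-par haproxy-systemd-wrapper[44319]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44952
Oct 24 12:27:00 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 297/122700 (44978) : config : 'option forwardfor' ignored for frontend 'https-in' as it requires HTTP mode.
Oct 24 12:27:05 haproxys01e02-par haproxy-systemd-wrapper[44319]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 44983
Oct 24 12:27:05 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 297/122705 (45131) : config : 'option forwardfor' ignored for frontend 'https-in' as it requires HTTP mode.
Oct 24 12:27:09 haproxys01e02-par haproxy-systemd-wrapper[44319]: haproxy-systemd-wrapper: executing /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 45132
Oct 24 12:27:09 haproxys01e02-par haproxy-systemd-wrapper[44319]: [WARNING] 297/122709 (45146) : config : 'option forwardfor' ignored for frontend 'https-in' as it requires HTTP mode.
"""

# "executing ... -sf <pid>": the pid the wrapper asks the new process to stop.
EXEC_RE = re.compile(r"executing .* -sf (\d+)")
# "[WARNING] ddd/hhmmss (<pid>)": pid of the freshly started haproxy process.
WARN_RE = re.compile(r"\[WARNING\] \S+ \((\d+)\)")


def check_reloads(log_text, max_gap=1):
    """Pair each -sf pid with the last pid seen in a [WARNING] line.

    Returns a list of (sf_pid, last_seen_pid, plausible) tuples.
    ASSUMPTION: a plausible -sf pid is at most max_gap above the pid of the
    process that printed the previous warning (its daemonized child).
    """
    results = []
    last_seen = None
    for line in log_text.splitlines():
        m = EXEC_RE.search(line)
        if m:
            sf_pid = int(m.group(1))
            # The first reload has no earlier warning to compare against.
            if last_seen is not None:
                plausible = 0 < sf_pid - last_seen <= max_gap
                results.append((sf_pid, last_seen, plausible))
            continue
        m = WARN_RE.search(line)
        if m:
            last_seen = int(m.group(1))
    return results


if __name__ == "__main__":
    for sf_pid, seen, plausible in check_reloads(LOG):
        status = "ok" if plausible else "SUSPICIOUS"
        print(f"-sf {sf_pid} (last seen pid {seen}): {status}")
```

On the lines above, only the reload with -sf 44983 is flagged, which matches
what I found by hand.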

