On 21 October 2016 at 15:33, Willy Tarreau <[email protected]> wrote:

>
> On Fri, Oct 21, 2016 at 03:05:55PM +0000, Pierre Cheynier wrote:
> > First let's clarify again: we are on a systemd-based OS (CentOS 7), so
> > reload is done by sending SIGUSR2 to haproxy-systemd-wrapper.
> > Theoretically, this has absolutely no relation to our current issue (if I
> > understand correctly how the old processes are managed).
>
> Yes it has something to do with it because it's the systemd-wrapper which
> delivers the signal to the old processes in this mode, while in the normal
> mode the processes get the signal directly from the new process. Another
> important point is that *all* users having problems with zombie
> processes are systemd users, with no exception. And this problem never
> existed over the first 15 years, when systems were using a sane init
> instead, and it still does not exist on non-systemd OSes.
>
> > This happens on servers with live traffic, but with a reasonable number
> > of connections. I'm also able to reproduce it with no connections, but I
> > have to be a bit more aggressive with the reload frequency (probably
> > because children are faster to die).
>
> OK that's interesting. And when this happens, they stay there forever ?
>
> > For me the problem is not whether we still have connections or not, it is
> > that in this case some old processes are never "aware" that they should
> > die, so they continue to listen for incoming requests, thanks to
> > SO_REUSEPORT. Consequently, you end up with N processes listening with
> > different configs.
>
> Ah this is getting very interesting. Maybe we should hack systemd-wrapper
> to log the signals it receives and the signals and pids it sends to see
> what is happening here. It may also be that the signal is properly sent
> but never received (but why ?).


There was a similar issue with reloads in Docker that I reported a while
back: https://www.mail-archive.com/[email protected]/msg21485.html . It
was ultimately tracked down to a faulty Golang compiler version, which
messed up signal masks of spawned processes. This is the direction I'd look
in; given all the hackery systemd engages in and the wanton disregard it
shows for everything that wasn't specifically written to live in the brave
new systemd world, I wouldn't put it past the wrapper to do something nasty
to signals there. The good outcome here is that it's a bug and gets fixed
eventually and then works its way to distros. The bad outcome is that it's
intentional, and systemd maintainers tell anyone whose code got broken to
go away, as they tend to.
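
For anyone who wants to check the signal-mask theory on a stuck old process,
here is a rough sketch. It assumes Linux procfs, where /proc/<pid>/status
carries the SigBlk, SigIgn and SigCgt lines as hex bitmasks (bit n-1 set
means signal n). The helper names are mine, not part of haproxy or the
wrapper, and I have not run this against the setup discussed in this thread.

```python
from typing import Dict, Set

# Hypothetical helpers for inspecting a process's signal masks on Linux.
# Input is the text of /proc/<pid>/status; nothing here is haproxy-specific.

MASK_FIELDS = ("SigBlk", "SigIgn", "SigCgt")


def decode_mask(hex_mask: str) -> Set[int]:
    """Return the signal numbers whose bits are set in a procfs hex mask."""
    value = int(hex_mask, 16)
    # Bit 0 corresponds to signal 1, bit 1 to signal 2, and so on.
    return {n for n in range(1, 65) if value & (1 << (n - 1))}


def signal_masks(status_text: str) -> Dict[str, Set[int]]:
    """Extract the blocked, ignored and caught signal sets from status text."""
    masks: Dict[str, Set[int]] = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in MASK_FIELDS:
            masks[key] = decode_mask(rest.strip())
    return masks


def is_blocked(status_text: str, signum: int) -> bool:
    """True if signum appears in the process-level blocked mask (SigBlk)."""
    return int(signum) in signal_masks(status_text).get("SigBlk", set())
```

To use it, read /proc/<pid>/status for one of the leftover processes and call
is_blocked(text, signal.SIGUSR1), or whichever signal you expect the old
process to act on. If the signal shows up in SigBlk or SigIgn of a process
that should have exited, the mask was inherited or changed somewhere along
the spawn chain, which is the same pattern as in the Docker case.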

Cheers,
Maciej
