Hi Lukas,


> When reloading haproxy too fast on EL7 (RedHat, CentOS) the system is
> being filled with orphaned processes.
>
> I encountered this problem on CentOS 7 with
> haproxy-1.5.4-4.el7_1.x86_64 but expect it to exist on all systems
> using haproxy-systemd-wrapper not just those based on Fedora.
>
> Steps to reproduce:
>
> 1) haproxy is running normally.
>
> [root@localhost ~]# ps ax | grep haproxy
> 3140 ? Ss 0:00 /usr/sbin/haproxy-systemd-wrapper -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
> 3141 ? S 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
> 3142 ? Ss 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
>
> 2) Several reloads are executed in quick succession. Problem worsens
> when processes happen to execute a reload in parallel.
>
> [root@localhost ~]# while :; do systemctl reload haproxy; done
> ^C
>
> 3) There are multiple haproxy processes running that will never end. As
> you can see, there are duplicate pids for the -sf arg. Maybe caused by a
> race between haproxy-systemd-wrapper reading the pidfile and the new
> haproxy process writing its pid.
>
> [root@localhost ~]# ps ax | grep haproxy
> 423 ? S 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 419
> 429 ? S 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 425
> 430 ? Ss 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 419
> 431 ? Ss 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 425
> 31833 ? Ss 0:01 /usr/sbin/haproxy-systemd-wrapper -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
> 36593 ? S 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 36587
> 36600 ? Ss 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 36587
> 38316 ? S 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 38311
> 38324 ? Ss 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 38311
> 38344 ? S 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 38325
> 38350 ? Ss 0:00 /usr/sbin/haproxy -f
> /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 38325
> ...
> ...
>
>
> I believe the problem is that there's a race in
> haproxy-systemd-wrapper.c line 98 where it's missing a
> } else if (nb_pid> 0) { ... block until nb_pid is no longer found in
> pidfile. Or something similarly blocking.
>
> Otherwise the parent will accept new SIGUSR2/SIGHUP reloads before the
> new haproxy process that was spawned in line 96 has written its pid
> file.
>
> Also note the following from the systemd.service manpage:
> "It is strongly recommended to set ExecReload= to a command that not
> only triggers a configuration reload of the daemon, but also
> synchronously waits for it to complete."
> That's currently not the case.

Thanks for the analysis, it makes sense to me. Also, if I understand
correctly, locking in the parent scripts [1] fixes the issue, which
further confirms your suspicion.
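
To illustrate what caller-side locking could look like, here is a
minimal sketch. It is not taken from the commit in [1]. The lock file
path, the function name reload_serialized and the 5 second timeout are
assumptions made up for this example. It also assumes that flock(1)
from util-linux is available, that sleep accepts fractional seconds,
and that the pidfile content changes once the new haproxy process has
written its pid.

```shell
#!/bin/sh
# Hypothetical defaults; only /run/haproxy.pid comes from the report above.
LOCKFILE=${LOCKFILE:-/run/haproxy-reload.lock}
PIDFILE=${PIDFILE:-/run/haproxy.pid}
RELOAD_CMD=${RELOAD_CMD:-systemctl reload haproxy}

# Run one reload at a time and return only after the pidfile has changed.
reload_serialized() {
    (
        # Exclusive lock on fd 9; concurrent callers block here.
        flock -x 9 || exit 1

        old=$(cat "$PIDFILE" 2>/dev/null)
        $RELOAD_CMD || exit 1

        # Poll for up to ~5 seconds (assumed timeout) until the new
        # process has written a different pid into the pidfile.
        i=0
        while [ "$i" -lt 50 ]; do
            new=$(cat "$PIDFILE" 2>/dev/null)
            if [ -n "$new" ] && [ "$new" != "$old" ]; then
                exit 0
            fi
            sleep 0.1
            i=$((i + 1))
        done
        exit 1
    ) 9>"$LOCKFILE"
}
```

Callers would then run reload_serialized instead of invoking
"systemctl reload haproxy" directly. This only works around the race
from the outside; it does not change the behaviour of
haproxy-systemd-wrapper itself.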

CC'ing systemd contributors for comments.



Regards,

Lukas


[1] 
https://github.com/mesosphere/marathon-lb/commit/83260fdf687c774064b54d3bb009f5b3a1d75c97

                                          
