Hi Lukas,
> When reloading haproxy too fast on EL7 (RedHat, CentOS) the system is
> being filled with orphaned processes.
>
> I encountered this problem on CentOS 7 with
> haproxy-1.5.4-4.el7_1.x86_64 but expect it to exist on all systems
> using haproxy-systemd-wrapper not just those based on Fedora.
>
> Steps to reproduce:
>
> 1) haproxy is running normal.
>
> [root@localhost ~]# ps ax | grep haproxy
>  3140 ? Ss 0:00 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
>  3141 ? S  0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
>  3142 ? Ss 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds
>
> 2) Several reloads are executed in quick succession. Problem worsens
> when processes happen to execute a reload in parallel.
>
> [root@localhost ~]# while :; do systemctl reload haproxy; done
> ^C
>
> 3) There's multiple haproxy processes running that will never end. As
> you can see there's duplicate pids for the -sf arg. Maybe caused by a
> race between haproxy-systemd-wrapper reading and the new haproxy
> process writing it's pid.
>
> [root@localhost ~]# ps ax | grep haproxy
>   423 ? S  0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 419
>   429 ? S  0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 425
>   430 ? Ss 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 419
>   431 ? Ss 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 425
> 31833 ? Ss 0:01 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
> 36593 ? S  0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 36587
> 36600 ? Ss 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 36587
> 38316 ? S  0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 38311
> 38324 ? Ss 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 38311
> 38344 ? S  0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 38325
> 38350 ? Ss 0:00 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 38325
> ...
> ...
>
>
> I believe the problem is that there's a race in
> haproxy-systemd-wrapper.c line 98 where it's missing a
> } else if (nb_pid> 0) { ... block until nb_pid is no longer found in
> pidfile. Or something similarly blocking.
>
> Otherwise the parent will accept new SIGUSR2/SIGHUP reloads before the
> new haproxy process that was spawned in line 96 has written it's pid
> file.
>
> Also note the following from the systemd.service manpage:
> "It is strongly recommended to set ExecReload= to a command that not
> only triggers a configuration reload of the daemon, but also
> synchronously waits for it to complete."
> That's currently not the case.

Thanks for the analysis, it makes sense to me.

Also, since locking in the parent scripts [1] fixes the issue, if I
understand correctly, it further confirms your suspicion.

CC'ing systemd contributors for comments.

Regards,

Lukas

[1] https://github.com/mesosphere/marathon-lb/commit/83260fdf687c774064b54d3bb009f5b3a1d75c97
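
For illustration only, here is a rough sketch of what serializing reloads
in a calling script could look like. It is not the code from [1] and not
a proposed patch to the wrapper. It takes an exclusive lock so only one
reload runs at a time, then waits until the pid file lists PIDs that
differ from the ones seen before the reload. The names reload_lock,
read_pids and reload_and_wait, the lock file, and the timeout values are
all made up for this example.

```python
import fcntl
import os
import time
from contextlib import contextmanager


@contextmanager
def reload_lock(lock_path):
    """Hold an exclusive advisory lock so only one reload runs at a time."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another reload holds it
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def read_pids(pidfile):
    """Return the set of PIDs listed in pidfile (empty if missing/unparsable)."""
    try:
        with open(pidfile) as f:
            return {int(tok) for tok in f.read().split()}
    except (OSError, ValueError):
        return set()


def reload_and_wait(pidfile, lock_path, trigger_reload, timeout=5.0, poll=0.05):
    """Trigger one reload and block until the pid file lists only new PIDs.

    trigger_reload is a caller-supplied callable (hypothetical here).
    Returns True if the pid file changed before the timeout, else False.
    """
    with reload_lock(lock_path):
        old = read_pids(pidfile)
        trigger_reload()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            new = read_pids(pidfile)
            # Done once the file is non-empty and shares no PID with the old set.
            if new and new.isdisjoint(old):
                return True
            time.sleep(poll)
        return False
```

A caller might pass something like
lambda: subprocess.check_call(["systemctl", "reload", "haproxy"]) as
trigger_reload, with /run/haproxy.pid as the pid file. This only helps if
every script that reloads haproxy goes through the same lock file, and it
does not address the race inside haproxy-systemd-wrapper itself.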