Hi list,

I am seeing the following behaviour: I'm on 1.6.8 (same behaviour in 
1.4/1.5), use systemd, and noticed that when reloads are relatively frequent, 
old processes sometimes never die and stay bound to the TCP socket(s), thanks 
to SO_REUSEPORT.

Here is an example process tree:
root     24115  0.0  0.0  46340  1824 ?        Ss   14:34   0:00 /usr/sbin/haproxy-systemd-wrapper -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy  27403  0.2  0.0  89272 20096 ?        S    14:49   0:00  \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27366
haproxy  27450  1.2  0.0  89272 14380 ?        Rs   14:49   0:00  |   \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27366
haproxy  27410  0.2  0.0  89272 16008 ?        S    14:49   0:00  \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27366
haproxy  27458  1.2  0.0  89272 14392 ?        Ss   14:49   0:00  |   \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27366
haproxy  27626  0.3  0.0  89272 16008 ?        S    14:49   0:00  \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27623
haproxy  27674  1.1  0.0  89272 14380 ?        Ss   14:49   0:00  |   \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27623
haproxy  27722  0.2  0.0  89272 16008 ?        S    14:49   0:00  \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27716
haproxy  27762  1.0  0.0  89272 14368 ?        Ss   14:49   0:00  |   \_ /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf 27716

The problem is easily reproducible: just loop over reload (systemctl / SIGUSR2) 
50 times without sleeping, for example.
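
For anyone who wants to try it, here is a minimal sketch of such a loop. It 
assumes the systemd unit is named "haproxy" (adjust to your setup); the 
function name and the "run" parameter are only there for illustration and so 
that the loop can be exercised without touching a real service.

```python
import subprocess


def reload_loop(count=50, unit="haproxy", run=subprocess.run):
    """Issue `count` reloads back to back, with no sleep in between.

    `unit` is an assumed unit name; `run` can be swapped for a stub.
    Returns the exit code of every `systemctl reload` call.
    """
    codes = []
    for _ in range(count):
        # check=False: keep looping even if one reload fails
        result = run(["systemctl", "reload", unit], check=False)
        codes.append(result.returncode)
    return codes
```

Calling reload_loop() as root and then looking at "ps auxf" should be enough 
to see whether old instances are left behind.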

It happens when two reloads are performed within a short time. 
As a result, the '-sf' of one haproxy instance carries no 'back-reference' to 
the previous one, which becomes "disconnected" from the others (see 27450 in 
my example, which seems totally alone).
This is also visible in journalctl output (generally 2 haproxy instances have 
the same PID reference in '-sf', resulting in one being lost; see 27366 in my 
example).
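
To spot this condition automatically, something like the following sketch 
could be used. It takes the command lines of the wrapper's direct children 
(copied here from the process tree above) and reports every '-sf' target that 
is named by more than one instance. The helper names are made up for this 
example, and collecting the PIDs and command lines from ps or /proc is left 
out.

```python
import shlex
from collections import defaultdict


def sf_targets(cmdline):
    """Return the PIDs listed after -sf in a haproxy command line."""
    args = shlex.split(cmdline)
    if "-sf" not in args:
        return []
    pids = []
    for tok in args[args.index("-sf") + 1:]:
        if not tok.isdigit():
            break  # next option reached, the PID list is over
        pids.append(int(tok))
    return pids


def shared_sf(procs):
    """Map each -sf target to the instances naming it,
    keeping only targets named by more than one instance."""
    by_target = defaultdict(list)
    for pid, cmdline in procs.items():
        for target in sf_targets(cmdline):
            by_target[target].append(pid)
    return {t: p for t, p in by_target.items() if len(p) > 1}


base = "/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid -Ds -sf "
procs = {
    27403: base + "27366",
    27410: base + "27366",
    27626: base + "27623",
    27722: base + "27716",
}
print(shared_sf(procs))  # {27366: [27403, 27410]}
```

Here 27403 and 27410 both name 27366, which matches what I see in the tree.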

I had a look at haproxy-systemd-wrapper.c and my guess is that the PID file is 
only read there and never written. 
It seems to me that a race condition occurs and that several instances do not 
reference the previous one, maybe because the PID is only written after X 
reloads have already been done.
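
To make the suspected race concrete, here is a toy model of it. This is not 
the wrapper's code, only a sketch of my guess: every reload reads the PID file 
to build '-sf', and the newly started process writes its own PID to the file 
some time later. The PIDs (100, 101, ...) and the time units are arbitrary.

```python
def simulate_reloads(reload_times, write_delay):
    """Return the -sf value each reload would use in this toy model.

    A reload at time t reads the PID file, then starts a new process
    which writes its own PID to the file at t + write_delay.
    """
    pidfile = 100   # PID of the initially running instance
    next_pid = 101
    pending = []    # (time of write, pid) for writes not yet done
    sf_values = []
    for t in sorted(reload_times):
        # apply every PID file write that has completed by time t
        not_yet = []
        for write_time, pid in pending:
            if write_time <= t:
                pidfile = pid
            else:
                not_yet.append((write_time, pid))
        pending = not_yet
        sf_values.append(pidfile)  # what this reload passes to -sf
        pending.append((t + write_delay, next_pid))
        next_pid += 1
    return sf_values


print(simulate_reloads([0, 10, 20], write_delay=2))  # [100, 101, 102]
print(simulate_reloads([0, 1, 20], write_delay=2))   # [100, 100, 102]
```

In the second run, two reloads get the same '-sf' value and PID 101 is never 
referenced by anyone, which is the pattern I see with 27366 above.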

Restarting the server is very disruptive and, I believe, this is why there 
were approaches like the one used at Yelp 
(https://engineeringblog.yelp.com/2015/04/true-zero-downtime-haproxy-reloads.html),
which consist of letting the client do SYN retries or buffering the SYNs while 
doing a full restart.

This becomes impossible in a PaaS-like approach where many events occur and 
may trigger reloads every second. BTW, the new "no-reuseport" feature does not 
help in my case (nor do the ip/nftables or tc workarounds) because it 
introduces latency spikes potentially every second.

Maybe you have some insights to share before I dig into this?

Thanks, 

-Pierre 
