I'm upgrading my old 1.4.18 haproxies to 1.5.4 and I have a mysterious problem where haproxy marks some backend servers as being DOWN with a message "L4TOUT in 2000ms". Some times the message also has a star: "* L4TOUT in 2000ms" (I didn't find what the star means from the docs). Also the reported timeout varies between 2000ms and 2003ms.
This does not happen to every backend and it doesn't happen immediately. After restart every backend is green and a few backends starts to get marked DOWN after about 30 minutes or so. I'm also running two instances in two different servers and they both suffer the same problem but the DOWN servers aren't same. So server A might be marked DOWN on haproxy-1 and server B marked down on haproxy-2 (or vice versa). This seems to happen regardless how much traffic I run into the haproxies. I can always ssh into the haproxies and run curl against the check url and it always works, so this problem seems to be inside haproxy. My haproxy config is a kind of long so I copied it here: http://koti.kapsi.fi/garo/nobackup/haproxy.cfg (I've sanitised it a bit, but only hostnames). I've ran the logging with verbose debugging to check if that gives any clues on the health check issue, but the logs did not reveal anything to my eye. I can however gather a new log sample on the health checks, but the haproxies are now receiving production traffic so the log amount would be too much to gather at the current moment. I've also gathered some tcpdump traffic to the hosts marked DOWN and strangely it seems that the hosts is receiving queries. It could be that one (or more) processes (I'm using nbprocs 7 on my 8 core aws c3.2xlarge instance) haven't marked the host down. Trying to refresh the stats uri doesn't seem to indicate this, but it's hard to be sure as the probability of going thru all seven different processes fast enough is low. All clues and debugging ideas are greatly appreciated.

