Hi Willy, Cyril,

Thank you for your detailed analysis. I still notice 504 errors almost immediately after an HAProxy start, and the PID matches the new process:
[root@frontend2 log]# ps aux | grep haproxy
haproxy  21242  6.6  0.1 133176 47984 ?  Rs  07:17  0:00 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid

[root@frontend2 log]# service haproxy stop; service rsyslog stop; rm -f /var/log/haproxy.log; service rsyslog start; service haproxy start
Stopping haproxy:                    [  OK  ]
Shutting down system logger:         [  OK  ]
Starting system logger:              [  OK  ]
Starting haproxy:                    [  OK  ]

[root@frontend2 log]# tail -f haproxy.log | grep 504
Sep 14 07:16:14 localhost haproxy[21178]: 94.197.40.185:3504 [14/Sep/2011:07:16:08.216] main python_8001/python_8001_fe1 80/0/0/-1/6449 502 204 - - SH-- 3375/3375/950/950/0 0/0 "POST /xxx/chat/status/updates HTTP/1.1"
Sep 14 07:16:15 localhost haproxy[21178]: 118.101.95.88:49504 [14/Sep/2011:07:16:10.298] main python_9003/python_9003_fe1 22/0/0/-1/5088 502 204 - - SH-- 3312/3312/386/386/0 0/0 "POST /xxx/chat/message/3/updates HTTP/1.1"
^C

[root@frontend2 log]# ps aux | grep haproxy
haproxy  21178  5.4  0.2 137268 51480 ?  Ss  07:16  0:01 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid

I don't understand how this can be - can you shed any light? (The configuration is the same as in my earlier email, i.e. a server timeout of 2 hours.) I will investigate the 502 errors with more benchmarking directly against the backends. Thank you for confirming that this is the source of the problem.
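[Editor's note: the two log lines above actually carry status 502; `grep 504` matched them only because the client source ports (3504, 49504) contain the digits "504". Anchoring the match on the status-code field avoids this. A minimal sketch, using synthetic sample lines modeled on the capture above (the field position assumes the syslog-prefixed httplog layout shown there):]

```shell
# Two synthetic httplog lines: a 502 whose client port contains "504",
# and a genuine 504.
cat > /tmp/haproxy-sample.log <<'EOF'
Sep 14 07:16:14 localhost haproxy[21178]: 1.2.3.4:3504 [14/Sep/2011:07:16:08.216] main python_8001/python_8001_fe1 80/0/0/-1/6449 502 204 - - SH-- 3375/3375/950/950/0 0/0 "POST /xxx HTTP/1.1"
Sep 14 07:16:15 localhost haproxy[21178]: 5.6.7.8:1234 [14/Sep/2011:07:16:10.298] main python_9003/python_9003_fe1 22/0/0/-1/5088 504 204 - - sH-- 3312/3312/386/386/0 0/0 "POST /yyy HTTP/1.1"
EOF

# Naive substring grep matches BOTH lines (port 3504 contains "504"):
grep -c 504 /tmp/haproxy-sample.log                 # -> 2

# Matching the status-code field (11th whitespace-separated field in
# this layout) keeps only the real 504:
awk '$11 == 504' /tmp/haproxy-sample.log | wc -l    # -> 1
```

[The field index depends on the syslog prefix your rsyslog writes; count the fields in one of your own lines before relying on it.]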
Many thanks,
Alex

On Tue, Sep 13, 2011 at 11:41 PM, Willy Tarreau <w...@1wt.eu> wrote:
> Hi Alex,
>
> On Tue, Sep 13, 2011 at 03:18:54PM +0100, Alex Davies wrote:
> > Hi,
> >
> > Thank you for your observation - indeed I did notice some of those as I
> > was writing my email. I have updated my globals to increase the server
> > timeout (as we are doing long polling), reduce the others, and remove
> > the duplicates:
> >
> > defaults
> >     mode http
> >     option httplog
> >     #option tcplog
> >     option dontlognull
> >     option dontlog-normal
> >
> >     log global
> >     retries 10
> >     maxconn 50000
> >     option forwardfor except 127.0.0.1/32 # Apache on https://127.0.0.1
> >     option httpclose # Required for REMOTE HEADER
> >     option redispatch
> >
> >     timeout connect 10000
> >     timeout client 10000
> >     timeout server 7200000
> >
> > I still notice the same errors in the logs! (Slightly fewer 504s, as I
> > would expect from the increase in "timeout server" - but I still don't
> > understand why I get any at all in the first minute of a new process.)
>
> To complete Cyril's detailed analysis, I'd like to add that you'll only
> see 502s when you restart, and it will take some time before you see 504s
> again (e.g. 2 hours with the config above).
>
> The 502s mean that the server has suddenly aborted the connection (flags
> SH), while the 504s indicate that haproxy was fed up with waiting and
> closed after "timeout server" had elapsed.
>
> So yes, it's very possible that your server has its own timeout, but it
> should be around 30s from what I saw in your logs. It still does not
> explain why some requests never time out on the server - maybe they don't
> wake the same components up?
>
> Regards,
> Willy

-- 
Alex Davies

This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed.
If you have received this email in error, please notify the sender immediately by e-mail and delete this e-mail permanently.