Hi
HA-Proxy version 1.5-dev25-a339395 2014/05/10
I have read many posts about 408 wait timeouts and have tried most, if not
all, of the suggestions. Currently about 5% of my traffic gets this error
(~600,000 wait timeouts a day). What puzzles me is that the browser shows the
408 error immediately, which does not match the 12 seconds recorded in the
logs. I have seen similar reports in other posts, but I still cannot solve
it.
**My load balancers are dedicated physical servers and are not close to full
utilization.**
I have tried the following in my config:
1. Increased timeout http-request to 12s, 60s, and 90s
2. Changed the http-keep-alive timeout (1s, 5s, 10s, and 12s)
3. Increased the queue, server, and client timeouts to 32s each
4. Increased the connect timeout to 12s
5. Turned on accept-invalid-http-request, splice-auto, tcp-smart-connect,
   and tcp-smart-accept
6. Turned off compression
7. Lowered mss to 1422
Here are the errors in my log:
May 22 13:48:20 localhost.localdomain haproxy[19963]: <IP>:49219|[22/May/2014:13:48:08.736]|non_ssl|non_ssl/<NOSRV>|-1/-1/-1/-1/+12000|408|+212|-|-|cR--|183/181/0/0/0|0/0|{||}||"<BADREQ>"|<IP>
May 22 13:48:20 localhost.localdomain haproxy[19963]: <IP>:49221|[22/May/2014:13:48:08.746]|non_ssl|non_ssl/<NOSRV>|-1/-1/-1/-1/+12001|408|+212|-|-|cR--|182/180/0/0/0|0/0|{||}||"<BADREQ>"|<IP>
May 22 13:48:20 localhost.localdomain haproxy[19963]: <IP>:18792|[22/May/2014:13:48:08.927]|non_ssl|non_ssl/<NOSRV>|-1/-1/-1/-1/+12000|408|+212|-|-|cR--|180/178/0/0/0|0/0|{||}||"<BADREQ>"|<IP>
May 22 13:48:20 localhost.localdomain haproxy[19963]: <IP>:18794|[22/May/2014:13:48:08.927]|non_ssl|non_ssl/<NOSRV>|-1/-1/-1/-1/+12000|408|+212|-|-|cR--|179/177/0/0/0|0/0|{||}||"<BADREQ>"|<IP>
May 22 13:48:20 localhost.localdomain haproxy[19963]: <IP>:18795|[22/May/2014:13:48:08.927]|non_ssl|non_ssl/<NOSRV>|-1/-1/-1/-1/+12000|408|+212|-|-|cR--|178/176/0/0/0|0/0|{||}||"<BADREQ>"|<IP>
May 22 13:48:20 localhost.localdomain haproxy[19963]: <IP>:18791|[22/May/2014:13:48:08.929]|non_ssl|non_ssl/<NOSRV>|-1/-1/-1/-1/+12000|408|+212|-|-|cR--|176/174/0/0/0|0/0|{||}||"<BADREQ>"|<IP>
I have mixed and matched the above options but none have seemed to work.
Currently here is my config (without ACLs):
global
log 127.0.0.1 local2 ##Log to the local rsyslog daemon
user haproxy
group haproxy
pidfile /var/run/haproxy.pid
stats socket /tmp/haproxy.socket user nobody group nobody mode 600 level admin
node <NODENAME>
description HAPROXY2-DL
daemon
maxconn 120000
spread-checks 3
ca-base /etc/ssl/certs/comb
crt-base /etc/ssl/certs/comb
quiet
defaults
log global
mode http
option forwardfor
compression algo gzip
compression type text/html text/plain text/css text/xml text/javascript
retries 5
timeout http-request 12s
timeout http-keep-alive 1s
timeout queue 32s
timeout connect 12s
timeout server 32s
timeout client 32s
option http-server-close
option accept-invalid-http-request
option splice-auto
option tcp-smart-connect
option tcp-smart-accept
log-format %ci:%cp|[%t]|%ft|%b/%s|%Tq/%Tw/%Tc/%Tr/%Tt|%ST|%B|%CC|%CS|%tsc|%ac/%fc/%bc/%sc/%rc|%sq/%bq|%hr|%hs|%{+Q}r|%fi
###PORT 80 LISTENER###
frontend non_ssl *:80 mss 1422
##Rate Limit, block ip for 10 minutes if true
stick-table type ip size 400k expire 10m store gpc0
acl whitelist src <ip1> <subnet1>
acl akamai_user_agent hdr_sub(User-Agent) -i <user-agent-cdn>
acl source_is_abuser src_get_gpc0(non_ssl) gt 0
use_backend ease-up-y0 if source_is_abuser ! whitelist
tcp-request connection track-sc1 src if ! source_is_abuser
acl network_allowed src <IP1> <subnet1>
acl restricted_page url_reg wp-admin
acl restricted_page url_reg wp-login.php
acl restricted_page url_reg cms/login-form.php
block if restricted_page !network_allowed
############################
###OPTIONS
maxconn 100000
mode http
option logasap
option forwardfor
reqadd X-Forwarded-Proto:\ http
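In case it helps anyone compare numbers, here is a rough Python sketch of how log entries like the ones above could be tallied. It assumes each entry sits on a single line, starts with the syslog prefix ending in "haproxy[pid]: ", and follows the log-format from my defaults section. The names parse_line, tally, FIELDS, and SAMPLE are made up for this illustration, and only the leading fields are split out because the captured-header field ({||}) contains the "|" separator itself.

```python
from collections import Counter

# Leading fields, in the order of the log-format in the defaults section:
# %ci:%cp | %t | %ft | %b/%s | %Tq/%Tw/%Tc/%Tr/%Tt | %ST | %B | %CC | %CS | %tsc
# | %ac/%fc/%bc/%sc/%rc | %sq/%bq
FIELDS = ["client", "accept_date", "frontend", "backend_server", "timers",
          "status", "bytes", "req_cookie", "resp_cookie", "term_state",
          "conns", "queues"]

# One entry from the logs above, joined onto a single line.
SAMPLE = ('May 22 13:48:20 localhost.localdomain haproxy[19963]: <IP>:49219|'
          '[22/May/2014:13:48:08.736]|non_ssl|non_ssl/<NOSRV>|'
          '-1/-1/-1/-1/+12000|408|+212|-|-|cR--|183/181/0/0/0|0/0|'
          '{||}||"<BADREQ>"|<IP>')


def parse_line(line):
    """Split one log line into the leading fields named in FIELDS."""
    # Drop the syslog prefix (assumed to end with "haproxy[pid]: ").
    payload = line.split("]: ", 1)[1]
    # Split on the first len(FIELDS) separators only; the remainder
    # (captured headers, request, frontend IP) stays unsplit and is ignored.
    parts = [p.strip() for p in payload.split("|", len(FIELDS))]
    rec = dict(zip(FIELDS, parts))
    tq, tw, tc, tr, tt = (int(x) for x in rec["timers"].split("/"))
    rec["Tq"], rec["Tt"] = tq, tt   # request time and total time, in ms
    rec["status"] = int(rec["status"])
    return rec


def tally(lines):
    """Count entries per (status code, termination state) pair."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        counts[(rec["status"], rec["term_state"])] += 1
    return counts


if __name__ == "__main__":
    rec = parse_line(SAMPLE)
    print(rec["status"], rec["term_state"], rec["Tt"])
    print(tally([SAMPLE]))
```

Feeding a whole day's log through tally would show how many of the 408s carry the cR-- termination state, as opposed to other states.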
I understand that some 408s are normal (possible DDoS), but I believe 5% is
too high.
So far we have only been able to reproduce the problem in Chrome, but we
cannot yet say whether it is isolated to Chrome. Lastly, clients have only
recently started seeing these timeouts in their browsers, even though the
logs have shown the 408s for months.
Any help would truly be appreciated.
M