Hi,

We run HAProxy on Amazon EC2 for HTTP load balancing. On Monday (August 11) we upgraded seven of our load balancers, serving two of our products, to 1.3.20: four servers (all for one product) were upgraded from 1.3.15.8, and three servers (all for the other product) from 1.3.18. We kept the config files the same.

We finished replacing the load balancers by 2300 UTC on Aug 11. At about 0900 UTC on Aug 12 the first cluster (the one upgraded from 1.3.15.8) started showing performance problems severe enough to set off our monitoring systems, with response times of several seconds. When I logged in to one of the load balancers, CPU and memory usage looked normal, but netstat -anp showed more than 30k lines, the majority in the TIME_WAIT state.

For background, the load balancers each point to the same pool of about 60 servers. At the time those servers were handling about 20-30 sessions each and reporting about 80 requests per second (nominally 60% of peak).

At this point we put the old load balancers back into production and found that they still worked fine. At around 1200 UTC on Aug 12, a nearly identical state occurred on the other set of load balancers (the ones upgraded from 1.3.18).
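For anyone who wants to track the TIME_WAIT count over time rather than scroll through netstat output, here is a minimal Python sketch that tallies sockets by state straight from /proc/net/tcp. It assumes a Linux host and only covers IPv4 sockets (IPv6 sockets are listed separately in /proc/net/tcp6). The state codes are the kernel's; the function name is just illustrative.

```python
import os
import sys
from collections import Counter

# TCP state codes as they appear (in hex) in the "st" column of /proc/net/tcp,
# taken from the Linux kernel's tcp_states.h.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(lines):
    """Tally sockets by TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Data rows start with a slot number like "12:"; the header row starts
        # with "sl", so it (and any malformed line) is skipped here.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/proc/net/tcp"
    if os.path.exists(path):
        with open(path) as f:
            # Print the most common states first, e.g. TIME_WAIT on a busy proxy.
            for state, n in count_tcp_states(f).most_common():
                print(f"{n:8d} {state}")
    else:
        print(f"{path} not found (this sketch assumes Linux)")
```

Running it from cron every minute and logging the TIME_WAIT line would show whether the count climbs gradually or jumps when response times degrade.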
If anyone can see any issues, please let me know. I have pasted a representative haproxy.cfg file below:

# this config needs haproxy-1.1.28 or haproxy-1.2.1

global
    #log 127.0.0.1 local0 info
    #log 127.0.0.1 local1 notice
    #log loghost local0 info
    maxconn 75000
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon
    #debug
    #quiet

defaults
    #log global
    mode http
    #option httplog
    option dontlognull
    option redispatch
    retries 3
    maxconn 75000
    contimeout 5000
    clitimeout 50000
    srvtimeout 2000

frontend openx *:80
    #log global
    maxconn 75000
    option forwardfor
    default_backend openx_ec2_hosted_http

backend openx_ec2_hosted_http
    mode http
    #balance roundrobin
    balance leastconn
    option abortonclose
    option httpclose
    #remove the line below if not 1.3.20
    #option httpchk HEAD /health.chk
    timeout queue 500
    #option forceclose
    server crt.hosted.bigd04 10.252.102.128:80 check maxconn 150 weight 2
    ...
    server crt.hosted.d03 10.252.203.175:80 check maxconn 50
    ...
    server crt.hosted.d75 10.209.81.155:80 check maxconn 30

frontend openx_ssl *:443
    #log global
    mode tcp
    maxconn 75000
    option forwardfor
    default_backend openx_ec2_hosted_ssl

backend openx_ec2_hosted_ssl
    mode tcp
    #balance roundrobin
    balance leastconn
    option abortonclose
    option httpclose
    #option forceclose
    server crt.hosted.bigd04-ssl 10.252.102.128:443 check maxconn 150
    ...
    server crt.hosted.d03-ssl 10.252.203.175:443 check maxconn 30
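Because the HTTP backend uses option httpclose, each request effectively gets its own TCP connection. The number of sockets in TIME_WAIT should therefore scale with the rate at which the load balancer closes connections (only the side that closes first holds the TIME_WAIT entry). Here is a rough back-of-the-envelope sketch, assuming Linux's fixed 60-second TIME_WAIT period and a steady close rate (Little's law). The function names are hypothetical.

```python
# Steady-state estimate via Little's law:
#   sockets in TIME_WAIT ~= (connections closed locally per second)
#                           x (seconds each socket stays in TIME_WAIT)
# Assumption: Linux, where TIME_WAIT lasts TCP_TIMEWAIT_LEN = 60 seconds.
LINUX_TIME_WAIT_SECONDS = 60


def estimate_time_wait(closes_per_second: float,
                       time_wait_seconds: float = LINUX_TIME_WAIT_SECONDS) -> float:
    """Expected number of sockets in TIME_WAIT for a given local close rate."""
    return closes_per_second * time_wait_seconds


def closes_per_second_for(time_wait_count: float,
                          time_wait_seconds: float = LINUX_TIME_WAIT_SECONDS) -> float:
    """Inverse: the local close rate implied by an observed TIME_WAIT count."""
    return time_wait_count / time_wait_seconds


if __name__ == "__main__":
    # About 30k TIME_WAIT entries, as in the netstat output described above,
    # would correspond to roughly 500 locally closed connections per second.
    print(closes_per_second_for(30_000))  # 500.0
```

This only says how many sockets to expect for a given connection rate. It does not explain why the same rate behaved differently on 1.3.20, but it makes it easy to compare the observed count against the expected one for each version.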

