Performance problems with 1.3.20

James Hartshorn Wed, 12 Aug 2009 12:50:26 -0700

Hi,

We run HAProxy on Amazon EC2 for HTTP load balancing.  On Monday
(August 11) we upgraded seven load balancers across two of our
products to 1.3.20: four servers, all in one product, from 1.3.15.8,
and three servers, all in the other product, from 1.3.18.  We kept the
config files the same.

We finished replacing the load balancers by 2300 UTC on Aug 11, and at
about 0900 UTC on Aug 12 the first cluster (the one upgraded from
1.3.15.8) started showing performance issues, enough to set off our
monitoring systems.  Response times were several seconds.  Logging on
to one of the load balancers I saw normal CPU and memory usage, but
netstat -anp showed more than 30k lines, the majority in TIME_WAIT
state.

For background, the load balancers each point to the same pool of
about 60 servers, which at the time were handling about 20-30 sessions
per server, with the servers reporting about 80 requests per second
(nominally 60% of peak).

At this point we put the old load balancers back into production and
found them to be still working fine.  At around 1200 UTC on Aug 12 a
nearly identical state occurred on the other set of load balancers
(the ones upgraded from 1.3.18).
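
A per-state breakdown of the netstat output mentioned above can be
obtained with something like the following.  This is only a sketch:
the function name count_tcp_states is made up for illustration, and it
assumes the usual net-tools netstat layout in which the connection
state is the sixth column of each tcp line.

```shell
# Hypothetical helper: reads `netstat -ant` style output on stdin and
# prints one "COUNT STATE" line per TCP state, largest count first.
# Assumes the state is the sixth whitespace-separated column.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { states[$6]++ }
         END { for (s in states) print states[s], s }' | sort -rn
}
```

On a load balancer it would be run as "netstat -ant | count_tcp_states";
a TIME_WAIT count far above the ESTABLISHED count would match the
symptom described above.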


If anyone can see any issues please let me know.

I have pasted a representative haproxy.cfg file below:

# this config needs haproxy-1.1.28 or haproxy-1.2.1

global
#log 127.0.0.1 local0 info
#log 127.0.0.1 local1 notice
#log loghost local0 info
maxconn 75000
chroot /var/lib/haproxy
user haproxy
group haproxy
daemon
#debug
#quiet

defaults
#log global
mode http
#option httplog
option dontlognull
    option  redispatch
retries 3
maxconn 75000
contimeout 5000
clitimeout 50000
srvtimeout 2000


frontend openx *:80
#log global
maxconn 75000
       option forwardfor
       default_backend openx_ec2_hosted_http

backend openx_ec2_hosted_http
       mode http
       #balance roundrobin
       balance leastconn
       option abortonclose
       option httpclose
       #remove the line below if not 1.3.20
       #option httpchk HEAD /health.chk
       timeout queue 500
       #option forceclose

       server crt.hosted.bigd04 10.252.102.128:80 check maxconn 150 weight 2
...
       server crt.hosted.d03 10.252.203.175:80 check maxconn 50
...
      server crt.hosted.d75 10.209.81.155:80 check maxconn 30



frontend openx_ssl *:443
       #log    global
       mode tcp
       maxconn 75000
       option forwardfor
       default_backend openx_ec2_hosted_ssl

backend openx_ec2_hosted_ssl
       mode tcp
       #balance roundrobin
       balance leastconn
       option abortonclose
       option httpclose
       #option forceclose


       server crt.hosted.bigd04-ssl 10.252.102.128:443 check maxconn 150
...
       server crt.hosted.d03-ssl 10.252.203.175:443 check maxconn 30
