Hi, we've been running HAProxy 1.4 in production for about 15 months now
and have loved nearly every minute of it.

In some basic testing against our production setup (which has quietly
served north of 200 Mbps of traffic and has been tuned through some large
layer 7 DDoS attacks), I ran the following from a Mac on a not-so-fast
connection:

ab -r -n 3000 -c 20 <url>

On the Mac, some of the connections get dropped because of a bug in the
version of ApacheBench I'm using, showing errors like "Send request
failed!" (possibly relevant, read on).

After running this I noticed ~25 "stuck" sessions to the backend server
that received these requests (i.e. `scur` in the CSV output). This is a
low-traffic time, and this level of concurrent sessions is not common for
our application. Stranger still, the Tomcat node shows no sign of these
connections.
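In case it matters how I'm counting: I'm reading the fifth CSV field,
which per my reading of the 1.4 docs is `scur`. The sample line below is
just an illustration with made-up numbers; in practice I pull the real
CSV from the stats page with curl:

```shell
# Stand-in for one line of the stats CSV (fields: pxname,svname,qcur,qmax,scur,...)
csv='backend1,server5,0,3,24,40,40,15230'

# Print proxy name, server name, and scur (current sessions).
echo "$csv" | awk -F, '{ print $1, $2, $5 }'
```

For the real thing I fetch something like `curl -s 'http://<stats-ip>/haproxy?stats;csv'` and pipe it through the same awk filter.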

Here's a simplification of our configuration:

global
    maxconn 50000

defaults
    mode http
    retries 3
    maxconn 2000
    timeout connect 8s
    timeout client 50s
    timeout server 30s
    option abortonclose
    option contstats

frontend myfrontend
    bind <ip>:80
    mode http
    timeout client 50s
    timeout http-request 5s
    maxconn 70000
    option http-server-close
    option httplog
    # Lots of ACLs to choose from a couple backends and/or filter
    default_backend backend1

backend backend1
    balance hdr(host)
    hash-type consistent
    option forwardfor
    timeout queue 6s
    server server5 <ip:port> cookie server5 check inter 2s fall 2 rise 8 slowstart 48s weight 20 maxconn 40

I ran `netstat -an` on the box running HAProxy and found exactly 24 entries
like the following (with a different local port on each one, obviously):

tcp        0      0 <haproxy-ip>:47643   <server5-ip:port>   CLOSE_WAIT

Running netstat on "server5" doesn't reveal anything unusual.
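For what it's worth, this is roughly how I'm counting them; the
here-string stands in for real `netstat -an` output, with made-up
addresses:

```shell
# Stand-in for `netstat -an` output on the HAProxy box (addresses invented).
netstat_out='tcp        0      0 10.0.0.1:47643   10.0.0.2:8080   CLOSE_WAIT
tcp        0      0 10.0.0.1:47644   10.0.0.2:8080   CLOSE_WAIT
tcp        0      0 10.0.0.1:50123   10.0.0.3:80     ESTABLISHED'

# Count sockets stuck in CLOSE_WAIT (the state is the sixth field).
echo "$netstat_out" | awk '$6 == "CLOSE_WAIT" { n++ } END { print n+0 }'
```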


I'm wondering whether we have something misconfigured, or whether I
managed to terminate a bunch of open sessions and leave the balancing box
in limbo. We try to protect our webservers with fairly low maxconn values
on each backend server, but whenever we have a slowdown on our backend it
can cause a ripple effect of problems. I'm starting to wonder now whether
this is caused by a large number of users cancelling page requests and
then "clogging" the backend servers' individual queues.

We're running HA-Proxy version 1.4.21 2012/05/21. I realize some of the
timeouts are set a bit high, but I've waited a good 60 minutes and the 24
ghost sessions are still showing, so I don't chalk it up to that.

Thanks for your time,

Scott Hulbert
