Hi all,

I've noticed an odd (lack of) interaction between "maxconn" and "option
httpchk"...

If a server's maxconn limit has been reached, it appears that HTTP health
checks are still dispatched to it. If I've configured maxconn to match the
number of requests the backend server can handle concurrently, and all of those
connections are busy with slow requests, the health check presumably times out
and HAProxy marks the server down. Once the server completes a request, HAProxy
still waits until "rise" consecutive health checks have succeeded before
sending it traffic again (which is what I'd expect if the server had really
been down, but it was only busy). This makes overly busy periods even worse.

I'm not sure if this explanation is clear; perhaps a concrete configuration
might help.

listen load_balancer
    bind :80
    mode http
    balance leastconn
    option httpchk HEAD /healthchk
    http-check disable-on-404
    default-server port 8080 inter 2s rise 2 fall 1 maxconn 3
    server srv1 srv1.example.com:8080 check
    server srv2 srv2.example.com:8080 check

With the above toy example, if srv1 and srv2 can each handle only 3 requests
concurrently, and 6 slow requests come in (each taking more than 2 seconds),
both backend servers will be marked down after a single failed check (fall 1).
In the worst case they stay down for up to 4 seconds (inter 2s * rise 2) after
one of their requests finishes.

I know I can work around this by setting maxconn to one less than a server's
maximum capacity (perhaps this would be a good idea for other reasons). I
suspect I could work around this by using TCP status checks instead of HTTP
status checks, though I haven't tried this as I like the flexibility HTTP
health checks offer (like "disable-on-404").
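
For concreteness, the first workaround might look like the sketch below. It is
untested, and it assumes each server really can handle exactly 3 concurrent
requests, so capping HAProxy at 2 leaves one slot free for the health check:

```haproxy
listen load_balancer
    bind :80
    mode http
    balance leastconn
    option httpchk HEAD /healthchk
    http-check disable-on-404
    # maxconn lowered from 3 to 2 (assumption: server capacity is 3),
    # reserving one slot per server for the health check
    default-server port 8080 inter 2s rise 2 fall 1 maxconn 2
    server srv1 srv1.example.com:8080 check
    server srv2 srv2.example.com:8080 check
```

The TCP-check variant would instead keep maxconn 3 and drop the "option
httpchk" and "http-check disable-on-404" lines, so that "check" falls back to a
plain TCP connect check.
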
Is this behavior a bug or a feature? Intuitively I would have expected the HTTP
health checks to respect maxconn limits, but perhaps there was a conscious
decision to not do so (for instance, maybe it was considered unacceptable for a
server's health to be unknown when it is fully loaded).

Thanks,
Bryan