Hi Bryan,

On Wed, Mar 23, 2011 at 09:27:01PM +0000, Cassidy, Bryan wrote:
> Hi all,
>
> I've noticed an odd (lack of) interaction between "maxconn" and "option
> httpchk"...
>
> If a server's maxconn limit has been reached, it appears that HTTP health
> checks are still dispatched. If I've configured the maxconn limit to match
> the number of requests the backend server can concurrently dispatch, and all
> these connections are busy with slow requests, HAProxy will assume the server
> is down; once the server completes a request, HAProxy waits until "rise"
> health checks have succeeded (as expected if the server was really down, but
> it was only busy). This makes overly busy times even worse.

Yes, that's a known situation. Maxconn should always leave some room for
health checks. When you have two haproxies, you might have to leave at least
2 connections for the health checks. In practice, 1 should be OK because
checks are supposed to be fast, and it is generally not an issue if one of
them waits a little bit to get a connection slot.

This issue is sometimes encountered on mongrel servers, where only one
connection at a time is possible. The usual workaround for this case is to
set a check timeout larger than what you consider a long request should be.

Even if that can sound frustrating at first, you have to realize that if the
server is failing to respond to health checks, there is no way to know
whether it's too busy or dead. So there's nothing wrong with the current
approach. If you pointed your browser to the server, you'd observe the same
behaviour. If you think that you'd tell the difference because you'd wait
longer, then it means you should adjust your check timeout.

(...)

> I know I can work around this by setting maxconn to one less than a server's
> maximum capacity (perhaps this would be a good idea for other reasons).

Yes, that's the way to do it, and it will also permit you to connect to the
server yourself without passing through haproxy.

> I suspect I could work around this by using TCP status checks instead of HTTP
> status checks, though I haven't tried this as I like the flexibility HTTP
> health checks offer (like "disable-on-404").

You're right, but relying on TCP checks only will also fail to tell you that
your servers are effectively dead when they're just frozen.

> Is this behavior a bug or a feature? Intuitively I would have expected the
> HTTP health checks to respect maxconn limits, but perhaps there was a
> conscious decision to not do so (for instance, maybe it was considered
> unacceptable for a server's health to be unknown when it is fully loaded).

We have a task on the TODO list to make health checks pass through the queue
and respect the maxconn too. This is especially important for mongrel. But
still, doing so does not cover the situation where you have multiple LBs or
when you need to check the server by yourself.

Regards,
Willy
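
For illustration only, the two workarounds discussed in this thread (a maxconn
one below the server's real capacity, and a check timeout longer than a slow
request) could be sketched roughly as follows. The backend name, address,
check URL and all numbers are assumptions made up for the example, not values
from this thread; the fragment omits the global and defaults sections.

```
# Hypothetical backend; app1 is assumed to handle 10 concurrent requests.
backend app
    # /health is an assumed check URL, adjust it to your application.
    option httpchk GET /health
    http-check disable-on-404

    # Assumed to be longer than the slowest request you consider normal,
    # so a busy server is not marked down while it is merely slow to answer.
    timeout check 30s

    # maxconn 9 leaves one of the 10 assumed slots free for the health check
    # (or for a manual connection that bypasses haproxy).
    server app1 192.0.2.10:8080 maxconn 9 check inter 2s rise 2 fall 3
```

With two load balancers checking the same server, the same reasoning would
suggest leaving two slots free instead of one.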

