Hi,

On Tue, Jan 06, 2009 at 06:15:32PM -0800, David Stainton wrote:
> Hi,
> 
> I'm seeing bad haproxy behavior... and I know there is a chance I'm
> using haproxy's health checking mechanism incorrectly.
> When a single node is turned off (e.g. apache is stopped or the server
> crashed) haproxy seems to think all the nodes are offline.
> I can stop apache on a node and watch the haproxy stats URL indicate
> various nodes are failing the healthcheck multiple times...
> and are marked offline.

That's rather strange.

> Any suggestions? Below is essentially my haproxy config with ip addr
> and dns changed.
> As you can see, my healthcheck url calls healthcheck.php so that we
> can test various things like db connectivity of the server...
> This PHP script returns a 200 when everything is working, and it is,
> but haproxy marks everything as down when this test fails on a single
> node.
> 
> Anyway this is the worst possible bug for high availability. A single
> node brings down the whole pool!
> Am I doing something obviously wrong here?

I see nothing wrong in your configuration.
However, I have an idea. Does this happen only under load? I mean, if one
of the three servers goes down, the remaining two will see their load
increase by 50%. Apache has low limits on concurrent connections. Could it
be that the transferred load saturates the remaining servers, leading to
failed checks? You can see this on the stats page by looking at the max
sessions reached on each server. If it ever gets as high as your Apache's
MaxClients, you're in trouble. But there's an easy fix for that: set a
"maxconn" parameter on those "server" lines, with a value slightly lower
than MaxClients, so that haproxy will queue excess requests instead of
saturating the server.
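For instance, assuming Apache is running with MaxClients 256, the server
lines could look something like this (server names and addresses here are
just placeholders, of course):

    server web1 192.168.1.1:80 check maxconn 250
    server web2 192.168.1.2:80 check maxconn 250
    server web3 192.168.1.3:80 check maxconn 250

With maxconn set slightly below MaxClients, a server that is already
handling 250 requests will have further requests queued in haproxy rather
than piled onto Apache, so the health checks keep getting answered.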

Aside from this, your config looks pretty standard.

> Perhaps I should use a GET instead of a HEAD in the check...?

No, I don't see any reason for that. If that was the problem, your servers
would never show up.
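For reference, a HEAD-based check against the script you described would
look something like this (the path is taken from your description):

    option httpchk HEAD /healthcheck.php HTTP/1.0

HEAD vs GET only changes whether the response body is transferred; the
status code your script returns is what the check acts on either way.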

Regards,
Willy

