This is also an issue for us (see my post from a few days ago) - on
HAProxy's first start, most hosts are marked DOWN with a Layer4 timeout,
even though they are fine, because there are a large number of them.

Some workaround or more forgiving initial health check would be useful here.


----
Kevin Burke | 415-723-4116 | www.twilio.com


On Tue, Jan 28, 2014 at 8:13 AM, Patrick Hemmer <[email protected]> wrote:

>  *From: *Willy Tarreau <[email protected]>
> *Sent: * 2014-01-25 05:45:11 E
> *To: *Patrick Hemmer <[email protected]>
> *CC: *Malcolm Turnbull <[email protected]>, [email protected]
> *Subject: *Re: Just a simple thought on health checks after a soft reload
> of HAProxy....
>
>  On Tue, Jan 21, 2014 at 09:04:12PM -0500, Patrick Hemmer wrote:
>
>  Personally I would not like that every server is considered down until
> after the health checks pass. Basically this would result in things
> being down after a reload, which defeats the point of the reload being
> non-interruptive.
>
>  I can confirm, we had this in a very early version, something like 1.0.x,
> and it was quickly changed! I've been using Alteon load balancers for
> years and their health checks are slow. I remember that the people in
> charge of them were always scared to reboot them because the services
> remained down for a long time after a reboot (seconds to minutes). So
> we definitely don't want this to happen here.
>
>
>  I can think of 2 possible solutions:
> 1) When the new process comes up, do an initial check on all servers
> (just one) which have checks enabled. Use that one check as the verdict
> for whether each server should be marked 'up' or 'down'.
>
>  That's exactly what is currently done. The servers are marked
> "almost dead", so the first check gives the verdict. Initially we had
> all checks started immediately. But it caused a lot of issues at several
> places where there was a high number of backends or servers mapped to
> the same hardware, because the rush of connections really caused the
> servers to be flagged as down. So we started to spread the checks over
> the longest check period in a farm.
>
>
> Is there a way to enable the earlier behavior, with all checks started
> immediately? In my environment/configuration, firing off all the checks
> at the same time causes absolutely no issue.
> As it is right now, when haproxy starts up, it takes quite a while to
> discover which servers are down.
>
> -Patrick
>
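
To make the trade-off in the quoted discussion concrete, here is a rough
sketch in Python comparing the two start-up strategies: all first checks
fired at once, versus first checks spread over the longest check interval.
This is not HAProxy's actual scheduler. The function names, the even-spacing
rule, and the example numbers (200 servers, a 2000 ms interval) are all
assumptions made purely for illustration.

```python
# Illustrative only: this is not HAProxy's scheduler. The function names and
# the even-spacing rule are assumptions made for this sketch.

def burst_offsets(num_servers):
    """All first checks start at t=0 (the early behaviour described above)."""
    return [0] * num_servers


def spread_offsets(num_servers, longest_inter_ms):
    """Space first checks evenly across the longest check interval in the farm."""
    if num_servers <= 0:
        return []
    return [i * longest_inter_ms // num_servers for i in range(num_servers)]


def peak_concurrent_starts(offsets):
    """Largest number of checks that begin at the same instant."""
    return max((offsets.count(t) for t in set(offsets)), default=0)


if __name__ == "__main__":
    n, inter_ms = 200, 2000  # hypothetical farm: 200 servers, 2 s interval
    for name, offs in (("burst", burst_offsets(n)),
                       ("spread", spread_offsets(n, inter_ms))):
        print(name, "peak starts:", peak_concurrent_starts(offs),
              "last check begins at:", max(offs), "ms")
```

With these made-up numbers, the burst strategy opens 200 check connections
at the same instant, while the spread strategy opens one at a time but does
not begin the last check until 1990 ms after start. That delay is the "takes
quite a while to discover which servers are down" side of the trade-off, and
the burst is the "rush of connections" side.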
