Hi Joseph,

On Mon, Mar 02, 2009 at 05:16:30PM -0500, Joseph Hardeman wrote:
> Hi everyone,
>
> I just experienced again a check on the health of haproxy for one of our
> clients which forced a failover to our backup haproxy system. I am
> hoping someone has something to help with this. From looking at the
> documentation it states that the mode health will not log anything if
> logging is turned on. I am not finding anything in the system logs that
> would indicate why haproxy would not respond which caused Nagios to send
> the ha_standby to the system to fail it over. And I did not find
> anything on the Monitoring system to indicate that it had a problem
> either.
>
> As this was the primary system I did not have logging turned on, I do
> now and tonight when traffic is lower I will fail it back over to the
> primary.
>
> Here is my health check section in haproxy, we are running version
> 1.3.15.7 running on Centos x86_64 kernel 2.6.18-92.1.10.el5:
>
> listen health_check 0.0.0.0:60000
> mode health
>
> Nagios Log entry for this event:
>
> [1236025204] SERVICE ALERT: haproxy1;HAPROXY_HTTP;CRITICAL;HARD;1;HTTP
> CRITICAL - No data received from host
>
> The failover happened at D/T: 02-03-2009 15:20:05
>
> Any ideas or help would be very much appreciated.
"Mode health" is processed very early, immediately after the
connection is accepted. So if it fails, I see two possible
reasons for this:

  - your system's backlog queue is sometimes full, which prevents
    Nagios's connection from reaching haproxy at all. This is clearly
    the most plausible scenario (e.g. during an attack);

  - the Nagios plugin expects something that looks more like HTTP
    on this socket. "Mode health" is very dirty and does not
    even wait for the request to come in, so if Nagios for any
    reason takes a little time between the instant it connects
    to haproxy and the instant it sends the request, it might
    face an already closed connection.
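
If you want to observe the second point from the client side, here is
a minimal Python sketch. It connects, deliberately sends no request,
and returns whatever the listener writes before closing. The function
name and the timeout value are only illustrative, and it assumes the
listener answers on its own without waiting for any input, as
described above.

```python
import socket


def probe_without_request(host, port, timeout=5.0):
    """Connect, send nothing, and return every byte the peer writes
    before it closes the connection (hypothetical helper)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the peer closed its side
                break
            chunks.append(data)
    return b"".join(chunks)
```

Running probe_without_request("127.0.0.1", 60000) against your
health_check listener should return at once with a short reply, which
shows that the connection is answered and closed before any request
could have been read.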

You can address the second point by switching to "monitor-uri",
which is evaluated after HTTP parsing (so it waits for the request)
and which is 100% HTTP compliant. You can set it either on the
public frontend or on a dedicated one as above. For instance:

    frontend health_check 0.0.0.0:60000
        mode http
        timeout client 5s
        monitor-uri /
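
To try such a frontend by hand before pointing Nagios at it, a rough
Python sketch of an HTTP check could look like the following. It is
not what the Nagios plugin does internally, only an approximation:
the function name, the default path and the timeout are assumptions
for illustration, and any connection or protocol error is simply
reported as a failure.

```python
import http.client


def monitor_uri_ok(host, port, path="/", timeout=5.0):
    """Return True if GET `path` is answered with HTTP status 200,
    False on any other status or on any failure (hypothetical helper)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # refused, reset, timed out, or not a parsable HTTP response
        return False
    finally:
        conn.close()
```

For the example above this would be called as
monitor_uri_ok("127.0.0.1", 60000). Because the request is sent
before anything is read, the check no longer depends on how fast the
client is after connecting.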

I should update the doc to discourage people from using "mode health",
as time has proven that it is not a very reliable mechanism with some
monitoring tools.

Regards,
Willy