Hello,
As the number of haproxy deployments (>20) grows in our infrastructure, along
with an increase in the number of backends (~1500), we are beginning to
see a non-trivial amount of resources allocated to health checks. Each proxy
instance health checks each backend every 2 seconds.
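To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes
exactly 20 proxies (a lower bound, since we have more than 20), 1500 backends,
and one check per proxy per backend every 2 seconds. The function name and the
dictionary keys are made up for illustration.

```python
# Rough estimate of health check load. The inputs are the approximate
# figures from this message; 20 proxies is a lower bound (">20").

def check_load(proxies: int, backends: int, interval_s: float) -> dict:
    """Return the check rate seen by one backend and by the whole fleet."""
    # Every proxy checks every backend once per interval.
    per_backend = proxies / interval_s
    total = per_backend * backends
    return {
        "checks_per_backend_per_s": per_backend,
        "total_checks_per_s": total,
    }

load = check_load(proxies=20, backends=1500, interval_s=2.0)
print(load)
```

So each backend answers roughly 10 checks per second, and the fleet as a whole
handles on the order of 15,000 checks per second.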
In an earlier conversation, Willy directed me to look into the fastinter and
on-error configuration options. I have done so, but I wanted to ask how others
have addressed this, whether there is any interest in implementing something
along the lines described below, and to gather ideas and comments on what such
an implementation would look like.
We use haproxy as an HTTP load balancer, and I have not given any thought
to how the following description applies to tcp mode.
Currently we HTTP check our backends using:
option httpchk GET /_check.php HTTP/1.1\r\nHost:\ www.domain.com
We were considering adding a new directive that specifies a check server,
used in addition to the httpchk directive:
option httpchk GET /_health.php HTTP/1.1\r\nHost:\ hdr(Host)
option chksrv server hcm-008dad0f 172.16.114.52:80
The change would add a dynamic field to the health check request.
Here hdr(Host) (the HTTP Host header in this instance) is the field that tells
the external check server which backend is to be health checked.
The check server can (and in our case will) be implemented to cache health
check responses from the backends.
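As a rough illustration of the caching idea, here is a minimal sketch of the
logic such a check server might use. Everything in it is hypothetical: the
class name, the probe callback, and the 2 second TTL are assumptions, and the
backend key stands in for whatever value arrives in the Host header. It is not
haproxy code.

```python
import time
from typing import Callable, Dict, Tuple

class CachingChecker:
    """Cache health check results per backend for a short TTL (hypothetical)."""

    def __init__(self, probe: Callable[[str], bool], ttl_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self._probe = probe      # performs the real check against a backend
        self._ttl_s = ttl_s      # how long a cached result stays valid
        self._clock = clock      # injectable so the cache can be tested
        self._cache: Dict[str, Tuple[float, bool]] = {}

    def is_healthy(self, backend: str) -> bool:
        now = self._clock()
        cached = self._cache.get(backend)
        # Serve the cached result while it is younger than the TTL.
        if cached is not None and now - cached[0] < self._ttl_s:
            return cached[1]
        # Otherwise probe the backend once and remember the answer.
        result = self._probe(backend)
        self._cache[backend] = (now, result)
        return result

# Usage: 20 proxies asking about the same backend in quick succession.
calls = []

def fake_probe(backend: str) -> bool:
    calls.append(backend)        # record each real probe
    return True

checker = CachingChecker(fake_probe, ttl_s=2.0)
results = [checker.is_healthy("web-01.example") for _ in range(20)]
print(len(calls), all(results))
```

With a cache like this, the number of real checks a backend sees depends on the
TTL rather than on the number of proxies asking.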
One of the justifications for implementing this is the need in our environment
to take into consideration factors that are not available to the backends when
they respond to a health check. As an example, we will be implementing in our
check server the ability to force the success or failure of health checks on
groups of backends that are related in some manner. We expect this to let us
avoid the brownout scenarios we have encountered in the past.
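The group override could be sketched along these lines. This is only an
illustration under assumed names: the class, its methods, and the backend and
group labels (web-01, rack-a, and so on) are all made up, and how backends are
grouped is left open.

```python
from typing import Dict

class GroupOverrides:
    """Force health check results for whole groups of backends (hypothetical)."""

    def __init__(self, group_of: Dict[str, str]):
        self._group_of = dict(group_of)      # backend name -> group name
        self._forced: Dict[str, bool] = {}   # group name -> forced result

    def force(self, group: str, healthy: bool) -> None:
        """Make every backend in the group report the given result."""
        self._forced[group] = healthy

    def clear(self, group: str) -> None:
        """Return the group to its real, probed results."""
        self._forced.pop(group, None)

    def resolve(self, backend: str, probed: bool) -> bool:
        """Return the forced result for the backend's group, else the probe."""
        group = self._group_of.get(backend)
        if group in self._forced:
            return self._forced[group]
        return probed

# Usage: fail every backend in one group regardless of its own check result.
overrides = GroupOverrides({
    "web-01": "rack-a",
    "web-02": "rack-a",
    "web-03": "rack-b",
})
overrides.force("rack-a", healthy=False)
print(overrides.resolve("web-01", probed=True))   # forced down with its group
print(overrides.resolve("web-03", probed=True))   # unaffected, uses the probe
```

Because the override is applied after the probe, clearing it returns the group
to its real results without touching the backends themselves.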
Has anyone considered or achieved something along these lines, or does anyone
have suggestions on how we could implement this?
Thanks
Bhaskar