Hi,
We have been using haproxy for a few months now and the benefits have
been immense. This list in particular is an indispensable resource.
We use haproxy 1.4.8 in the cloud to consistently distribute requests
among the squids.
We run N proxies in front of M squids in different availability zones
with the same configuration.
It also shields the clients from the volatile nature of Amazon instances
behind the proxies, as the proxies instantly redispatch requests when
squids go down.
By doing this we, of course, lose a portion of the cache, but that is
acceptable when only 1 or 2 squids are out.
This brings me to the biggest challenge we currently have: a cold or
mostly cold cache.
There is a drastic difference in the performance characteristics of the
system when caches are cold and when they are hot.
When we are hot we serve about 6,000-10,000 rps, queues to the backends
are zero, and the number of concurrent connections to the backends is
near zero.
This boils down to a range of 600-1,000 rps per haproxy instance for 10
instances.
With cold caches, or when the distribution is thrown off, latencies
shoot up by 2-3 orders of magnitude and the number of concurrent
connections to the squids goes up to hundreds. This leads to a flood of
client retries (being fixed now), often maxing out the number of
sockets, which leads haproxy to believe that the squids are not
reachable and to mark them down (flip/flop). The resulting redispatches
make the picture even worse.
The limiting factor here is latency and the number of concurrent
persistent connections that can be established from the squids back to
the database, which I believe to be around 500.
Naturally, that is the reason we have caches here in the first place.
This problem will require some time to address and is being actively
worked on.
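As a side note on sizing: if the ~500-connection squid-to-database limit
is the binding constraint, it can be turned into a rough per-proxy
budget. This is back-of-the-envelope only, assuming the 500 figure and
the 10 proxies above; the hostname is a placeholder:

```
# ~500 concurrent squid->database connections per squid is the ceiling.
# With N = 10 identical haproxy instances sharing each squid, an upper
# bound per instance is roughly 500 / 10 = 50, so a per-server maxconn
# somewhere below 50 should keep one squid from being pushed past its
# database limit even when all proxies are busy at once.
server squid-1 squid-1.example.internal:8080 check maxconn 50
```

Cache hits never reach the database, so this bound is conservative.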
While there are some fundamental problems here to work on, I was
wondering if I could quickly tweak haproxy's configuration to gracefully
support both modes of operation in the short term, since it is currently
the only place in the chain where powerful scripting can be done.
The objectives are:
1) allow maximum possible throughput when caches are hot
2) When caches are cold, sustain a level of throughput that allows the
caches to warm up without melting the system down.
3) detect a slowdown by checking one or more of:
- avg_queue size
- queue size
- the number of concurrent connections going up
4) Quickly reject requests that come in beyond a predetermined
cold-cache capacity. If possible, do it at the individual server level
rather than at the backend level (for cases when only some caches are
cold).
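For objective 3, the detection criteria can be expressed with ACL
fetches that 1.4 provides. A minimal sketch for the frontend; the
thresholds are purely speculative placeholders, not measured values:

```
# In frontend http-in: shunt to the overload backend when any
# slowdown signal trips. Thresholds here are made up.
acl q_avg_high  avg_queue(servers) gt 0
acl q_abs_high  queue(servers)     gt 5
acl conns_high  be_conn(servers)   gt 200
use_backend overload if q_avg_high or q_abs_high or conns_high
```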
One of the issues here is that if I specify maxconn for an individual
server, the connection is not rejected but goes into a queue. If I limit
the queue size, then when the timeout expires the request is
redispatched to another server. I want redispatches only when a squid is
down.
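The closest approximation to a per-server reject that I can think of in
1.4 is watching individual servers with srv_conn and shunting to the
overload backend while any one of them is saturated. This is only a
sketch (the threshold is made up, and the server name mirrors the
placeholder below), and it still rejects at the farm level, just
triggered by one server's state:

```
# Reject new work while any individual squid is saturated.
# srv_conn(<backend>/<server>) returns the number of currently
# established connections to that one server.
acl squid_hot srv_conn(servers/ec2-XXXX) gt 40
use_backend overload if squid_hot
```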
Below is a version of the config under construction, somewhat
simplified. I will work out the exact numbers later; right now the
server maxconn, slowstart timeout and queue threshold are pure
speculation.
I would appreciate any help, as I am trying to wrap my brain around a
lot of variables and available tuning knobs here.
--
Dmitri Smirnov
# This is a CE haproxy test config boilerplate
global
daemon
stats socket /apps/haproxy/var/stats level admin
maxconn 10000
defaults
mode http
balance uri
hash-type consistent
# local0 needs to be configured at /etc/syslog.conf
log /dev/log local0
option httplog
# Maximum number of concurrent connections on the frontend
# set to be the half of the total max in the global section above
maxconn 5000
# timeout client is the max time of client inactivity
# when the client is expected to ack or send data
# we do not want to tie up resources for a long time
timeout client 100ms
# This is a max time to wait for connection to a server to succeed
timeout connect 200ms
# This is the maximum time to wait in the queue at the backend.
# By default it is the same as timeout connect, but we set it explicitly.
# Below we do not allow the queue to grow beyond 1, as a longer queue
# indicates that the servers are slow and overloaded.
timeout queue 200ms
# Maximum inactivity timeout for the server to ack or send data.
# In other words, in situations of meltdown we are not going to wait for
# slow data to come back (not what is currently in prod),
# but this will still hopefully allow squid to refill.
# The max time is usually less than a second.
timeout server 1000ms
frontend http-in
bind *:8080
default_backend servers
# Problem: if one squid is cold, this rejects requests for the whole farm
acl q_too_long avg_queue(servers) gt 0
use_backend overload if q_too_long
backend overload
# HAproxy will issue a 503 because no servers are available for this backend
# Here we customize the response
errorfile 503 /apps/haproxy/etc/fe_503.http
backend servers
stats enable
stats uri /haproxy?status
stats refresh 5s
stats show-legends
stats show-node
option forceclose
option forwardfor
# Redispatch if the destination server is down. This option will also
# redispatch if the queue timeout expires; however, we do not want
# to redispatch in that case.
option redispatch
retries 1
# Dynamically generated section follows.
# Example
server ec2-XXXX ec2-XXXX.compute-1.amazonaws.com:8080 check inter 1000 rise 5 fall 3 maxconn 20 slowstart 30s