Hi,

We have been using haproxy for a few months now and the benefits have been immense. This list in particular is an indispensable resource.

We use haproxy 1.4.8 in the cloud to consistently distribute requests among our squids.

We run N proxies in front of M squids in different availability zones with the same configuration.

It also shields the clients from the volatile nature of amazon instances behind the proxies, as the proxies instantly redispatch requests when squids go down.

By doing this we, of course, lose a portion of the cache, but that is acceptable when only 1 or 2 squids are out.

This brings me to the biggest challenge we have currently and that is of a cold or mostly cold cache.

There is a drastic difference in the performance characteristics for the system when caches are cold and when they are hot.

While caches are hot we serve about 6,000-10,000 rps, queues to the backends are zero, and the number of concurrent connections to the backends is near zero. This boils down to 600-1,000 rps per haproxy instance across 10 instances.

With cold caches, or when the distribution is thrown off, latencies shoot up by 2-3 orders of magnitude and the number of concurrent connections to squids climbs into the hundreds. This has led to a flood of client retries (being fixed now), often maxing out the number of sockets, which leads haproxy to believe that squids are not reachable and to mark them down (flip/flop). That in turn triggers redispatches, making the picture even worse.
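Part of the flip/flop might also be damped on the health-check side. A speculative sketch (all numbers are placeholders, and I am assuming fastinter is available in 1.4) that requires more consecutive failures before marking a squid down, while still probing quickly around suspected state changes:

```
# Sketch: tolerate more failed checks before marking a squid down (fall 5),
# and probe at a shorter interval only while a state transition is
# suspected (fastinter). Numbers here are pure placeholders.
server ec2-XXXX ec2-XXXX.compute-1.amazonaws.com:8080 check inter 2000 fastinter 500 rise 3 fall 5 maxconn 20 slowstart 30s
```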

The limiting factor here is latency and the number of concurrent persistent connections that can be established back to the database from the squids, which I believe to be around 500.
Naturally, that is the reason we have caches here in the first place.
This problem will require some time to address and is being actively worked on.

While there are some fundamental problems here to work on, I was wondering if I could quickly tweak haproxy's configuration to gracefully support both modes of operation in the short term, since it is currently the only place in the chain where powerful scripting can be done.

The objectives are:

1) allow maximum possible throughput when caches are hot

2) When caches are cold sustain a level of throughput that will allow caches to warm up w/o melting the system down.

3) detect a slowdown by checking any or all of:
- avg_queue size
- queue size
- the number of concurrent connections going up

4) Quickly reject requests that arrive beyond a predetermined cold-cache capacity. If possible, do it at the individual server level rather than at the backend level (for cases when only some caches are cold).
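For 2), one coarse knob I am considering (speculative, and the rate value is a guess) is the per-frontend session rate limit in 1.4. Note it smooths the accept rate rather than rejecting outright, so it addresses sustaining load, not fast rejection:

```
# Sketch for objective 2: cap the rate of new sessions per haproxy
# instance so cold caches see a bounded load while they warm up.
# Excess connections wait to be accepted rather than being rejected.
# The 500/s figure is a placeholder to be worked out later.
frontend http-in
    bind *:8080
    rate-limit sessions 500
    default_backend servers
```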

One of the issues here is that if I specify maxconn for an individual server, excess connections are not rejected but go to a queue. If I limit the queue size, then when the queue timeout expires the request is redispatched to another server. I want redispatches only when a squid is down.

Below is a somewhat simplified version of the config under construction.

I will work out the exact numbers later. Right now the server maxconn, slowstart timeout and queue threshold are pure speculation.

I would appreciate any help as I am trying to wrap my brain around a lot of variables here and available tuning knobs.

--
Dmitri Smirnov

# This is a CE haproxy test config boilerplate
global
    daemon
    stats socket /apps/haproxy/var/stats level admin
    maxconn 10000

defaults
    mode http
    balance uri
    hash-type consistent
# local0 needs to be configured at /etc/syslog.conf
    log /dev/log local0
    option httplog

# Maximum number of concurrent connections on the frontend
# set to be the half of the total max in the global section above
    maxconn 5000

# timeout client is the max time of client inactivity
# when the client is expected to ack or send data
# we do not want to tie resources up for a long time
    timeout client  100ms

# This is a max time to wait for connection to a server to succeed
    timeout connect 200ms

# This is a maximum timeout to wait in a queue at the backend
# by default it is the same as timeout connect but we set it explicitly
# Below we do not allow the queue to grow beyond 1 as this indicates that servers
# are slow and overloaded.
    timeout queue 200ms

# Maximum inactivity timeout for the server to ack or send data
# In other words, in situations of meltdown we are not going to wait for slow data to come back (not what is currently in prod)
# but this will still hopefully allow squid to refill
# max time is usually less than a second
    timeout server 1000ms

frontend http-in
   bind *:8080
   default_backend servers

# Problem: if one squid is cold, this rejects requests for the whole farm
   acl q_too_long avg_queue(servers) gt 0
   use_backend overload if q_too_long
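# Untested alternative for detecting saturation (I am assuming the
# connslots criterion is available in 1.4): it counts the remaining
# maxconn + maxqueue slots across the whole backend, so a low value can
# flag overload even while avg_queue is still zero. The threshold of 10
# is a placeholder to be tuned.
#   acl farm_saturated connslots(servers) lt 10
#   use_backend overload if farm_saturated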

backend overload
# HAProxy will issue a 503 because no servers are available for this backend
# Here we customize the response
    errorfile 503 /apps/haproxy/etc/fe_503.http

backend servers
    stats enable
    stats uri     /haproxy?status
    stats refresh 5s
    stats show-legends
    stats show-node
    option forceclose
    option forwardfor

# Redispatch if the destination server is down. This option will also
# redispatch if a queue timeout expired. However, we do not want
# to redispatch in that case.
    option redispatch
    retries 1

# Dynamically generated section follows.
# Example
server ec2-XXXX ec2-XXXX.compute-1.amazonaws.com:8080 check inter 1000 rise 5 fall 3 maxconn 20 slowstart 30s

