Hi Dmitri,

On Mon, Nov 15, 2010 at 04:43:50PM -0800, Dmitri Smirnov wrote:
> Yes, this indeed happens. Also, the objective is not to exceed the 
> number of connections from squids to backend database. In case of cold 
> cache the redispatch will cause a cache entry to be brought into the 
> wrong shard which is unlikely to be reused.

That's a good point indeed. It just depends whether it is important to you
that all requests get processed. Some commercial caches are able to bypass
the cache when there is too much I/O wait; in that case they simply
fetch from the origin server. In the case of a reverse-proxy farm, this
can amplify the problem if the caches are there to limit the load on
the servers, but in an outgoing farm it is a nice improvement.

> However, even this is an optimistic scenario. These cold cache 
> situations happen due to external factors like an AWS issue, our home-grown 
> DNS messing up (AWS does not provide DNS), etc., which causes not all of 
> the squids to be reported to the proxies and messes up the distribution. 
> This is because haproxy is restarted after the config file is regenerated.
> 
> I have been thinking about preserving some of the distribution using 
> server IDs when the set of squids partially changes but that's another 
> story, let's not digress.

I also have similar plans. Basically, the stop script would dump the servers'
state to a file that haproxy would parse at startup so it recovers in the same
initial state (checks and weights).

> Thus even with redispatch enabled the other squid is unlikely to have 
> free connection slots because when it goes cold, most of them do.

OK.

> Needless to say, most of the other components in the system are also in 
> distress in case something happens on a large scale. So I choose the 
> stability of the system to be the priority even though some of the 
> clients will be refused service which happens to be the least of the evils.

Even there, there are possibilities (e.g. detect the size of the queues,
consider the service unavailable, and emit a redirect to a waiting page).
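A minimal sketch of that idea in haproxy configuration (the backend name,
the 10-request threshold and the waiting-page URL are all hypothetical):

    backend squids
        # consider the farm overloaded when more than 10 requests
        # are queued per server on average
        acl overloaded avg_queue gt 10
        # send overloaded visitors to a static waiting page
        # instead of letting them pile up in the queue
        redirect location /busy.html if overloaded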

> Having slept on the problem I came up with a fairly simple idea which is 
> not perfect but I think delivers most of the bang for such a simple change.
> 
> It revolves around adding a maxconn restriction for every individual 
> squid in the backend.
> 
> And the number can be easily calculated and then tuned after a loadtest.
> 
> Let's assume I have 1 haproxy in front of a single squid.
> 
> Furthermore, HIT latency: 5ms, MISS latency 200ms for simplicity.
> 
> Incoming traffic 1,000 rps at peak.
> 
> From squids to the backend let's have 50 connections max, i.e. 250 rps max.
> 
> So through a single connection allowance for hot caches we will be able 
> to process 200 rps. For cold cache we will do only 5 rps.
> 
> This means that to support Hot traffic we need 5 connections at least.
> At the same time this will throttle MISS requests to max of 25.
> 
> Because we have 250 rps max at the backend we can raise maxconn to 50 
> for the squid. This creates a range of 250-10,000 rps.
> 
> As caches warm up the traffic becomes mixed and drifts towards the hot 
> model so the same number of connections will process more and more 
> requests until it reaches 99.6% hit rate in our case.

You have described the exact use of maxconn. I like it when users explain
it in their own words, they do it better than me :-)
There is also something nice about maxconn: it is moderated by the slow
start, so it ramps up slowly at the same time as the weight. This is very
important for I/O-bound components such as caches, because the less concurrent
I/O you perform on them, the faster they respond.
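For the numbers worked out above (1,000 rps peak, 50 connections per squid),
the relevant part of the configuration might look like this; server names,
addresses and the 30s ramp are placeholders, not values from the thread:

    backend squids
        balance uri        # hash on the URI so each object maps to one shard
        server squid1 10.0.0.1:3128 check maxconn 50 slowstart 30s
        server squid2 10.0.0.2:3128 check maxconn 50 slowstart 30s

The slowstart ramps the weight (and with it the effective maxconn) over 30
seconds after a server comes back up, so a freshly restarted cold cache is
not immediately hit with 50 concurrent misses.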

> I chose to leave the queue size unlimited but put a fast expiration time 
> on the queue entries so they are rejected with 503 unless you have other 
> recommendations.

Using the avg_queue ACL function, you can detect a global overload. For
instance, you could say that if you have more than 10 requests per server in
queue (on average), then the service is overloaded and you send traffic to a
specific farm. Note that the specific farm could very well consist of
the same servers, with an extremely low maxconn and a specific header
telling squid to avoid disk caching (you might have to modify it a bit for
that). That could allow the service to slowly recover without returning
errors to users.
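A hedged sketch of that two-farm setup, including the fast queue expiration
you mentioned (all names, addresses, thresholds and the header name are
hypothetical; the header only helps if squid is taught to honor it):

    frontend fe
        bind :80
        # overloaded when the squids backend averages >10 queued requests/server
        acl overload avg_queue(squids) gt 10
        use_backend recovery if overload
        default_backend squids

    backend squids
        # expire queued requests quickly so they get a 503 instead of waiting
        timeout queue 2s
        server squid1 10.0.0.1:3128 check maxconn 50

    backend recovery
        # same server, extremely low maxconn, plus a hint header for squid
        reqadd X-No-Disk-Cache:\ 1
        server squid1 10.0.0.1:3128 maxconn 5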

> I also chose to impose an individual maxconn rather than a backend 
> maxconn. This is to prevent MISS requests from using up all of the 
> connection limit and to allow HITs to be served quickly from hot shards.
> I am still pondering over this point though.

That's really the principle of the per-server maxconn. Be careful:
users tend to get addicted to it and reduce it too much, because the
lowest values show the best performance but are not necessarily compatible
with slow sites where you'd prefer to have more concurrent connections.

You can also check the minconn+fullconn parameters; they make the maxconn
dynamic: if you have few connections on the backend, then the effective
server maxconn is low (close to minconn). If you have many connections on
the backend, the effective server maxconn is high (close to maxconn). It
reaches the configured maxconn when the backend's total connections equal
fullconn. This means fast response times at low loads, and higher processing
capacity at the expense of response time at high loads.
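Concretely, with illustrative numbers only:

    backend squids
        # the effective per-server maxconn scales from 5 up to 50 as the
        # backend's total connection count approaches fullconn (1000 here)
        fullconn 1000
        server squid1 10.0.0.1:3128 check minconn 5 maxconn 50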

> The situation would be more complicated if the maxconn was too big for 
> MISS and too small for HIT but this is not the case.
> 
> The biggest problem remaining: clients stop seeing rejections when at 
> least 5 connections are available for HIT traffic. This means that MISS 
> traffic should be at 225 rps at the most, i.e. caches must be > 77% hot.

That's where the slowstart can help: if the cache is cold (e.g. it was just
restarted), the slowstart will keep the effective maxconn low, hence limit
the risk of concurrent misses.

Regards,
Willy

