Hello all,

=== Background material, skip this if you just want to know what I've done ===
For some time I've been wishing for a way to combine the best aspects of consistent hashing and "balance leastconn". I use haproxy to distribute traffic across a cluster of servers doing dynamic video delivery. Without going into too much detail, there's a substantial cache effect: a bunch of information needs to be calculated for each video, and if that information is cached, the request is quite a bit faster and uses fewer resources. The server cluster also runs in the cloud and auto-scales based on load.

Our initial approach was to cache the information locally on each server, and use uri-based balancing with consistent hashing to distribute the traffic among servers according to video ID. This gave a satisfying number of cache hits, but had the problem that a single server could become overloaded by receiving the traffic for one or two exceptionally popular videos, resulting in bad performance. Our second attempt was to switch to "balance leastconn" (which destroys the cache hit ratio) and add a second layer of shared cache. This gives more consistent performance, but the bandwidth used for cache coherency is substantial, and the cache servers create a single point of failure.

What I really wanted was a policy that uses consistent hashing most of the time, but avoids overloading any given server by diverting traffic from an overloaded server to less-loaded ones, in a consistent order based on the consistent hash ring. I did some experimenting with algorithms of my own, but didn't find a satisfactory one.

=== End of background material, here's the paper and the patch ===

Recently I came upon the paper "Consistent Hashing with Bounded Loads" (http://arxiv.org/abs/1608.01350). It describes a very simple algorithm for doing exactly what I wanted. It defines a parameter c > 1, and enforces that no server receives more than ceil(c * avg(conns)) connections.
If a server is already at the limit, the algorithm goes around the hash ring until it finds a server below the limit, and assigns the connection to that server. A large "c" provides stricter hashing (a greater chance of requests with the same hash key going to the same server), while a small "c" provides stricter balancing (less difference between the smallest and largest number of connections on a server).

I simulated this algorithm, found that it worked better than the ones I had tested before, and then implemented it in HAProxy. Testing in production showed reduced 95th-percentile response times and a great reduction in cache-coherency traffic.

I'm attaching the patch that I tested. However, I don't expect it to be mergeable as-is. I'm not very familiar with the HAProxy codebase, and I did a number of things that are almost certainly wrong:

- I replaced the existing chash_get_server_hash instead of adding a new method or flag to the configuration.
- The "c" parameter is hard-coded instead of configurable, since haproxy currently has no awareness of non-integer numbers and I wasn't sure how to deal with that.
- My code for enumerating servers and assigning capacities is probably suboptimal, although it didn't cause a noticeable bump in CPU usage on my production systems.
- There are probably any number of coding-style gaffes.
- There is at least one bug that causes haproxy to become unresponsive when servers are marked down.

That being said, I would be thrilled to work with someone to fix all of these problems, if there is interest in having this feature in HAProxy. I think it's an exciting feature that would be a great addition to HAProxy's already excellent set of load-balancing tools; I just need a little guidance to make it happen. My changes are on GitHub at https://github.com/arodland/haproxy and they're up-to-date against haproxy master.
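For anyone who wants the gist of the algorithm without reading the patch, here's a rough, self-contained Python sketch of the bounded-load idea, in the spirit of my simulation rather than the HAProxy C code. The server names, replica count, and class structure are made up for illustration; the essential parts are the per-server cap of ceil(c * conns / servers) and the walk around the ring when the target server is full.

```python
import hashlib
from bisect import bisect
from math import ceil

def hash_point(key):
    """Map a string to a point on a 32-bit hash ring (illustrative hash choice)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

class BoundedLoadRing:
    """Toy consistent-hash ring with bounded loads (c > 1), per arXiv:1608.01350."""

    def __init__(self, servers, c=1.25, replicas=100):
        self.c = c
        self.conns = {s: 0 for s in servers}
        points = sorted((hash_point(f"{s}#{i}"), s)
                        for s in servers for i in range(replicas))
        self.points = [p for p, _ in points]
        self.servers_at = [s for _, s in points]

    def assign(self, key):
        """Pick the first server at or after hash(key) that is under its cap."""
        total = sum(self.conns.values()) + 1  # count the incoming connection
        cap = ceil(self.c * total / len(self.conns))
        i = bisect(self.points, hash_point(key)) % len(self.points)
        for _ in range(len(self.points)):
            s = self.servers_at[i]
            if self.conns[s] < cap:
                self.conns[s] += 1
                return s
            i = (i + 1) % len(self.points)
        raise RuntimeError("unreachable: with c > 1, caps sum to more than total")

    def release(self, server):
        """Call when a connection to `server` finishes."""
        self.conns[server] -= 1
```

Sending many requests for a single hot key through `assign` shows the behavior described above: the key's "home" server fills up to its cap, and the overflow spills to the next servers around the ring in a consistent order, so no server ever holds more than ceil(c * avg(conns)) connections.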
I'd encourage anyone to review the code, and to contact me by email, on IRC (I'm on #haproxy as hobbs), or on GitHub.

Thank you,
Andrew

