Re: Re: hashing + roundrobin algorithm

wsq003 Mon, 28 Nov 2011 21:53:07 -0800

We added a new keyword, 'vgroup', under the 'server' keyword. For example:
    server wsqa 0.0.0.0 vgroup subproxy1 weight 32 check inter 4000 rise 3 fall 3
means that a request assigned to this server is treated as if its backend had
been set to 'subproxy1'. Then, in backend 'subproxy1', you can configure any
load-balancing strategy. This can be recursive.

In source code:
     At the end of assign_server(), if we find that the chosen server has the
'vgroup' property, we set the backend of cur_proxy and call assign_server()
again.
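
The mechanism described above can be modeled outside HAProxy. What follows is a
minimal Python sketch, not the actual patch: the Backend and Server classes, the
two strategy functions, the depth limit, and the names 'wsqb', 'subproxy2',
'a1', 'a2', 'b1' and 'b2' are all assumptions made for illustration. Only the
'vgroup' keyword and the idea of running server assignment again in the target
backend come from the description.

```python
import itertools
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Server:
    name: str
    vgroup: Optional[str] = None  # backend to re-dispatch into, if any


@dataclass
class Backend:
    name: str
    servers: List[Server]
    pick: Callable[["Backend", str], Server]  # load-balancing strategy
    _rr: itertools.count = field(default_factory=itertools.count)


def by_uri_hash(backend: Backend, uri: str) -> Server:
    # Plain modulo hash on the URI; a stand-in for the backend's real hashing.
    return backend.servers[zlib.crc32(uri.encode()) % len(backend.servers)]


def round_robin(backend: Backend, uri: str) -> Server:
    return backend.servers[next(backend._rr) % len(backend.servers)]


def assign_server(backends: Dict[str, Backend], start: str, uri: str,
                  max_depth: int = 8) -> Server:
    backend = backends[start]
    for _ in range(max_depth):  # guard against vgroup cycles (an assumption)
        server = backend.pick(backend, uri)
        if server.vgroup is None:
            return server
        backend = backends[server.vgroup]  # switch backend, then assign again
    raise RuntimeError("vgroup nesting too deep")


# Hypothetical layout: hash to a group, then round-robin inside the group.
backends = {
    "main": Backend("main", [Server("wsqa", vgroup="subproxy1"),
                             Server("wsqb", vgroup="subproxy2")], by_uri_hash),
    "subproxy1": Backend("subproxy1", [Server("a1"), Server("a2")], round_robin),
    "subproxy2": Backend("subproxy2", [Server("b1"), Server("b2")], round_robin),
}
first = assign_server(backends, "main", "/img/logo.png")
second = assign_server(backends, "main", "/img/logo.png")
```

Two requests for the same URI land in the same group but on different servers
of that group, which is the hashing-then-round-robin behaviour discussed in
this thread.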

From: Rerngvit Yanggratoke
Date: 2011-11-26 08:33
To: wsq003
CC: Willy Tarreau; haproxy; Baptiste
Subject: Re: Re: hashing + roundrobin algorithm
Hello wsq003,
       That sounds very interesting. It would be great if you could share your
patch. If that is not possible, some guidance on how to implement it would be
helpful as well. Thank you!

2011/11/23 wsq003 <wsq...@sina.com>

I've made a private patch to haproxy (just a few lines of code, but not 
elegant), which can support this feature. 

My use case is just what you imagined: consistent hashing to a group, then
round-robin within that group.

Our design is that several 'servers' share a physical machine, and the 'servers'
of one group are distributed across several physical machines.
So, if one physical machine is down, nothing will pass through the cache layer,
because every group still works. That gives us a chance to recover the
cluster as we want.
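
The placement rule above can be checked mechanically. The sketch below is only
an illustration: the group and machine names and both helper functions are
hypothetical, and the layout is one possible arrangement, not the one actually
deployed.

```python
from typing import Dict, List, Set

# group name -> physical machines hosting one 'server' of that group
# (hypothetical layout: each machine hosts servers of two different groups)
placement: Dict[str, List[str]] = {
    "group1": ["machineA", "machineB"],
    "group2": ["machineB", "machineC"],
    "group3": ["machineC", "machineA"],
}


def groups_still_serving(placement: Dict[str, List[str]],
                         down: Set[str]) -> Dict[str, List[str]]:
    """For each group, list the machines that are still up."""
    return {group: [m for m in machines if m not in down]
            for group, machines in placement.items()}


def survives_any_single_failure(placement: Dict[str, List[str]]) -> bool:
    """True if every group keeps at least one machine whichever machine fails."""
    machines = {m for ms in placement.values() for m in ms}
    return all(
        all(groups_still_serving(placement, {failed}).values())
        for failed in machines
    )
```

With this layout, losing any one machine leaves every group with a live
server, so each hash bucket can still be answered from the cache layer.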

From: Willy Tarreau
Date: 2011-11-23 15:15
To: Rerngvit Yanggratoke
CC: haproxy; Baptiste
Subject: Re: hashing + roundrobin algorithm
Hi,

On Fri, Nov 18, 2011 at 05:48:54PM +0100, Rerngvit Yanggratoke wrote:
> Hello All,
>         First of all, pardon me if I'm not communicating very well. English
> is not my native language. We are running a static file distribution
> cluster. The cluster consists of many web servers serving static files over
> HTTP. We have a very large number of files, such that a single server simply
> cannot keep all files (it doesn't have enough disk space). In particular, a
> file can be served only from a subset of servers. Each file is uniquely
> identified by a file's URI. I would refer to this URI later as a key.
>         I am investigating deploying HAProxy as a front end to this
> cluster. We want HAProxy to provide load balancing and automatic fail over.
> In other words, a request comes first to HAProxy and HAProxy should forward
> the request to appropriate backend server. More precisely, for a particular
> key, there should be at least two servers being forwarded to from HAProxy
> for the sake of load balancing. My question is what load
> balancing strategy should I use?
>         I could use hashing(based on key) or consistent hashing. However,
> each file would end up being served by a single server at any particular
> moment. That means I wouldn't have load balancing and fail over for a
> particular key.

This question is much more a question of architecture than of configuration.
What is important is not what you can do with haproxy, but how you want your
service to run. I suspect that if you acquired hardware and bandwidth to build
your service, you have pretty clear ideas of how your files will be distributed
and/or replicated between your servers. You also know whether you'll serve
millions of files or just a few tens, which means in the first case that you
can safely have one server per URL, and in the latter that you would risk
overloading a server if everybody downloads the same file at once. Maybe
you have installed caches to avoid overloading some servers. You have probably
planned what will happen when you add new servers, and what is supposed to
happen when a server temporarily fails.

All of these are very important questions; they determine whether your site
will work or fail.

Once you're able to respond to these questions, it becomes much more obvious
what the LB strategy can be, if you want to dedicate server farms to some
URLs, or load-balance each hash among a few servers because you have a
particular replication strategy. And once you know what you need, then we
can study how haproxy can respond to this need. Maybe it can't at all, maybe
it's easy to modify it to respond to your needs, maybe it does respond pretty
well.

My guess from what you describe is that it could make a lot of sense to
have one layer of haproxy in front of Varnish caches. The first layer of
haproxy chooses a cache based on a consistent hash of the URL, and each
varnish is then configured to address a small bunch of servers in round
robin. But this means that you need to assign servers to farms, and that
if you lose a varnish, all the servers behind it are lost too.
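
To make the first layer concrete, here is a small Python sketch of picking a
cache by consistent hash of the URL. It is an illustration only: the Ring
class, the MD5-based hash, the 64 points per node, and the cache names are
assumptions, and this is not how haproxy implements its own consistent hashing.

```python
import bisect
import hashlib
from typing import List, Tuple


def _h(value: str) -> int:
    # 64-bit position on the ring, derived from MD5 (arbitrary choice).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: List[str], points_per_node: int = 64) -> None:
        # Each node is placed at several pseudo-random points on the ring.
        self._points: List[Tuple[int, str]] = sorted(
            (_h(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._keys = [point for point, _ in self._points]

    def lookup(self, url: str) -> str:
        # First point clockwise from the URL's hash, wrapping around at the end.
        idx = bisect.bisect(self._keys, _h(url)) % len(self._points)
        return self._points[idx][1]


urls = [f"/files/{n}.bin" for n in range(1000)]
full = Ring(["cache1", "cache2", "cache3"])
degraded = Ring(["cache1", "cache2"])  # cache3 has been lost
moved = [u for u in urls if full.lookup(u) != degraded.lookup(u)]
```

Only the URLs that were owned by the lost cache change owner; the others keep
hitting the same cache. The servers assigned to the lost cache's farm,
however, receive no traffic until it comes back.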

If your files are present on all servers, it might make sense to use
varnish as explained above, but have it round-robin across all servers.
That way you make the cache layer and the server layer independent of each
other. But this can imply complex replication strategies.

As you see, there is no single answer; you really need to define how you
want your architecture to work and to scale first.

Regards,
Willy

-- 
Best Regards,
Rerngvit Yanggratoke
