Hi,

On Mon, Aug 02, 2010 at 09:05:02AM -0700, Rich Rauenzahn wrote:
> I'm using haproxy ("balance uri") inside an intranet to direct traffic
> to 4 squid servers in order to cache content normally served directly
> by our web server.   This web server serves large files (ranging from
> 10's of MB to several GB)
> 
> I'm worried that our haproxy server could be a network bottleneck (the
> NIC, not the software)

Yes, it could become a bottleneck, but everything depends on your
traffic. A large site I know runs at 10 Gbps 24x7 on 3 haproxy
machines. When one is in maintenance, that means 5 Gbps per haproxy,
and it does not even saturate one core. That means that the NIC is
used at 1/3 to 1/2 of its potential and the CPU is used even less.
I don't know whether your traffic is higher than that, but there are
several ways to scale, and the easiest one is to stack layer 4 and
layer 7 LBs:

 - 1 layer of L3/4 LBs which only see requests and not responses.
   Typically this is LVS in DR mode. This scales extremely well.
   You only need 1 (two for HA).

 - 1 layer of L7 LBs which handle both requests and responses. Just
   build an N+1 setup to support a failure.

The advantage is that the first layer does maximum randomization and
ensures a very smooth distribution of the load across the second
layer. The second layer uses URI hashing to find the best cache. That
way you get the best of L4 and L7: total scalability + URL-awareness.
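
For illustration, one node of the second layer could be configured
roughly as below. This is only a sketch: the section name, addresses
and ports are made up, timeouts and other settings are left out, and
the LVS layer in front is not shown.

```haproxy
# One L7 (haproxy) node of the second layer.
# Section name, IPs and ports are hypothetical examples.
listen squid_farm
    bind :80
    mode http
    # Hash the URI so that a given object is always fetched
    # from the same cache.
    balance uri
    server squid1 192.168.0.11:3128 check
    server squid2 192.168.0.12:3128 check
    server squid3 192.168.0.13:3128 check
    server squid4 192.168.0.14:3128 check
```

All L7 nodes should carry the same "server" lines in the same order,
so that whichever node the first layer picks, the same URI should end
up on the same squid.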

> and am wondering if there is a way to use an
> http redirect instead of passthrough -- then the actual traffic could
> come directly (and only) from the back end squid server and not have
> to also pass through the haproxy NIC.

Yes, you can do that by specifying "redir" on the "server" lines.
Haproxy will then send a 302 to the client with the IP:port of the
server and the same URI for GET/HEAD requests. POSTs still pass
through haproxy. This is particularly useful for multi-site LB,
because it only sends redirects to servers that are known to be up,
and it applies the correct LB algorithm. However, your servers have
to be able to receive direct requests from clients.
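
A minimal sketch of what that could look like, assuming four squids
that clients can reach directly (the section name, IPs and ports are
only examples, not taken from your setup):

```haproxy
# Same farm as before, but GET/HEAD requests are answered with a 302
# pointing at the selected squid instead of being forwarded.
# Section name, IPs and ports are hypothetical examples.
listen squid_farm
    bind :80
    mode http
    balance uri
    # "redir <prefix>": the Location header is the prefix followed by
    # the requested URI. "check" keeps health checks running so that
    # only servers seen as up are selected.
    server squid1 192.168.0.11:3128 redir http://192.168.0.11:3128 check
    server squid2 192.168.0.12:3128 redir http://192.168.0.12:3128 check
    server squid3 192.168.0.13:3128 redir http://192.168.0.13:3128 check
    server squid4 192.168.0.14:3128 redir http://192.168.0.14:3128 check
```

With this, a GET for /big/file.iso that hashes to squid2 would get a
302 with "Location: http://192.168.0.12:3128/big/file.iso". Do not put
a trailing slash after the redir prefix, since the URI is appended
starting at its leading "/".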

> I have a feeling from browsing the docs that haproxy just isn't
> intended to be used in this kind of model.

It is :-)

> Is it possible to do this?  Should I be using a different load
> balancer?  Or does this kind of redirection have a nasty side effect I
> haven't thought of yet?

The only thing I can think of is that when performing such a redirect,
the client will put the server's IP (or name) in the "Host" header,
which means that the site name must be deduced from something else.
For many setups this is not a problem, but this is still something to
keep in mind.

Regards,
Willy

