Hi David,
(warning, your mail agent sends ctrl-M at the end of each line, looks a bit
broken).
On Mon, Oct 15, 2012 at 04:56:47PM +0200, David Touzeau wrote:
> Dear^M
> ^M
> ^M
> I have seen in Haproxy that you can use it in transparent mode.^M
> ^M
> I would like to use it in order to load balance Squid cache servers in
> transparent mode.^M
> ^M
> Is there somebody had implemented this kind of architecture ?^M
> ^M
I remember someone posting on this subject a few years ago. From what I
recall, in transparent mode squid only uses the Host header to route the
request, so this should always be feasible. There are in fact two possible
architectures :

  - haproxy and squid on the same machine. haproxy must then listen in
    transparent mode and intercept incoming connections. It should forward
    them to squid explicitly (not transparent). Squid should be configured
    for transparent mode but should not intercept connections.

  - haproxy and squid on different machines : haproxy could work in full
    transparent mode on both sides and even connect to squid using the
    client's address so that squid ACLs and logs continue to work. However
    it needs a way to distinguish the squid proxies. Either it targets them
    directly using their own IP address, or it forwards the traffic to the
    original address but then needs a different interface or VLAN to address
    each server (Note: I wouldn't recommend this last option, it's only
    meaningful when you want to plug your proxies directly into the LB box).

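For the first case (everything on one box), the haproxy side could be as
small as the sketch below. The section name "intercept", the server name
"squid_local" and the local squid port 3128 are only picked for illustration,
and it assumes the kernel is already set up to hand intercepted port 80
traffic to haproxy. The squid side is not shown.

    listen intercept
        # accept connections on port 80 whatever their destination address
        bind :80 transparent
        mode http
        option http-server-close
        # explicit (non-transparent) connection to the local squid
        server squid_local 127.0.0.1:3128 check
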
So in general you're left with something looking like this :

             C->W                C->S             C/S -> W
   client ----------> haproxy ----------> squid ------------> web
    [C]                 [H]                [S]                [W]

Note that having haproxy spoof the client's address means that the squids must
route the client traffic through the haproxy server. Sometimes this is
difficult, e.g. if you're setting up this architecture on an ISP backbone :
there are so many routes that it basically means the squids must have their
default route through haproxy.
Similarly, if squid spoofs the client's address when connecting to the web,
the return traffic from the web must be routed via squid.
This becomes tricky when you want to ensure that some protocols bypass
squid (eg: websocket, https) or when you want a cache bypass rule for
the case where all squids are down. This is still possible but once again
it means the squids need to route their traffic through haproxy. It would
then more likely look like this :

                        [S]
                       squid
                         |
             C->S        |       S->W
             C->W        |       C->W
   client ----------> haproxy ----------> web
    [C]                 [H]               [W]

Then depending on the load, you might need a big machine for haproxy so that
it can support the aggregate traffic of all the squids. In this architecture,
squid must not spoof the client's address, otherwise it becomes impossible to
find the right squid gateway for the return traffic. Instead, squid must
connect to the web using its own IP address and present an x-forwarded-for
header whose address haproxy will use as the source when connecting to the
web. It would more or less look like this :

    listen client
        # balance between squids
        bind :80 interface eth0 transparent
        mode http
        option http-server-close
        balance uri
        hash-type consistent
        # drop any x-forwarded-for sent by the client, then add our own
        reqidel ^x-forwarded-for
        option forwardfor
        source 0.0.0.0 usesrc clientip
        server squid1 192.168.1.1:3128 check
        server squid2 192.168.1.2:3128 check
        server squid3 192.168.1.3:3128 check
        server squid4 192.168.1.4:3128 check
        server direct 0.0.0.0 backup  # reach original site

    listen squid
        # spoof client's address to connect outside
        bind :80 interface eth1 transparent
        mode http
        # take the source address from the header set by "listen client"
        source 0.0.0.0 usesrc hdr_ip(x-forwarded-for)
        server internet 0.0.0.0

With some appropriate tweaking, such a configuration should work. Note
that since the client's address is present everywhere in the chain, it
is even possible to load balance between multiple haproxies using LVS
based on the source on the left side, and the destination on the right
side.
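
For the websocket bypass mentioned earlier, one possible sketch is to split
the client side into a frontend and two backends. The names "is_websocket",
"squids", "direct", "fallback" and "origin" are made up for the example, and
I have not tested this. It also assumes only port 80 is intercepted, so https
never reaches this bind and needs no rule here.

    frontend client
        bind :80 interface eth0 transparent
        mode http
        option http-server-close
        # websocket handshakes carry "Upgrade: websocket"
        acl is_websocket hdr(Upgrade) -i websocket
        use_backend direct if is_websocket
        default_backend squids

    backend squids
        mode http
        balance uri
        hash-type consistent
        source 0.0.0.0 usesrc clientip
        server squid1 192.168.1.1:3128 check
        server squid2 192.168.1.2:3128 check
        # only used when all the squids above are down
        server fallback 0.0.0.0 backup

    backend direct
        mode http
        # reach the original destination using the client's address
        source 0.0.0.0 usesrc clientip
        server origin 0.0.0.0
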
Regards,
Willy