Hi David,

(warning, your mail agent sends ctrl-M at the end of each line, looks a bit 
broken).

On Mon, Oct 15, 2012 at 04:56:47PM +0200, David Touzeau wrote:
> Dear^M
> ^M
>  ^M
> I have seen in Haproxy that you can use it in transparent mode.^M
> ^M
> I would like to use it in order to load balance Squid cache servers in 
> transparent mode.^M
> ^M
> Is there somebody had implemented this kind of architecture ?^M
> ^M

I remember someone posting on this subject a few years ago. From what I
recall, in transparent mode squid only uses the Host header to route the
request, so there is always a way to make it work. In fact there are two
possible architectures :

  - haproxy and squid on the same machine. haproxy must then listen in
    transparent mode and intercept incoming connections. It should forward
    them to squid explicitly (not transparent). Squid should be configured
    for transparent mode but should not intercept connections.

  - haproxy and squid on different machines : haproxy could work in full
    transparent mode on both sides and even connect to squid using the
    client's address so that squid ACLs and logs continue to work. However
    it needs a way to distinguish the squid proxies. Either it targets them
    directly using their own IP address, or it forwards the traffic to the
    original address but then needs a different interface or VLAN to address
    each server (Note: I wouldn't recommend this last option, it's only
    meaningful when you want to directly plug your proxies to the LB box).

So in general you're left with something looking like this :

            C->W                C->S             C/S -> W
  client ----------> haproxy ----------> squid ------------> web
   [C]                 [H]                [S]                [W]

Note that having haproxy spoof the client's address means that the squids must
route the client traffic through the haproxy server. This is sometimes
difficult, e.g. if you're setting up this architecture on an ISP backbone,
because there are so many routes that it basically means the squids must have
their default route through haproxy.

Similarly, if squid spoofs the client's address when connecting to the web,
the return traffic from the web must be routed via squid.

This becomes tricky when you want to ensure that some protocols bypass
squid (eg: websocket, https) or when you want a cache bypass rule for
the case where all squids are down. This is still possible but once again
it means the squids need to route their traffic through haproxy. It would
then more likely look like this :

                       [S]
                      squid
                        |
                   C->S | S->W
            C->W        |        C->W
  client ----------> haproxy ----------> web
   [C]                 [H]               [W]

Then depending on the load, you might need a big machine for haproxy so that
it can support the cumulative traffic of all the squids. In this architecture,
squid must not spoof the client's address, otherwise it becomes impossible to
find the right squid gateway for the return traffic. Instead, squid must
connect to the web using its own IP address and present an x-forwarded-for
header whose address haproxy will use as the source when connecting to the
web. It would more or less look like this :

    listen client
        # balance between squids
        bind :80 interface eth0 transparent
        mode http
        option http-server-close
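        balance uri                     # same URL always goes to the same squid
        hash-type consistent            # limit remapping when a squid goes down
        reqidel ^x-forwarded-for        # drop any client-supplied header
        option forwardfor               # add the real client address
        source 0.0.0.0 usesrc clientip  # connect to squid from the client's IP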
        server squid1 192.168.1.1:3128 check
        server squid2 192.168.1.2:3128 check
        server squid3 192.168.1.3:3128 check
        server squid4 192.168.1.4:3128 check
        server direct 0.0.0.0 backup  # reach original site

    listen squid
        # spoof client's address to connect outside
        bind :80 interface eth1 transparent
        mode http
        source 0.0.0.0 usesrc hdr_ip(x-forwarded-for)
        server internet 0.0.0.0
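
To illustrate why "balance uri" combined with "hash-type consistent" suits a
farm of caches, here is a minimal Python sketch of consistent hashing. It is
not haproxy's actual algorithm: the md5-based ring, the Ring class and the
replica count are assumptions made for the example. It only shows the
property that matters here, namely that when one squid disappears, only the
URLs it was serving get remapped.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer (md5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Toy consistent-hash ring; each server owns several points on it."""

    def __init__(self, servers, replicas=100):
        self._points = sorted(
            (_hash(f"{name}#{i}"), name)
            for name in servers
            for i in range(replicas)
        )
        self._keys = [h for h, _ in self._points]

    def pick(self, uri: str) -> str:
        # First point at or after the URI's hash, wrapping around the ring.
        idx = bisect.bisect(self._keys, _hash(uri)) % len(self._keys)
        return self._points[idx][1]


squids = ["squid1", "squid2", "squid3", "squid4"]
uris = [f"/page/{n}" for n in range(1000)]

full_ring = Ring(squids)
degraded_ring = Ring(squids[:-1])  # squid4 is down

before = {u: full_ring.pick(u) for u in uris}
after = {u: degraded_ring.pick(u) for u in uris}
moved = [u for u in uris if before[u] != after[u]]

print(f"{len(moved)} of {len(uris)} URIs changed server")
```

Every URI in "moved" was on squid4 before the failure, so the caches of the
three remaining squids stay warm.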

With some appropriate tweaking, such a configuration should work. Note
that since the client's address is present everywhere in the chain, it
is even possible to load balance between multiple haproxies using LVS
based on the source on the left side, and the destination on the right
side.
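
The reasoning behind the LVS remark can be sketched in a few lines of Python.
This is only a toy model: the node names, the addresses and the modulo-based
pick() function are assumptions for the example, standing in for LVS's source
hashing (sh) and destination hashing (dh) schedulers. The point is that the
client's address is the source of the request and the destination of the
response, so hashing on it on each side selects the same haproxy.

```python
import ipaddress

# Hypothetical pool of haproxy nodes behind the two LVS directors.
HAPROXIES = ["haproxy1", "haproxy2", "haproxy3"]


def pick(addr: str) -> str:
    """Map an IP address to a haproxy node (toy stand-in for LVS hashing)."""
    return HAPROXIES[int(ipaddress.ip_address(addr)) % len(HAPROXIES)]


def left_side(src: str, dst: str) -> str:
    # Client-facing director: hash on the packet's source (the client).
    return pick(src)


def right_side(src: str, dst: str) -> str:
    # Web-facing director: hash on the packet's destination (the client again).
    return pick(dst)


client, web = "10.0.0.42", "203.0.113.7"

request_node = left_side(src=client, dst=web)     # client -> web
response_node = right_side(src=web, dst=client)   # web -> client

print(request_node, response_node)
```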

Regards,
Willy

