On 2011-09-01 22:15, Julien Vehent wrote:

On Thu, 01 Sep 2011 03:09:56 +0200, Krzysztof Olędzki wrote:

Your concern is very valid and I think this is a moment where you
should take advantage of HAProxy, so you came to the right place. ;)
Each active session on HAProxy does not cost too much (much less than
on an HTTP server), so you may use the "http-server-close" mode. You
will provide keep-alive to clients, and only to clients - HTTP requests
between the LB and the HTTP server(s) will be handled without keep-alive.
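A minimal sketch of the relevant config (the 5s value is just an
example, tune it to your traffic):

    defaults
        mode http
        option http-server-close      # keep-alive towards clients only
        option forwardedfor           # add X-Forwarded-For to every request
        timeout http-keep-alive 5s    # how long an idle client connection is kept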

HAProxy also gives you the possibility to transparently distribute
requests (and load) between two or more servers, without additional
DNS records.


As a matter of fact, I've been around for some time :) We are already
using haproxy, but not as a web front end. Our architecture will look
like this once I add the secondary lighttpd:

                                                +--------+
           +----------+                    +--->| tomcat |
 --------->| lighttpd |---+                |    +--------+
   keep    +----------+   |                |    +--------+
   alive                  |                +--->| tomcat |
                          |   +---------+  |    +--------+
                          +-->| haproxy |--+
                          |   +---------+  |    +--------+
                          |                +--->| tomcat |
           +----------+   |                |    +--------+
 --------->| lighttpd |---+                |    +--------+
   keep    +----------+                    +--->| tomcat |
   alive                                        +--------+

We do not do any keep-alive past lighttpd. Everything is non-keep-alive
because we need the X-Forwarded-For header on each request for Tomcat.
(That's yet another story.)

With http-server-close (plus "option forwardedfor") you will still get a proper X-Forwarded-For header on each request.

Anyway, the reason I don't want to put haproxy in front of everything
right now is that we have a shitload of rewrite rules, and it would
take me forever to convert all of that into HAProxy's ACL syntax.

Sure, but passing the same traffic through several proxies looks very suboptimal to me, especially since HAProxy was designed to be a proxy and lighttpd was not.
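Also, a lot of rewrite rules translate fairly mechanically. A made-up
example (the URL pattern is hypothetical): lighttpd first, then the
HAProxy reqrep equivalent that rewrites the request line in flight:

    url.rewrite-once = ( "^/old/(.*)$" => "/new/$1" )

    reqrep ^([^\ ]*)\ /old/(.*)     \1\ /new/\2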

Plus, whether it's lighttpd or haproxy, the memory impact on the kernel
is going to be the same.

Kernel - yes; userspace - no. Remember that a proxy needs some memory for each active connection, and this is much, much more memory than is required by the kernel.

Right now, I'm considering setting the keep-alive timeout to 5 seconds.
Maybe even less.

From my POV, 5s for keep-alive looks very reasonable.
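If you are setting that on the lighttpd side, something like this
should do it (directive names per lighttpd's documentation; the values
are examples):

    # drop idle keep-alive connections after 5 seconds
    server.max-keep-alive-idle     = 5
    # optionally cap the number of requests served per connection
    server.max-keep-alive-requests = 100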

As a side question: do you know where I can find the information
regarding connection size and conntrack size in the kernel? (Other
than printf sizeof(sk_buff) :p)

Conntrack has nothing to do with sk_buff. However, you can find
this information with, for example:
# egrep "(#|connt)" /proc/slabinfo


Very nice! I'll read up on slabinfo. Here is the result from the
current lighttpd server:

# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ip_conntrack_expect      0      0    136   28    1 : tunables  120   60    8 : slabdata      0      0      0
ip_conntrack         23508  39767    304   13    1 : tunables   54   27    8 : slabdata   3059   3059     15
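As a rough check on what that slab costs (num_objs x objsize, ignoring
per-slab overhead), a one-liner like this works; for the numbers above
it gives 39767 x 304 bytes, about 11.5 MB allocated:

    # awk '/^ip_conntrack / {printf "%s: %.1f MB\n", $1, $3*$4/2^20}' /proc/slabinfo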

304 bytes? Which kernel version?

It should be around 208 bytes on x86 and 264 bytes on x86_64 (2x
longer pointers), but this is not all. Each conntrack can have some
additional data attached, known as "extends". Currently there are 5
possible extends:
  - helper - struct nf_conn_help: 16 bytes (x86) / 24 bytes (x86_64)
  - nat - struct nf_conn_nat: 16 bytes (x86) / 24 bytes (x86_64)
  - acct - struct nf_conn_counter: 16 bytes
  - ecache - struct nf_conntrack_ecache: 16 bytes
  - zone - struct nf_conntrack_zone: 2 bytes

So, as you can see, in the worst case there can be 66 (x86) / 82
(x86_64) more bytes allocated with each conntrack, and this goes into
a kmalloc-N slab that rounds it up to 2^n bytes.


That's for conntrack only? That's pretty low (346 bytes max per
connection); I thought conntrack would consume more than that.

No, conntracks are rather cheap, and decent hardware is able to handle even 500K of them without much trouble. However, as they are hashed and collected into buckets, you may waste much more memory than the number of conntracks might indicate, especially since under big load you really need to bump nf_conntrack's hashsize so you don't saturate your CPU.
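For example (just a sketch - the right numbers depend entirely on your
load, and the paths below are for the newer nf_conntrack module; older
ip_conntrack kernels use net.ipv4.netfilter.ip_conntrack_max instead):

    # raise the number of hash buckets at runtime
    # echo 262144 > /sys/module/nf_conntrack/parameters/hashsize
    # raise the conntrack table limit to match
    # sysctl -w net.netfilter.nf_conntrack_max=1048576

A common rule of thumb is conntrack_max = 4 x hashsize, i.e. a few
conntracks per bucket.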

What about the sk_buff and other structures? I haven't dived into the
networking layer for some time. I don't need an exact number, just
an idea of how much memory we are talking about.

The size of an sk_buff depends on the MTU, the driver used and several other factors. With MTU=1500 you are typically able to fit into 2048 or 4096 bytes.
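If you want to see the header part of that, the same slabinfo trick
works - note this shows only struct sk_buff itself; the MTU-sized data
buffer is allocated separately, which is where the 2048/4096 bytes
come from:

    # grep skbuff /proc/slabinfo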

Best regards,

                        Krzysztof Olędzki
