Hi Dmitriy,

On Tue, Jul 19, 2011 at 04:25:29AM +0400, Dmitriy Samsonov wrote:
> Using one/two clients with ab2 -c 250 -H 'Accept-Encoding: None' -n
(...)
>   98%     11
>   99%     13
>  100%   9022 (longest request)

Still some drops. Was this before or after ethtool+somaxconn ?

> Also I've upgraded kernel from 2.6.38-r6 (gentoo) to 2.6.39-r3
> (gentoo) - nothing changed. At all. haproxy version is 1.4.8.

OK. When you have time, you can also update haproxy. No less than
44 bugs were fixed since 1.4.8, and a few improvements were made,
even though those will not concern your current tests. BTW, I see
that your system is 64-bit, I assume you built haproxy for 64-bit
too ? I'm asking because syscalls are cheaper in 64-bit.
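
If you want to double-check, the easiest way is to look at the binary
itself (the path below is just an example, adjust it to your install) :

   $ file /usr/sbin/haproxy
   /usr/sbin/haproxy: ELF 64-bit LSB executable, x86-64, ...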

> Altering somaxconn also didn't change anything.

OK. This one could be increased too (e.g. 10x) :
   net.core.netdev_max_backlog = 1000

Also in your sysctls, I see :
   net.ipv4.tcp_max_syn_backlog = 1024

I would have expected it to be set to something like 10000. 1024 can be
a bit low for 50k sess/s.
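
For example, both backlogs can be bumped at runtime with sysctl (the
values below are just a starting point to experiment with, not tuned
figures), and made persistent in /etc/sysctl.conf :

   # queue of packets received faster than the kernel can drain them
   sysctl -w net.core.netdev_max_backlog=10000
   # pending SYNs remembered per listening socket, so bursts of new
   # connections are not dropped before haproxy accepts them
   sysctl -w net.ipv4.tcp_max_syn_backlog=10000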

One thing you can try for the test, but not for production until the
kernel is checked for correct processing, is the "defer-accept" option
of the "bind" line :

   frontend xxx
        bind :80 defer-accept

It asks the kernel to wake haproxy up only when data are available on
the connection. It can save a few kernel-user switches. But until recently
it was not reliably usable in production because there was no way to
tell the kernel to deliver the connection anyway after some delay. This
has recently changed but I don't know in which version. If it noticeably
improves performance, we can try to find the correct version.

I'm seeing you have the bonding driver loaded. Is it in use, and if so,
how is the traffic spread across the links ? You should never use round-robin
nor any non-deterministic distribution at high packet rates, it's the best
way to cause reordering, which costs a lot on both the client and the server
TCP stacks. It also tends to make them emit more useless ACKs. If you want
to use two NICs to double the data rate, you'd better bind two frontends each
to one NIC and have the front router route the traffic to both. If you only
have a switch, then sometimes you can give them the same IP+MAC and enable
etherchannel on the switch. This will also allow you to run 2 haproxy
processes, each bound to one NIC, and use 4 cores total (see the sketch
below) :
  - 1 for each NIC
  - 1 for each haproxy process
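
A minimal sketch of that layout, assuming the two NICs carry the
addresses 192.0.2.1 and 192.0.2.2 (placeholders) and that each haproxy
instance runs its own configuration pinned to its own core with taskset
(the rest of the configs is omitted here) :

   # haproxy-nic1.cfg : frontend bound to the first NIC
   frontend fe_nic1
        bind 192.0.2.1:80

   # haproxy-nic2.cfg : frontend bound to the second NIC
   frontend fe_nic2
        bind 192.0.2.2:80

   # start each instance on its own core (core numbers are arbitrary)
   taskset -c 2 haproxy -f haproxy-nic1.cfg
   taskset -c 3 haproxy -f haproxy-nic2.cfg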

> Only changing affinity of irq/haproxy is affecting system, maximum
> rate changes from 25k to 49k. And that's it...

OK. As you can see, it's very important to play with this, because by default
the system will move haproxy to the core processing the traffic, but at
such rates, the kernel and haproxy each need their own core.
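
For reference, this is typically done by pinning the NIC's IRQ to one
core and the haproxy process to another. The IRQ number below is a
placeholder, check /proc/interrupts for the real one :

   # find the IRQ(s) used by the NIC
   grep eth0 /proc/interrupts
   # pin that IRQ to core 0 (bitmask 1), replacing XX with the real number
   echo 1 > /proc/irq/XX/smp_affinity
   # pin the (single) haproxy process to core 1
   taskset -pc 1 $(pidof haproxy)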

> I'm including sysctl -a output, but I think all this happens because
> of some troubles with bnx2 driver - I just don't see any explanation
> why 70-80Mbs are saturating haproxy and irq handling (lost packets!).

You should not consider that in Mbps but in connection rates and packet
rates, as those are just small packets.

> I have an option to try 'High Performance 1000PT Intel Network Card'
> could it be any better or I should try find solution for current
> configuration?

The Intel NIC is generally the best tradeoff between performance and
stability. You only have to set the InterruptThrottleRate in modprobe.conf;
I usually limit it to between 5000 and 20000, and then it's easy to get very
nice numbers. Above that, 10G NICs from Myricom will provide even
better results, but you need the connectivity too. To give you an
idea, on a machine where I got about 35K sess/s (forwarded to the server)
with a gig Intel NIC, switching to the Myri improved that to about 45K
without changing anything else.
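
For example, with the e1000e driver which the PRO/1000 PT normally uses
(check with lspci -k, the module name here is an assumption on my side),
the setting goes into modprobe.conf with one value per port :

   # cap the interrupt rate at 10000/s on both ports (example value)
   options e1000e InterruptThrottleRate=10000,10000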

> My final task is to handle DDoS attacks with flexible and robust
> filter available. Haproxy is already helping me to stay alive under
> ~8-10k DDoS bots (I'm using two servers and DNS RR in production), but
> attackers are not sleeping and I'm expecting attacks to continue with
> more bots. I bet they will stop at 20-25k bots.

You can never guess at what level they will stop. It's just a matter of
money for them. If they're doing it just to annoy you, then probably
yes, at some point they'll decide that getting a laugh is costing too much.
But if they have a business interest in damaging your site, there is no
reason for them to stop when the costs increase, until the cost is higher
than the income they expect from taking your site down.

Also you should never publicly tell them how you're trying to get rid
of them nor what your sizing is, as you did right here :-)

> Such botnet will
> generate approx. 500k session rate. and ~1Gbps bandwidth so I was
> dreaming to handle it on this one server with two NIC's bonded giving
> me 2Gbps for traffic:)

Two gig NICs will not handle 500k sess/s. The wire limit is 1.4 Mpps for
short packets. The NIC limits I often encounter are 550 kpps on PCI-X and
630 kpps on PCI Express. With two NICs, you'll be able to process 1.2 Mpps.
This is only enough to deal with a 300k session rate if they're sending
requests :

   - SYN (1)
   - SYN-ACK (return)
   - ACK (1)
   - request (1)
   - response + FIN (return)
   - FIN (1)
   - ACK (return)

=> 4 packets incoming per session, so 1.2M/4 = 300k.

You can improve that by immediately resetting the connection once you
know their IP (use 1.5-dev for that, it has stick tables with per-IP rates,
see the sketch below) :

   - SYN (1)
   - SYN-ACK (return)
   - ACK (1)
   - request (1)
   - RST (return)

=> 3 packets or 400k sess/s.

In fact, the reset will happen just after the first ACK, but since the
request is sent at the same time by the client, you'll receive that packet
anyway and a second reset will be sent.
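
As a rough sketch of what that looks like with 1.5-dev's per-IP tracking
(the exact keywords depend on the -dev version, and the threshold of 50
connections per 10s is an arbitrary example to adapt) :

   frontend xxx
        bind :80
        # one entry per source IP, tracking its connection rate over 10s
        stick-table type ip size 200k expire 30s store conn_rate(10s)
        tcp-request connection track-sc1 src
        # reset sources opening connections faster than the threshold
        tcp-request connection reject if { sc1_conn_rate gt 50 }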

Also, speaking of 1.5-dev, it is with it that I reached 300k sess/s on
the Core i5. The processing has been layered a bit more, and it is
possible that blocking at a lower layer is much faster than in 1.4.

Regards,
Willy

