Hi Willy!

I don't answer everything, but I have read carefully what you said.
Thanks for such precise tips.

OoO On this cloudy night of Tuesday, May 10, 2011, at around 00:31, Willy
Tarreau <w...@1wt.eu> said:

>> I will  also test  with a 10G  NIC next  week. However, with  such small
>> objects, I suppose that segment coalescing will not be enabled.

> Indeed, however I've seen far better results even on small packets with 10G
> NICs than with 1G ones. The 42khits/s are on myri10ge NICs (generation 1),
> while with 1G NICs, it's much lower (I have memories of 25khits, though I
> may be wrong, since I don't use 1G anymore for high perf testing).

> It happens that the CPUs on 10G NICs are much smarter and much more
> powerful and seem to provide really nice enhancements without overloading
> the main CPU. On my tests with the Myri NICs, the IRQ rate always remains
> quite low, and the system feels as if nothing was happening, always very
> responsive.

> Depending on the 10G NIC you'll use for your tests, there are some important
> things to consider. For instance, my experiments with Intel's 82599 left me
> unconvinced. Depending on the workload, your traffic may suddenly vary from
> 2G to 10G without any explanation. They can easily reach 10G full duplex with
> very low CPU overhead even without splicing once your traffic does not change
> at all and you have found the exact settings. If either the settings or the
> traffic changes, you can go back to 2G with 100% CPU. The NIC seems to have a
> lot of potential but looks very hard to tune.

I have tried with the 82599 and got no performance improvement at all (they
were plugged into 1G ports). I have also noticed that HP DL380 G5/6/7 do not
support I/OAT and DCA, which should boost performance. I have asked HP about
this. I'll try to get my hands on the Myris.

> It's not incredible, it depends on what you're doing with your products.
> For instance, L4 in F5 is done in lower layers and is able to keep up with
> the 120kreq/s without too many difficulties. They have no trouble getting
> up to 35kreq/s at L7 either. But it's possible that once you enable cookie
> based persistence the rate substantially drops. Same if you need to rewrite
> request/response headers, or to add large ACLs. I remember recently seeing
> someone with a 6800 that was saturated at only a few thousand req/s with
> a few thousand rules. What a waste of power! They basically degraded their
> product to saturate it at 1% of its performance because of a misuse. Such
> products are designed to do a small number of things very fast. Most often they
> will be faster than a pure software-based solution, but sometimes they lose
> because some apparently complex processing is not optimised as much there
> as it is in general purpose software. For instance, write an ACL with one
> million IP addresses on your haproxy and do the same on your 3900. I'm sure
> you won't notice the difference in haproxy tests, but I'm not sure at all
> the 3900 will like it.
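
(To make that comparison concrete, here is a minimal sketch of such a huge
ACL with a recent haproxy. The section names, file path and server address
are made up; the list file is assumed to contain one IP or CIDR per line.)

    frontend fe_web
        mode http
        bind :80
        # Load the (hypothetical) million-entry address list from a file;
        # haproxy stores such patterns in a tree, so lookups stay cheap
        # even at this size.
        acl from_big_list src -f /etc/haproxy/big-ip-list.lst
        http-request deny if from_big_list
        default_backend be_web

    backend be_web
        mode http
        server srv1 192.0.2.11:80 check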

In fact, we have two problems with our benchmarks:
 - We do not reproduce real life because we have not measured what a
   real-life scenario looks like. For example, I should try to measure
   the average HTTP request size that we get on our platform. 1000 seems
   pretty small.
 - We used only L4 load balancing for years and therefore we bench L7
   load balancing the exact same way. For example, when we need
   persistence, we still enable persistence based on source IP address
   (a minimal sketch of both approaches follows below).
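
To illustrate the difference, a rough haproxy sketch of both styles of
persistence (backend names and server addresses are made up):

    # L4-style persistence: hash the source address, which is what we still do.
    backend be_src_persist
        mode http
        balance source
        server srv1 192.0.2.11:80 check
        server srv2 192.0.2.12:80 check

    # L7 persistence: insert a cookie, which is what we should really bench.
    backend be_cookie_persist
        mode http
        balance roundrobin
        cookie SRV insert indirect nocache
        server srv1 192.0.2.11:80 cookie s1 check
        server srv2 192.0.2.12:80 cookie s2 check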
-- 
panic("Foooooooood fight!");
        2.2.16 /usr/src/linux/drivers/scsi/aha1542.c
