Hi Willy! I won't reply to everything, but I have read carefully what you wrote. Thanks for such precise tips.
On this cloudy night of Tuesday, May 10, 2011, at around 00:31, Willy Tarreau <w...@1wt.eu> said:

>> I will also test with a 10G NIC next week. However, with such small
>> objects, I suppose that segment coalescing will not be enabled.

> Indeed, however I've seen far better results even on small packets with 10G
> NICs than with 1G ones. The 42khits/s are on myri10ge NICs (generation 1),
> while with 1G NICs, it's much lower (I have memories of 25khits, though I
> may be wrong, since I don't use 1G anymore for high perf testing).

> It happens that the CPUs on 10G NICs are much smarter and much more
> powerful and seem to provide really nice enhancements without overloading
> the main CPU. On my tests with the Myri NICs, the IRQ rate always remains
> quite low, and the system feels as if nothing was happening, always very
> responsive.

> Depending on the 10G NIC you'll use for your tests, there are some important
> things to consider. For instance, my experiments with Intel's 82599 left me
> unconvinced. Depending on the workload, your traffic may suddenly vary from
> 2G to 10G without any explanation. They can easily reach 10G full duplex
> with very low CPU overhead even without splicing once your traffic does not
> change at all and you have found the exact settings. If either the settings
> or the traffic changes, you can drop back to 2G with 100% CPU. The NIC seems
> to have a lot of potential but looks very hard to tune.

I have tried with the 82599 and got no performance improvement at all (the cards were plugged into 1G ports). I have also noticed that the HP DL380 G5/G6/G7 servers do not support I/OAT and DCA, which should boost performance. I have asked HP about this. I'll try to get my hands on the Myri NICs.

> It's not incredible, it depends on what you're doing with your products.
> For instance, L4 in F5 is done in the lower layers and is able to keep up
> with the 120kreq/s without too many difficulties. They have no trouble
> getting up to 35kreq/s at L7 either. But it's possible that once you enable
> cookie-based persistence the rate drops substantially. Same if you need to
> rewrite request/response headers, or to add large ACLs. I remember recently
> seeing someone with a 6800 that was saturated at only a few thousand req/s
> with a few thousand rules. What a waste of power! They basically degraded
> their product to saturate it at 1% of its performance because of a misuse.
> Such products are designed to do a small number of things very fast. Most
> often they will be faster than a pure software-based solution, but sometimes
> they lose because some apparently complex processing is not optimised as
> much there as it is in general-purpose software. For instance, write an ACL
> with one million IP addresses on your haproxy and do the same on your 3900.
> I'm sure you won't notice the difference in the haproxy tests, but I'm not
> sure at all the 3900 will like it.

In fact, we have two problems with our benchmarks:

- We do not reproduce real life because we have never measured what a
  real-life scenario looks like. For example, I should try to measure the
  average HTTP request size that we get on our platform; 1000 seems pretty
  small.

- We used only L4 load balancing for years, and therefore we bench L7 load
  balancing the exact same way. For example, when we need persistence, we
  still enable persistence based on the source IP address.

--
panic("Foooooooood fight!");
        2.2.16 /usr/src/linux/drivers/scsi/aha1542.c
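P.S. To measure the average size mentioned above, something as simple as an awk one-liner over the access logs might do. This is only a sketch: the field number ($3) and the log file name are assumptions, since the real position of the bytes-transferred count depends on the log format in use.

```shell
# Average the bytes-transferred column of an access log.
# $3 and "access.log" are placeholders; adjust both to match
# your actual log format before trusting the number.
awk '{ sum += $3; n++ } END { if (n) printf "avg=%.0f\n", sum / n }' access.log
```

Running this over a full day of logs rather than a short sample should give a more realistic object size to feed into the benchmarks.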