Hi Willy,

On 13 Dec 2013, at 02:13, Willy Tarreau <[email protected]> wrote:
> On Mon, Dec 09, 2013 at 03:43:09PM +0000, Annika Wickert wrote:
>> - Two Intel(R) Xeon(R) CPU X6550 @ 2.00GHz in each cluster node
>> - 2x Emulex Corporation OneConnect 10Gb NIC (rev 02) in each cluster node
>> - 32 GB RAM in each cluster node
>> - Two nodes per cluster (active-active in the new one)
>
> I never had the opportunity to test Emulex NICs yet. It could be possible
> that they disable some TCP optimizations by default, resulting in worse
> performance with splice().

I just read the Emulex documentation, and it says TSO, LRO and so on are enabled by default:
http://www-dl.emulex.com/support/linux/83525/linux_11sp.pdf

>> - Debian Squeeze / 3.1.0-1-amd64 / tick rate 250
>> - CentOS release 6.4 (Final) / 3.11.5-1.el6 / tick rate 1000
>>
>> The higher the tick rate, the higher the CPU load. You quadrupled
>> the tick rate, and your load did what - quadrupled? I suggest you
>> try a lower tick rate with the very same configuration.
>
> 250 is the best tick rate for network-related traffic: it allows a
> number of timing conversions to milliseconds to be done with a simple
> shift instead of a divide, while not hammering the system too fast.
>
>> - We are forcing it via splice-request / splice-response
>
> OK, so I suspect this is purely TCP.

No, it's mostly HTTP and HTTPS, but we also had splice-request / splice-response enabled in the previous HAProxy version, and it worked without any impact.

>> I believe splice is not always more efficient than recv/send;
>
> Confirmed, especially with small transfers (less than a page = 4 kB).

OK, we have many small transfers.

>> Use splice-auto to use it less aggressively (doc: splice-auto).
>>
>> For testing we disabled splicing on one of the cluster members of the new
>> cluster (after successful tests). Now the load drops from 16 to below 8.
>> So I maybe try it with splice-auto, and if that does not help, with a new
>> HAProxy build including the following git commits:
>> http://haproxy.1wt.eu/git?p=haproxy.git;a=commit;h=61d39a0e2a047df78f7f3bfcf5584090913cdc65
>
> Oh, good point, I completely forgot about this one. Yes, it could be the culprit!

I tried it in the testing environment, and it looks like this makes the difference.

>> http://haproxy.1wt.eu/git?p=haproxy.git;a=commit;h=fa8e2bc68c583a227ebc78bab5779b84065b28da
>>
>> HAProxy uses heuristics to estimate whether kernel splicing might improve
>> performance or not. Both directions are handled independently. Note
>> that the heuristics used are not very aggressive, in order to limit
>> excessive use of splicing.
>
> Yes, the heuristic consists of detecting whether haproxy manages to read a full
> buffer at once and to purge it at once. If that works, then the traffic is
> considered high enough to make good use of splice(). Otherwise, with
> non-complete buffers, it sticks to recv/send. It tends to work really well
> in web environments, where you don't want favicon.ico to be spliced but you
> do want your photos to be.

OK, so I will also try this in the testing environment.

> Regards,
> Willy

On a positive note: SSL offloading works like a charm :). Thank you for your explanations in the other mail :).

Regards,
Annika
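P.S. For the archives, here is roughly what the splice settings discussed above look like in haproxy.cfg. This is only a sketch of the knobs involved (the surrounding defaults section is made up, not our production config):

```
defaults
    mode http
    # current setup: force kernel splicing in both directions
    option splice-request
    option splice-response
    # less aggressive alternative to test: let haproxy apply its
    # heuristics and splice only connections that look worth it
    # option splice-auto
```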

