Hi Annika,

On Fri, Dec 13, 2013 at 07:12:33AM +0000, Annika Wickert wrote:
> Hi Willy,
> 
> On 13 Dec 2013, at 02:13, Willy Tarreau <[email protected]> wrote:
> 
> > On Mon, Dec 09, 2013 at 03:43:09PM +0000, Annika Wickert wrote:
> >> - Two Intel(R) Xeon(R) CPU X6550 @ 2.00GHz in each cluster node
> >> - 2x Emulex Corporation OneConnect 10Gb NIC (rev 02) in each cluster node
> >> - 32 GB RAM in each cluster node
> >> - Two nodes per cluster (active-active in the new one)
> > 
> > I never had the opportunity to test Emulex NICs yet. It could be possible
> > that they disable some TCP optimizations by default resulting in worse
> > performance with splice().
> 
> I just read the Emulex documentation and it says TSO, LRO and so on are
> enabled by default.
> http://www-dl.emulex.com/support/linux/83525/linux_11sp.pdf

OK great then!

I hope you didn't disable SACK as they recommend in this doc, because
clearly the doc focuses on high *local* network performance!

> >> - We are forcing it via splice-request / splice-response
> > 
> > OK so I suspect this is purely TCP.
> No, it's mostly HTTP and HTTPS, but we had also enabled splice-request /
> splice-response in the previous haproxy version and it worked without any
> impact.

OK, but in general splice-request provides no benefit for HTTP, because:
  1) haproxy needs to read the request headers into a buffer before
     deciding to forward the request, so splice() is not usable at the
     beginning;

  2) requests containing a body (POST) generally arrive slowly, so the
     NIC has no opportunity to merge large segments. Using splice() to
     move small packets then makes the system check all the possible
     heuristics before doing the memcpy() anyway, whereas the recv/send
     path would do the memcpy() immediately.

> >> I believe splice is not always more efficient than recv/send;
> > 
> > Confirmed, especially with small transfers (less than a page = 4 kB).
> Ok, we have many small transfers.

OK

> >> use splice-auto to use it less aggressively (doc: splice-auto):
> >> 
> >> For testing we disabled splicing on one of the cluster members on the new
> >> cluster (after successful tests). Now load drops below 8 from 16. So maybe
> >> I'll try it with splice-auto, and if that does not help, with a new haproxy
> >> build including the following git commits:
> >> http://haproxy.1wt.eu/git?p=haproxy.git;a=commit;h=61d39a0e2a047df78f7f3bfcf5584090913cdc65
> > 
> > Oh good point, I completely forgot about this one. Yes it could be a 
> > culprit!
> I tried it in testing environment and it looks like this makes the difference.

Excellent! I'm not that surprised, given that your trace showed
epoll_wait() returning EPOLLIN|0x2000 (EPOLLRDHUP), which is one of
the cases that triggered this bug.

> >> http://haproxy.1wt.eu/git?p=haproxy.git;a=commit;h=fa8e2bc68c583a227ebc78bab5779b84065b28da
> >> 
> >> Haproxy uses heuristics to estimate if kernel splicing might improve
> >> performance or not. Both directions are handled independently. Note
> >> that the heuristics used are not much aggressive in order to limit
> >> excessive use of splicing.
> > 
> > Yes, the heuristics consist of detecting whether haproxy manages to read a
> > full buffer at once and to flush it at once. If that works, the traffic is
> > considered high enough to make good use of splice(). Otherwise, with
> > non-complete buffers, it sticks to recv/send. It tends to work really well
> > in web environments where you don't want favicon.ico to be spliced but you
> > do want your photos to be.
> Ok, so I will try this also in testing environment. 
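For reference, switching from forced splicing to the heuristic mode is a small configuration change; a sketch (the `option` keywords are real haproxy directives, the surrounding section and other settings are illustrative):

```
defaults
    mode http
    # let the heuristics decide per direction whether splice() is worth it
    option splice-auto
    # what was tested before: force splicing unconditionally
    # option splice-request
    # option splice-response
```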

You may want to try the latest snapshot as well in your dev environment,
which will become dev20 in a few days. There were some small performance
improvements and nice memory optimizations (about 640 bytes saved per
session). If you have to stick to a specific version because of the
qualification you've already done, then you have a significant amount
of fixes to pick there and backport :-)

At least you would probably be interested in this recent commit :

  http://haproxy.1wt.eu/git?p=haproxy.git;a=commit;h=2f877304ef180654d165bf4ba8c88c204fc09d36

It can significantly reduce the number of recvfrom() calls returning
EAGAIN on small objects. In the optimal case it can even get rid of
all of them, reducing the number of calls to recvfrom() and epoll_ctl()
by 33%, and to epoll_wait() by 16.6%.

> To say something positive: SSL offloading works like a charm :). 

I know this :-)

Our goal was to get rid of stunnel in our ALOHA load balancer, and not
only was this goal achieved, but we doubled the performance and gained a
lot in scalability with large numbers of concurrent connections. The next
step will be to try again with CyaSSL. The first tests we did last year
showed a massive performance gain over OpenSSL, but some missing features.
Now they seem to have everything so it's time to try again :-)

> Thank you for your explanations in the other mail :).

You're welcome!

Best regards,
Willy

