Hi Christian,
On Wed, Jan 09, 2013 at 03:27:11PM +0000, Christian Becker wrote:
> On 09.01.2013, at 14:55, Lukas Tribus <[email protected]> wrote:
>
> >
> >> In the meantime I've downgraded to the old kernel, but the performance
> >> issues persist. So this seems to be an issue in haproxy.
> >
> > This is very strange. In your first mail you reported that your CPU is
> > spending 30% in userspace and 70% is system. How is your CPU usage now?
>
> Here [1] is a screenshot from our monitoring with some annotations

First, you're seeing higher CPU usage with splicing than without.
That's common with many gigabit NICs, which seem to collect too few
packets at a time. Splicing really pays off with 10GE NICs. For example,
with an e1000e NIC I never managed to get splice() to even match the
speed of recv/send. Also, you absolutely need GRO/LRO on your input NIC
and TSO on your output NIC for splice() to be efficient. Anyway, you
can try to improve things by changing the default pipe size in the
global section:

    tune.pipesize 262144   # default is 65536

It will allow more data to flow between both sides and will significantly
reduce the number of syscalls. You need both your tcp_rmem and tcp_wmem
to be large enough that most small objects which exceed the internal
buffer size are transferred at once. But don't make them too large,
because socket memory will quickly eat all of your system's memory.
> > You are running the latest snapshot, could you downgrade to dev17? There are
> > some epoll/splice commits between dev17 and the snapshot you are running,
> > and the snapshot has probably not seen very much testing yet.
>
> did that already and found no differences at all in the cpu load

OK, thanks for checking. I would have been surprised if they had caused
this, since they only concern old buggy kernels or shutdowns.

Another idea that comes to mind: have you tried pinning your network IRQs
and haproxy to particular cores, or do you let the system place them?
Even very tiny changes can make the scheduler place them differently
because of different activity patterns, resulting in more or less
efficient usage depending on the case. If you're running at 100% CPU,
either something is spinning like mad or your data rate is high, and
in the latter case CPU affinity matters a lot.
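
A minimal Python sketch of the affinity part (Linux only, standard
library): os.sched_setaffinity() pins a process, and cpu_mask_hex() builds
the hex mask format that /proc/irq/<N>/smp_affinity expects. The helper
names are mine, <N> stands for your NIC's IRQ number, and which cores to
pick is deliberately left open. From a shell, "taskset -pc <cpu> <pid>"
does the process part.

```python
# Minimal sketch (Linux only): pin a process to one core and build the
# hex mask format used by /proc/irq/<N>/smp_affinity.
import os


def cpu_mask_hex(cpus):
    """Return the hex bitmask for a set of CPU numbers, e.g. {2, 3} -> 'c'."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def pin_process(pid, cpu):
    """Restrict pid (0 = calling process) to a single CPU; return the old set."""
    previous = os.sched_getaffinity(pid)
    os.sched_setaffinity(pid, {cpu})
    return previous


if __name__ == "__main__":
    target = min(os.sched_getaffinity(0))
    old = pin_process(0, target)
    print("now allowed on:", os.sched_getaffinity(0))
    os.sched_setaffinity(0, old)  # restore the original placement
    # Writing this value to /proc/irq/<N>/smp_affinity (as root) would steer
    # that IRQ to the chosen CPU; <N> is a placeholder for the NIC's IRQ.
    print("IRQ mask for CPU", target, "->", cpu_mask_hex({target}))
```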

If increasing the default pipe size does not help, you can quickly
check how splice() behaves:

    strace -tt -e trace=accept,connect,close,epoll_wait,epoll_ctl,splice \
           -o splice-activity.log -p $(pidof haproxy)
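
Once splice-activity.log exists, a small parser can summarize it: call
counts per syscall, bytes moved per successful splice(), and how many
splice() calls returned EAGAIN. This is only a sketch; it assumes the
one-line-per-call format strace prints with -tt on a single pid and simply
skips "unfinished"/"resumed" fragments. Lots of tiny splices or many
EAGAINs would presumably be the first thing worth a closer look.

```python
# Minimal sketch: summarize an "strace -tt" log such as splice-activity.log.
import re
import sys
from collections import Counter

# "<time> <syscall>(<args>) = <retval> [errno text]"
LINE_RE = re.compile(r"^\S+\s+(\w+)\(.*\)\s+=\s+(-?\d+)")


def summarize(lines):
    """Count syscalls and splice() statistics from strace output lines."""
    calls = Counter()
    splice_bytes = 0
    splice_ok = 0
    splice_eagain = 0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # "<unfinished ...>", "resumed", signal lines, etc.
        name, ret = m.group(1), int(m.group(2))
        calls[name] += 1
        if name == "splice":
            if ret > 0:
                splice_ok += 1
                splice_bytes += ret
            elif ret == -1 and "EAGAIN" in line:
                splice_eagain += 1
    avg = splice_bytes / splice_ok if splice_ok else 0.0
    return {
        "calls": dict(calls),
        "splice_bytes": splice_bytes,
        "splice_ok": splice_ok,
        "splice_eagain": splice_eagain,
        "avg_bytes_per_splice": avg,
    }


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "splice-activity.log"
    with open(path) as f:
        for key, value in summarize(f).items():
            print(f"{key}: {value}")
```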

It is possible that the changes in polling have slightly modified the
syscall patterns, resulting in very different results (I'm not observing
this here, but you know what it's like...).

My fingers are really itching to change the lower layers' semantics to
provide an explicit forward facility in haproxy, which could probably
come with less scheduling overhead and pave the way to higher speeds
later with assistance from the kernel (with yet-to-be-developed
syscalls). I keep refraining from doing that for now because we've had
enough trouble between dev12 and dev17...
> >> Currently only the initial warnings I've posted are
> >> related to the new kernel
> >
> > So with the old kernel you don't see this warning if I
> > understand you correctly?
>
> yes

3.7 is still a bit "wet" in my opinion; we spent the whole weekend
with Eric chasing new splice regressions and trying to fix them! The
fixes are included in the latest 3.7.2 review in case you're interested.
> >> Anyway I'll post the warnings to the netdev list,
> >> maybe they can fix them.
> >
> > Yes, please do that, so the kernel issue can be fixed as well.
>
> already done and I've seen you already noticed ;)
> For the record: it's here [2]

BTW, I have not seen your response to Eric's proposal. If you tested it
and did not find any improvement, please tell him; it's very important
for developers to know what effects their patches have on bugs (good,
bad, or none).
Regards,
Willy