On 30/06/15 11:47, Willy Tarreau wrote:
What causes a problem to me is that haproxy doesn't decide how/where
the packets are segmented, the TCP stack does it. This does not mean
haproxy is not faulty, but that it had 1288 valid positions to cut
these data and that by pure luck it cut them on the only position
which is exactly between two packets, which looks suspicious. I
remember we faced a very similar problem a very long time ago with TCP
splicing due to a kernel bug, and it was triggered by retransmits
(which also happen in your case). However I'm seeing "nosplice" in
your config, so that puzzles me even more. Would you have the ability
to test it over IPv4 only on the client side, so that we see if the
issue happens when mixing v4 with v6 ? If you could run a few tests in
TCP mode instead of HTTP, it would provide a significant help as well.
Another interesting test to run would be to have haproxy running
through strace during a faulty test. You'll need to do it like this.
The log will be large: strace -o log -tts 16384 -p $(haproxy_pid) The
trace will be huge but it will tell us what haproxy sends down to the
stack. And if we see the corruption there, we'll see if it happens in
the middle of a send() or at the beginning immediately after another
one, indicating a wrapping buffer. Just as a quick verification, since
I saw your kernel is dated December, I checked the fixes merged since
then in the 3.13.y-ckt branch and found nothing apparently relevant.
Thanks! Willy
Hi Willy,
Thank you for the detailed reply. I can confirm this occurs when using
pure IPv4. I will set up a new instance with IPv4 only, and run run it
with strace. I'll get back to you as soon as I have the data.
A couple of other things worth noting:
1) My hardware firewall in front of HAProcy forces MSS to 1300, hence
the unusually small packets on the frontend. I find this avoids all
problems caused by broken path MTU detection. However, the problem still
occurs when this setting is removed.
2) I cannot reproduce it, but some of my customers have reported corrupt
packets when connecting to SSH (via HAProxy in TCP mode). This only
occurs on long running SSH connections. SSH reports a corrupt packet and
terminates the connection.
Charlie