Hi Willy,
Thanks for getting back to me, and thank you for HAProxy - it is one of
my favourite pieces of software. I've done a bit more digging, and have
figured out how to reproduce the behaviour.
On 2017-01-16 23:54, Willy Tarreau wrote:
>> I notice here that the connection takes 35 minutes to time out once entering
>> FIN_WAIT_2, which is the value I'm setting for 'timeout tunnel'.
> This is very strange because the kernel normally has a much shorter
> FIN timeout (tcp_fin_timeout=60) so maybe your sysctl is this high ?
tcp_fin_timeout is set to 60 seconds on my system, but as I understand
it, this parameter only applies to orphaned connections. In this case,
HAProxy appears to be holding the socket open, so still has
responsibility for cleaning up the connection. Am I correct in thinking
you use shutdown to close the write side of the socket, leaving the read
side open in case the client still has data to send?
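
To illustrate what I mean by a half-close, here is a minimal sketch. It is not taken from HAProxy's source; a local socketpair stands in for the proxy-to-client connection, and the function name is made up for the example.

```python
import socket


def half_close_demo():
    """Show that shutdown(SHUT_WR) closes only the write direction."""
    # A connected pair of local sockets stands in for the two ends
    # of the proxy <-> client connection (an assumption for the demo).
    proxy_side, client_side = socket.socketpair()
    try:
        # Half-close: the "proxy" signals end-of-stream for its writes,
        # but keeps the descriptor open and can still read.
        proxy_side.shutdown(socket.SHUT_WR)

        # The peer sees end-of-stream as an empty read...
        saw_eof = client_side.recv(1024) == b""

        # ...yet it can still send in the other direction.
        client_side.sendall(b"late data")
        late = proxy_side.recv(1024)
        return saw_eof, late
    finally:
        proxy_side.close()
        client_side.close()
```

As long as the process keeps the descriptor open like this, the socket is not orphaned, which is why I would not expect tcp_fin_timeout to apply yet.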
I've written a simple Python client that doesn't close a connection in
response to the FIN, and with this I can occasionally reproduce the
behaviour. The sequence of events is as follows, with the state of the
client to HAProxy connection in parentheses:
1. Client opens connection to server via HAProxy. (ESTABLISHED)
2. Server closes connection, causing HAProxy to send a FIN to the
client. (FIN_WAIT1)
3. Client ACKs the FIN, but does not send a FIN of its own. (FIN_WAIT2)
4. HAProxy's 'timeout tunnel' period elapses. (FIN_WAIT2, orphaned)
5. tcp_fin_timeout period elapses. (Socket state is removed by the kernel)
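
For reference, a client that behaves as in step 3 can be sketched roughly as follows. This is an illustrative reconstruction rather than the exact script I used; the function name and the hold_seconds parameter are hypothetical.

```python
import socket
import time


def hold_after_fin(host, port, hold_seconds=0.0):
    """Connect, read until the peer's FIN, then keep the socket open.

    Returns the still-open socket so the caller decides when to close it.
    """
    sock = socket.create_connection((host, port))

    # recv() returns b"" once the peer's FIN has arrived. The kernel
    # ACKs that FIN on our behalf; no application action is needed.
    while sock.recv(4096):
        pass

    # Deliberately do NOT call close() or shutdown() here. Our end sits
    # in CLOSE_WAIT, so the peer that sent the FIN stays in FIN_WAIT2.
    time.sleep(hold_seconds)
    return sock
```

Pointing this at the HAProxy frontend with a hold_seconds longer than the timeouts, and watching the socket states on the proxy host, should show the connection parked in FIN_WAIT2 for as long as the client holds it.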
I do wonder if there's a race somewhere though, as sometimes at step 3
the client-fin timeout (30s in my case) seems to kick in, and the
connection state is cleaned up quickly.
> In your case I'd have a look at tcp_fin_timeout to possibly lower it, but
> that's all. I wouldn't be worried by this number of FIN_WAIT2 connections
> though I understand that at least the cause needs to be figured out and
> possibly addressed.
It's not a huge problem, as the FIN_WAIT2 connections aren't using many
system resources. It's more of an annoyance really, as they're throwing
off my HAProxy stats: nearly 50% of the current sessions do not
correspond to an active connection through the proxy.
I'm going to do some more testing to see if I can figure out why it's
not reliably reproducible, and perhaps try a 1.7 build to see if I get
different results there.
Thanks again,
Richard