Hi Willy,

Thanks for getting back to me, and thank you for HAProxy - it is one of my favourite pieces of software. I've done a bit more digging, and have figured out how to reproduce the behaviour.

On 2017-01-16 23:54, Willy Tarreau wrote:
>> I notice here that the connection takes 35 minutes to time out once entering
>> FIN_WAIT_2, which is the value I'm setting for 'timeout tunnel'.
>
> This is very strange because the kernel normally has a much shorter
> FIN timeout (tcp_fin_timeout=60) so maybe your sysctl is this high ?

tcp_fin_timeout is set to 60 seconds on my system, but as I understand it, this parameter only applies to orphaned connections. In this case, HAProxy appears to be holding the socket open, so it is still responsible for cleaning up the connection. Am I correct in thinking you use shutdown() to close the write side of the socket, leaving the read side open in case the client still has data to send?
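
To make the half-close behaviour I'm asking about concrete, here is a minimal Python sketch of the general shutdown(SHUT_WR) semantics. It is only an illustration of the socket API, not HAProxy's code: the function name half_close_demo is made up, and it uses a local socket pair rather than a real proxied connection.

```python
import socket


def half_close_demo():
    """Close only the write side of one socket and show the read side still works."""
    a, b = socket.socketpair()
    try:
        # Half-close: 'a' can no longer send, but can still receive.
        a.shutdown(socket.SHUT_WR)

        # The peer sees end-of-stream (an empty read) on its side.
        eof = b.recv(16)

        # The peer has not closed anything, so it can still send data back.
        b.sendall(b"late data")
        late = a.recv(16)
        return eof, late
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(half_close_demo())
```

On a TCP socket the same call is what sends the FIN while the descriptor stays open, so the kernel does not treat the connection as orphaned.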

I've written a simple Python client that doesn't close its connection in response to the FIN, and with this I can occasionally reproduce the behaviour. The sequence of events is as follows, with the state of HAProxy's side of the client connection in parentheses:

1. Client opens connection to server via HAProxy. (ESTABLISHED)
2. Server closes connection, causing HAProxy to send a FIN to the client. (FIN_WAIT_1)
3. Client ACKs the FIN, but does not send a FIN of its own. (FIN_WAIT_2)
4. HAProxy's 'timeout tunnel' period elapses. (FIN_WAIT_2, orphaned)
5. tcp_fin_timeout period elapses. (Socket state is removed by the kernel)
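
For illustration, a client of this kind could look roughly like the sketch below. This is not the exact script I used: the function names, the hold period and the throwaway local server standing in for the proxy are all hypothetical, and a real test would point the client at the HAProxy frontend instead.

```python
import socket
import threading
import time


def hold_open_after_fin(host, port, hold_seconds=1.0):
    """Connect, read until the peer's FIN arrives, then keep our side open.

    Returns True if the FIN was seen and our socket was still open at the
    end of the hold period.
    """
    sock = socket.create_connection((host, port))
    try:
        # recv() returning b"" means the peer has sent its FIN.
        while sock.recv(4096):
            pass
        # Deliberately no close() or shutdown() here: the kernel ACKs the
        # FIN for us, but we never send a FIN of our own during the hold.
        time.sleep(hold_seconds)
        return sock.fileno() != -1
    finally:
        sock.close()


def start_closing_server():
    """Hypothetical stand-in for the proxy: accept one connection, then close it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve():
        conn, _ = listener.accept()
        conn.sendall(b"bye\n")
        conn.close()  # sends a FIN to the client
        listener.close()

    threading.Thread(target=serve, daemon=True).start()
    return port


if __name__ == "__main__":
    port = start_closing_server()
    print(hold_open_after_fin("127.0.0.1", port, hold_seconds=5.0))
```

While the client is sleeping, the socket states on both sides can be inspected with netstat or ss.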

I do wonder if there's a race somewhere though, as sometimes at step 3 the 'timeout client-fin' setting (30s in my case) seems to kick in, and the connection state is cleaned up quickly.

> In your case I'd have a look at tcp_fin_timeout to possibly lower it, but
> that's all. I wouldn't be worried by this number of FIN_WAIT2 connections
> though I understand that at least the cause needs to be figured out and
> possibly addressed.

It's not a huge problem, as the FIN_WAIT_2 connections aren't using much in the way of system resources. It's more of an annoyance really, as they're skewing my HAProxy stats, i.e. nearly 50% of the current sessions do not correspond to an active connection through the proxy.

I'm going to do some more testing to see if I can figure out why it's not reliably reproducible, and perhaps try a 1.7 build to see if I get different results there.

Thanks again,
Richard