Hi Willy,
I've tracked one of connections hanging in CLOSE_WAIT state with tcpdump
over last night. It started at 17:19 like this:
"Packet No.","Time in
seconds","Source","Destination","Protocol","Length","Info"
"1","0.000000","ip_client","ip_haproxy_server","TCP","62","64311 >
443 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 SACK_PERM=1"
"2","0.001049","ip_haproxy_server","ip_client","TCP","62","443 >
64311 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1"
"3","0.127239","ip_client","ip_haproxy_server","TCP","54","64311 >
443 [ACK] Seq=1 Ack=1 Win=64240 Len=0"
"4","0.127344","ip_client","ip_haproxy_server","TLSv1.2","571","Client
Hello"
"5","0.130304","ip_haproxy_server","ip_client","TLSv1.2","2974","Server
Hello, Certificate"
"6","0.130336","ip_haproxy_server","ip_client","TLSv1.2","310","Server
Key Exchange, Server Hello Done"
After some 13 seconds client sent it's last data, which haproxy server
acknowledged.
"319","13.781347","ip_client","ip_haproxy_server","TLSv1.2","96","Application
Data"
"320","13.781365","ip_haproxy_server","ip_client","TCP","54","443 >
64311 [ACK] Seq=240156 Ack=3689 Win=36448 Len=0"
Then client sent FIN packet, server acknowledged it again
"321","16.292016","ip_client","ip_haproxy_server","TCP","54","64311 >
443 [FIN, ACK] Seq=3689 Ack=240156 Win=64240 Len=0"
"322","16.329574","ip_haproxy_server","ip_client","TCP","54","443 >
64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0"
>From then client sent only TCP keepalive every 45s, which server always
ackonwledged.
"323","61.443121","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311 > 443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"324","61.443216","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443 > 64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"
"325","106.528926","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311 > 443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"326","106.529117","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443 > 64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"
...
After some 4.5 hours (at 21:51) client sent last keepalive which server
acknowledged. There were no more packets after that.
"1043","16284.644240","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311 > 443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"1044","16284.644354","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443 > 64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"
"1045","16329.797223","ip_client","ip_haproxy_server","TCP","55","[TCP
Keep-Alive] 64311 > 443 [ACK] Seq=3689 Ack=240156 Win=64240 Len=1"
"1046","16329.797274","ip_haproxy_server","ip_client","TCP","66","[TCP
Keep-Alive ACK] 443 > 64311 [ACK] Seq=240156 Ack=3690 Win=36448 Len=0
SLE=3689 SRE=3690"
Next day in the morning at 10:40 I can still see the hanging connection on
the server:
netstat -aptn|grep 64311
tcp 430 0 ip_haproxy_server:443 ip_client:64311
CLOSE_WAIT 916/haproxy
lsof|grep 64311
haproxy 916 haproxy 40u IPv4 106204553
0t0 TCP ip_haproxy_server:https->ip_client:64311 (CLOSE_WAIT)
echo "show fd" | socat - $HASOCK | grep "40 :"
40 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0 owner=0x1648d80
iocb=0x4d2c80(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
fe=fe-http mux=H2 mux_ctx=0x15e9460
I hope this can help in tracking the problem down.
Best regards,
Milan