Hi,

Setup: kernel 2.6.22 + Julian's NFCT patch, with snat_reroute=1 and conntrack=1 set in /proc/sys/net/ipv4/vs/.
I have a server behind LVS-NAT that sends all its data quite fast, followed by a FIN. After that, it retransmits lost packets as needed. The problem is that, in some cases, the connection-terminating FIN (with the last ACK) from CLIENT isn't delivered to the RS, which keeps on sending the last packet until it gives up.

I have the following rules in the FORWARD chain:

    ACCEPT  0  --  0.0.0.0/0  0.0.0.0/0  ctstate ESTABLISHED
    DROP    0  --  0.0.0.0/0  0.0.0.0/0  ctstate INVALID
    LOG     0  --  0.0.0.0/0  0.0.0.0/0  LOG flags 0 level 4 prefix `forward: '

Netfilter seems to be matching a lot of ESTABLISHED and some INVALID packets. All those retransmissions from RS to CLIENT end up in the LOG rule and get dropped, so apparently no ctstate was found for them?

Packet traces from the external and internal interfaces (1.2.3.4 = VIP, 10.0.0.1 = RIP, 4.3.2.1 = CIP):

external:

    13:34:04 4.3.2.1.9876 > 1.2.3.4.8888: S 3015053360:3015053360(0)
    13:34:04 1.2.3.4.8888 > 4.3.2.1.9876: S 3950144430:3950144430(0) ack 3015053361
    13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: . ack 1
    13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: P 1:6(5) ack 1
    13:34:05 1.2.3.4.8888 > 4.3.2.1.9876: . ack 6
    13:34:05 1.2.3.4.8888 > 4.3.2.1.9876: P 1:6(5) ack 6
    13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: . ack 6
    13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: P 6:216(210) ack 6
    13:34:05 1.2.3.4.8888 > 4.3.2.1.9876: . ack 216
    13:34:06 4.3.2.1.9876 > 1.2.3.4.8888: P 216:323(107) ack 6
    13:34:06 1.2.3.4.8888 > 4.3.2.1.9876: . ack 323
    13:34:06 1.2.3.4.8888 > 4.3.2.1.9876: P 6:22(16) ack 323
    13:34:06 1.2.3.4.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
    13:34:07 4.3.2.1.9876 > 1.2.3.4.8888: . ack 22
    13:34:07 1.2.3.4.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
    13:34:09 1.2.3.4.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
    13:34:10 4.3.2.1.9876 > 1.2.3.4.8888: . ack 1462
    13:34:10 1.2.3.4.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
    13:34:11 4.3.2.1.9876 > 1.2.3.4.8888: . ack 2902
    13:34:11 1.2.3.4.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
    13:34:15 1.2.3.4.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
    ...skip some...
    13:34:21 1.2.3.4.8888 > 4.3.2.1.9876: . 60502:61942(1440) ack 323
    13:34:21 1.2.3.4.8888 > 4.3.2.1.9876: . 61942:63382(1440) ack 323
    13:34:21 1.2.3.4.8888 > 4.3.2.1.9876: FP 63382:64463(1081) ack 323
    13:34:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 2902
    13:34:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 2902
    13:34:25 1.2.3.4.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
    13:34:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 7222
    13:34:25 1.2.3.4.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
    13:34:43 1.2.3.4.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
    13:34:44 4.3.2.1.9876 > 1.2.3.4.8888: . ack 8662
    13:34:44 1.2.3.4.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
    13:35:21 1.2.3.4.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
    13:35:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 10102
    13:35:22 1.2.3.4.8888 > 4.3.2.1.9876: . 10102:11542(1440) ack 323
    13:35:22 4.3.2.1.9876 > 1.2.3.4.8888: . ack 11542
    13:35:22 1.2.3.4.8888 > 4.3.2.1.9876: . 11542:12982(1440) ack 323
    13:35:23 4.3.2.1.9876 > 1.2.3.4.8888: . ack 12982
    13:35:23 1.2.3.4.8888 > 4.3.2.1.9876: . 12982:14422(1440) ack 323
    13:35:24 4.3.2.1.9876 > 1.2.3.4.8888: . ack 14422
    13:35:24 1.2.3.4.8888 > 4.3.2.1.9876: . 14422:15862(1440) ack 323
    13:35:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 15862
    13:35:25 1.2.3.4.8888 > 4.3.2.1.9876: . 15862:17302(1440) ack 323
    13:35:25 4.3.2.1.9876 > 1.2.3.4.8888: . ack 17302
    13:35:25 1.2.3.4.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
    13:37:25 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
    13:37:28 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
    13:37:33 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
    13:37:45 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
    13:38:09 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302

internal:

    13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: . ack 1
    13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: P 1:6(5) ack 1
    13:34:05 10.0.0.1.8888 > 4.3.2.1.9876: . ack 6
    13:34:05 10.0.0.1.8888 > 4.3.2.1.9876: P 1:6(5) ack 6
    13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: . ack 6
    13:34:05 4.3.2.1.9876 > 10.0.0.1.8888: P 6:216(210) ack 6
    13:34:05 10.0.0.1.8888 > 4.3.2.1.9876: . ack 216
    13:34:06 4.3.2.1.9876 > 10.0.0.1.8888: P 216:323(107) ack 6
    13:34:06 10.0.0.1.8888 > 4.3.2.1.9876: . ack 323
    13:34:06 10.0.0.1.8888 > 4.3.2.1.9876: P 6:22(16) ack 323
    13:34:06 10.0.0.1.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
    13:34:07 4.3.2.1.9876 > 10.0.0.1.8888: . ack 22
    13:34:07 10.0.0.1.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
    13:34:09 10.0.0.1.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
    13:34:10 4.3.2.1.9876 > 10.0.0.1.8888: . ack 1462
    13:34:10 10.0.0.1.8888 > 4.3.2.1.9876: . 1462:2902(1440) ack 323
    13:34:11 4.3.2.1.9876 > 10.0.0.1.8888: . ack 2902
    13:34:11 10.0.0.1.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
    13:34:15 10.0.0.1.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
    ...skip some...
    13:34:21 10.0.0.1.8888 > 4.3.2.1.9876: . 60502:61942(1440) ack 323
    13:34:21 10.0.0.1.8888 > 4.3.2.1.9876: . 61942:63382(1440) ack 323
    13:34:21 10.0.0.1.8888 > 4.3.2.1.9876: FP 63382:64463(1081) ack 323
    13:34:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 2902
    13:34:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 2902
    13:34:25 10.0.0.1.8888 > 4.3.2.1.9876: . 2902:4342(1440) ack 323
    13:34:25 4.3.2.1.9876 > 10.0.0.1.8888: . ack 7222
    13:34:25 10.0.0.1.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
    13:34:43 10.0.0.1.8888 > 4.3.2.1.9876: . 7222:8662(1440) ack 323
    13:34:44 4.3.2.1.9876 > 10.0.0.1.8888: . ack 8662
    13:34:44 10.0.0.1.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
    13:35:21 10.0.0.1.8888 > 4.3.2.1.9876: . 8662:10102(1440) ack 323
    13:35:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 10102
    13:35:22 10.0.0.1.8888 > 4.3.2.1.9876: . 10102:11542(1440) ack 323
    13:35:22 4.3.2.1.9876 > 10.0.0.1.8888: . ack 11542
    13:35:22 10.0.0.1.8888 > 4.3.2.1.9876: . 11542:12982(1440) ack 323
    13:35:23 4.3.2.1.9876 > 10.0.0.1.8888: . ack 12982
    13:35:23 10.0.0.1.8888 > 4.3.2.1.9876: . 12982:14422(1440) ack 323
    13:35:24 4.3.2.1.9876 > 10.0.0.1.8888: . ack 14422
    13:35:24 10.0.0.1.8888 > 4.3.2.1.9876: . 14422:15862(1440) ack 323
    13:35:25 4.3.2.1.9876 > 10.0.0.1.8888: . ack 15862
    13:35:25 10.0.0.1.8888 > 4.3.2.1.9876: . 15862:17302(1440) ack 323
    13:35:25 4.3.2.1.9876 > 10.0.0.1.8888: . ack 17302
    13:35:25 10.0.0.1.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323
    13:36:38 10.0.0.1.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323

As the traces show, the RS keeps on trying to send the last packet while CLIENT keeps on trying to send the FIN. The client's FINs appear only on the external interface, and the RS's final retransmission (13:36:38) appears only on the internal one, so the director forwards neither.

I'm not entirely sure I was able to read the relevant information fast enough (lots of connections, big tables), but it seems that at that time ipvsadm -L --connection shows the connection in "FIN_WAIT" while /proc/net/ip_conntrack has no entry for it at all.

There is also a variation of this issue, where the final FIN is delivered from CLIENT to RS, but the RS's ACK isn't delivered to the CLIENT, so the client still keeps on sending FINs. In that case, ipvsadm shows the connection in "TIME_WAIT" state (still nothing in conntrack).

Altogether, a few percent of connections are affected by this.

My interpretation is that, for some reason, the LVS code removes the conntrack entry immediately when a final FIN is seen and stops forwarding packets after that. My iptables rules then stop the answers from going out, because the connection is no longer ESTABLISHED.

Siim
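
For traces as long as the ones above, the repeated segments (the client's FIN, the RS's last data packet) can be picked out mechanically rather than by eye. The following is a minimal sketch in Python, assuming the tcpdump text format shown in this post; the function name find_retransmits and the regular expression are illustrative assumptions, not part of tcpdump or any LVS tool.

```python
import re
from collections import Counter

# Matches trace lines that carry a sequence range, for example:
#   13:34:09 1.2.3.4.8888 > 4.3.2.1.9876: . 22:1462(1440) ack 323
#   13:37:25 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302
LINE_RE = re.compile(
    r"^(?P<time>\d\d:\d\d:\d\d) (?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<flags>\S+) (?P<start>\d+):(?P<end>\d+)\((?P<len>\d+)\)"
)


def find_retransmits(lines):
    """Return {(src, flags, start, end): count} for segments seen more than once."""
    seen = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            # Pure ACKs ("... : . ack 6") have no sequence range and are skipped.
            continue
        key = (m["src"], m["flags"], int(m["start"]), int(m["end"]))
        seen[key] += 1
    return {key: n for key, n in seen.items() if n > 1}


# A few lines copied from the external trace in this post.
sample = [
    "13:35:25 1.2.3.4.8888 > 4.3.2.1.9876: . 17302:18742(1440) ack 323",
    "13:37:25 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302",
    "13:37:28 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302",
    "13:37:33 4.3.2.1.9876 > 1.2.3.4.8888: F 323:323(0) ack 17302",
    "13:34:05 4.3.2.1.9876 > 1.2.3.4.8888: . ack 6",
]

for (src, flags, start, end), count in find_retransmits(sample).items():
    print(f"{src} sent [{flags}] {start}:{end} {count} times")
```

Running the same function over the external and the internal capture separately and comparing the two results shows which retransmitted segments appear on only one side of the director.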
