Hi all, I encountered an issue with IPVS load balancing in case of short-lived connections. I've seen it in masqurading mode on CentOS 7 (kernel 3.10.0-693.11.6) and CoreOS 1235.12.0 (4.7.3-coreos-r3). After opening and closing of thousands of TCP connctions, new connections are being delayed for 1 second.
Please see a short example [4], there are steps to reproduce the issue. It starts nginx on port 8080 and creates a virtual service 127.0.0.1:80 -> 127.0.0.1:8080. After that an HTTP load generator (rakyll/hey) sends 30k queries with disabled keep-alive option. All records in ip_vs_conn table are in TIME_WAIT state. Then ipvs debug level is enabled and strace starts curl(1) with the same virtual service. Curl encounters the 1 second delay as shown. Attached you can find the full versions of strace.log and dmesg.log as well as their short versions [1] [2]. Setting conn_reuse_mode to 0 resolves the issue, but doesn't fit our needs and doesn't work well in case of changing the list of real servers. What could be causing the delay? How can we get rid of it? Thank you. [1] strace.log 1516823661.570756 connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress) 1516823661.571578 poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 0) = 0 (Timeout) 1516823661.572153 poll([{fd=3, events=POLLOUT}], 1, 1000) = 1 ([{fd=3, revents=POLLOUT}]) 1516823662.573304 poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 0) = 1 ([{fd=3, revents=POLLOUT|POLLWRNORM}]) 1516823662.573556 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0 1516823662.573947 getpeername(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 1516823662.574139 getsockname(3, {sa_family=AF_INET, sin_port=htons(34258), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 1516823662.574641 sendto(3, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 73, MSG_NOSIGNAL, NULL, 0) = 73 [2] dmesg.log 706.699647 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132 706.699656 IPVS: lookup/out TCP 127.0.0.1:34258->127.0.0.1:80 not hit 706.699662 IPVS: lookup/in TCP 127.0.0.1:34258->127.0.0.1:80 hit 706.700092 IPVS: ip_vs_conn_drop_conntrack: dropping conntrack with tuple=127.0.0.1:34258->127.0.0.1:80/6 for conn 127.0.0.1:34258->127.0.0.1:80->127.0.0.1:8080/6:5 706.700097 IPVS: ip_vs_conn_drop_conntrack: no conntrack for tuple=127.0.0.1:34258->127.0.0.1:80/6 706.700102 IPVS: ip_vs_conn_drop_conntrack: dropping conntrack with tuple=127.0.0.1:34258->127.0.0.1:80/6 for conn 127.0.0.1:34258->127.0.0.1:80->127.0.0.1:8080/6:5 706.700105 IPVS: ip_vs_conn_drop_conntrack: no conntrack for tuple=127.0.0.1:34258->127.0.0.1:80/6 706.700110 IPVS: Unbind-dest TCP c:127.0.0.1:34258 v:127.0.0.1:80 d:127.0.0.1:8080 fwd:M s:5 conn->flags:10100 conn->refcnt:0 dest->refcnt:15505 707.701103 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132 707.701128 IPVS: lookup/out TCP 127.0.0.1:34258->127.0.0.1:80 not hit 707.701143 IPVS: lookup/in TCP 127.0.0.1:34258->127.0.0.1:80 not hit 707.701156 IPVS: lookup service: fwm 0 TCP 127.0.0.1:80 hit 707.701169 IPVS: lookup/in TCP 127.0.0.1:80->127.0.0.1:34258 not hit 707.701180 IPVS: ip_vs_wlc_schedule(): Scheduling... 707.701193 IPVS: WLC: server 127.0.0.1:8080 activeconns 0 refcnt 15504 weight 1 overhead 15503 707.701210 IPVS: Bind-dest TCP c:127.0.0.1:34258 v:127.0.0.1:80 d:127.0.0.1:8080 fwd:M s:5 conn->flags:100 conn->refcnt:1 dest->refcnt:15505 707.701225 IPVS: Schedule fwd:M c:127.0.0.1:34258 v:127.0.0.1:80 d:127.0.0.1:8080 conn->flags:10140 conn->refcnt:2 707.701239 IPVS: TCP input [S...] 127.0.0.1:8080->127.0.0.1:34258 state: NONE->SYN_RECV conn->refcnt:2 707.701252 IPVS: Enter: ip_vs_nat_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 625 707.701269 IPVS: ip_vs_update_conntrack: Updating conntrack ct=ffff8800ce14e5d0, status=0x100, ctinfo=2, old reply=127.0.0.1:80->127.0.0.1:34258/6, new reply=127.0.0.1:8080->127.0.0.1:34258/6, cp=127.0.0.1:34258->127.0.0.1:80->127.0.0.1:8080/6:3 707.701282 IPVS: Leave: ip_vs_nat_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 698 707.701330 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132 707.701374 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132 [3] sysctl # uname -r 3.10.0-693.11.6.el7.x86_64.debug # sysctl net.ipv4.vs net.ipv4.vs.am_droprate = 10 net.ipv4.vs.amemthresh = 1024 net.ipv4.vs.backup_only = 0 net.ipv4.vs.cache_bypass = 0 net.ipv4.vs.conn_reuse_mode = 1 net.ipv4.vs.conntrack = 1 net.ipv4.vs.debug_level = 0 net.ipv4.vs.drop_entry = 0 net.ipv4.vs.drop_packet = 0 net.ipv4.vs.expire_nodest_conn = 0 net.ipv4.vs.expire_quiescent_template = 0 net.ipv4.vs.nat_icmp_send = 0 net.ipv4.vs.pmtu_disc = 1 net.ipv4.vs.secure_tcp = 0 net.ipv4.vs.snat_reroute = 1 net.ipv4.vs.sync_ports = 1 net.ipv4.vs.sync_qlen_max = 61432 net.ipv4.vs.sync_refresh_period = 0 net.ipv4.vs.sync_retries = 0 net.ipv4.vs.sync_sock_size = 0 net.ipv4.vs.sync_threshold = 3 50 net.ipv4.vs.sync_version = 1 # cat /sys/module/ip_vs/parameters/conn_tab_bits 18 [4] Steps to reproduce # modprobe ip_vs_wlc # sysctl -w net.ipv4.vs.conntrack=1 # sysctl -w net.ipv4.ip_local_port_range='32768 60999' # docker run --rm --net=host --detach --name=nginx nginx sh -c 'echo "daemon off; worker_processes 2; events { worker_connections 16384; } http { server { listen 8080 backlog=16384; access_log off; server_name localhost; location / { return 200; } } }" > /etc/nginx/ping.conf && nginx -c /etc/nginx/ping.conf' # docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -C # docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -A -t 127.0.0.1:80 # docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -a -t 127.0.0.1:80 -r 127.0.0.1:8080 -m # docker run --rm --net=host mesosphere/hey -n 30000 -c 1000 -disable-keepalive -t 3 http://127.0.0.1/ # awk '{print $8}' /proc/net/ip_vs_conn | sort | uniq -c # sysctl -w net.ipv4.vs.debug_level=15; strace -ttt curl --silent http://127.0.0.1/ 2>&1 | tee strace.log; sysctl -w net.ipv4.vs.debug_level=0 # dmesg | sed -re 's!^\[ *([0-9.]+)\]!\1!' | tee dmesg.log # docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -C # docker kill nginx
dmesg.log
Description: Binary data
strace.log
Description: Binary data
-- Yours, Sergey Urbanovich
_______________________________________________ Please read the documentation before posting - it's available at: http://www.linuxvirtualserver.org/ LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org Send requests to lvs-users-requ...@linuxvirtualserver.org or go to http://lists.graemef.net/mailman/listinfo/lvs-users