Hi,

I'm using haproxy 1.9.6 from the deb repo:

#v+
~# haproxy -vv
HA-Proxy version 1.9.6-1ppa1~bionic 2019/03/30 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-YXfmbO/haproxy-1.9.6=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-format-truncation -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wno-implicit-fallthrough -Wno-stringop-overflow -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_NS=1
Default settings : maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.0g  2 Nov 2017
Running on OpenSSL version : OpenSSL 1.1.0g  2 Nov 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.31 2018-02-12
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace
#v-

Every few days I see some servers with a few hundred connections stuck in the CLOSE_WAIT state for hours. To put together a bug report I tried what was suggested here earlier, "show fd", but whenever I run it (echo 'show fd' | socat stdio /run/haproxy/haproxy.sock), all CPU cores go to 100% utilization and haproxy becomes unresponsive (it has to be restarted).
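For completeness, this is roughly how I count the stuck sockets. It is just a sketch: the live version pipes `ss -tan` (from iproute2) into the awk one-liner; the sample data below (with made-up peer addresses from the 192.0.2.0/24 documentation range) stands in for real output so the pipeline is self-contained:

```shell
# Group TCP sockets by state (first column of `ss -tan` output).
# Live usage would be:
#   ss -tan | awk 'NR > 1 { n[$1]++ } END { for (s in n) print s, n[s] }'
# The captured sample below replaces the live `ss` call for illustration.
sample='State      Recv-Q Send-Q Local-Address:Port Peer-Address:Port
CLOSE-WAIT 1      0      10.12.70.29:443    192.0.2.10:55123
CLOSE-WAIT 1      0      10.12.70.29:443    192.0.2.11:41988
ESTAB      0      0      127.0.0.1:6081     127.0.0.1:53210'

printf '%s\n' "$sample" \
  | awk 'NR > 1 { n[$1]++ } END { for (s in n) print s, n[s] }' \
  | sort
```

When the problem is present, the CLOSE-WAIT count stays in the hundreds for hours instead of draining.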
I was able to run gdb before and after "show fd".

Before:
#v+
(gdb) bt
#0  0x00007fbb8fc36bb7 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000556f2801eb6c in _do_poll (p=<optimized out>, exp=144557181) at src/ev_epoll.c:156
#2  0x0000556f280c2152 in run_poll_loop () at src/haproxy.c:2675
#3  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2707
#4  0x0000556f2801c616 in main (argc=<optimized out>, argv=0x7ffd733e9da8) at src/haproxy.c:3343
#v-

After "show fd":
#v+
(gdb) bt
#0  0x00007fbb8fc18e57 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000556f2815aef5 in thread_harmless_till_end () at src/hathreads.c:46
#2  0x0000556f2801f24e in thread_harmless_end () at include/common/hathreads.h:367
#3  _do_poll (p=<optimized out>, exp=144595172) at src/ev_epoll.c:171
#4  0x0000556f280c2152 in run_poll_loop () at src/haproxy.c:2675
#5  run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2707
#6  0x0000556f2801c616 in main (argc=<optimized out>, argv=0x7ffd733e9da8) at src/haproxy.c:3343
#v-

strace looks like this:
#v+
[pid 15426] sched_yield() = 0
[pid 15426] sched_yield() = 0
[pid 15426] sched_yield() = 0
[pid 15426] sched_yield() = 0
[pid 15426] sched_yield() = 0
[pid 15426] sched_yield() = 0
[pid 15426] sched_yield() = 0
[pid 15426] sched_yield() = 0
#v-

Another example:
#v+
[pid 6102] recvfrom(1944, "show fd\n", 15360, 0, NULL, NULL) = 8
[pid 6102] recvfrom(1944, "", 15352, 0, NULL, NULL) = 0
[pid 6102] clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=1671, tv_nsec=945933699}) = 0
[pid 6102] epoll_wait(101, [{EPOLLIN, {u32=11, u64=11}}], 200, 0) = 1
[pid 6102] clock_gettime(CLOCK_THREAD_CPUTIME_ID, {tv_sec=1671, tv_nsec=945944207}) = 0
[pid 6102] sched_yield() = 0
[pid 6102] sched_yield() = 0
[pid 6102] sched_yield() = 0
[pid 6102] sched_yield() = 0
[pid 6102] sched_yield() = 0
[pid 6102] sched_yield() = 0
[pid 6102] sched_yield() = 0
[pid 6102] sched_yield() = 0
[…]
#v-

Config:
#v+
global
    user haproxy
    group haproxy
    maxconn 20480
    daemon
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/certs
    stats socket /run/haproxy/haproxy.sock mode 660 level admin
    stats timeout 2m # Wait up to 2 minutes for input
    nbthread 32
    tune.ssl.default-dh-param 4096
    ssl-default-bind-ciphers EECDH+AESGCM:AES256+EECDH:AES128+EECDH:EECDH:RSA+AES256+SHA:RSA+AES128+SHA
    ssl-default-bind-options ssl-min-ver TLSv1.2

defaults
    log 127.0.0.1 local1 notice
    mode http
    option httplog
    option log-health-checks
    option dontlognull
    option dontlog-normal
    retries 3
    option redispatch
    maxconn 20480
    timeout connect 60s
    timeout server 60s
    timeout client 60s

listen stats
    bind 10.12.70.29:81
    mode http
    stats enable
    stats refresh 5s
    stats uri /haproxy?stats
    stats realm Haproxy\ Statistics
    stats hide-version
    stats auth login:password

frontend http-in
    bind :80 v4v6
    bind :::80 v6only
    maxconn 30720
    http-request set-header X-Forwarded-For %[src]
    acl is_healthcheck path /__healthcheck
    use_backend healthcheck if is_healthcheck
    default_backend varnish-groups

frontend https-in
    bind :443 v4v6 ssl crt wildcard_example.com.pem alpn h2,http/1.1
    bind :::443 v6only ssl crt wildcard_example.com.pem alpn h2,http/1.1
    maxconn 30720
    http-request set-header X-Forwarded-For %[src]
    acl is_healthcheck path /__healthcheck
    use_backend healthcheck if is_healthcheck
    default_backend varnish-groups

backend healthcheck
    errorfile 503 /etc/haproxy/errors/200.http

backend varnish-groups
    balance uri
    hash-type consistent
    option httpchk HEAD /ping HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    default-server inter 3s fall 3 rise 2
    server varnish 127.0.0.1:6081 check
#v-

The issue only appears when there are connections stuck in CLOSE_WAIT for hours; normally "show fd" works fine.

Best
--
Łukasz Jagiełło
lukasz<at>jagiello<dot>org