Re-sending in case this mail was missed. To summarise the 3 issues seen:
1. Throughput drops by up to ~18x with a high nbthread count as compared to
the same nbproc count.
2. CPU utilisation stays at 100% for 30 seconds after wrk finishes (seen with
1.9-dev3, in both nbproc and nbthread modes).
3. Sockets on the client remain in FIN-WAIT-2, while on HAProxy they remain in
CLOSE-WAIT (towards the clients) and ESTAB (towards the backend servers) until
the client/server timeout expires (see the ss sketch below).
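For reference, a quick way to break the stuck sockets down per side on the
HAProxy box is with ss state filters (a sketch; it assumes the frontend and
the backend servers both use port 80, as in the configs below):
# frontend side: client connections already half-closed by wrk
ss -tn state close-wait '( sport = :80 )' | wc -l
# backend side: connections haproxy still holds open towards the servers
ss -tn state established '( dport = :80 )' | wc -l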
The tests for threads and processes were done on the same systems, so there is
no difference in system parameters.
Thanks,
- Krishna
On Tue, Oct 2, 2018 at 9:18 PM Krishna Kumar (Engineering) <
[email protected]> wrote:
> Hi Willy, and community developers,
>
> I am not sure if I am doing something wrong, but wanted to report
> some issues that I am seeing. Please let me know if this is a problem.
>
> 1. HAProxy system:
> Kernel: 4.17.13
> CPU: 48-core E5-2670 v3
> Memory: 128 GB
> NIC: Mellanox 40G with IRQ pinning
>
> 2. Client: 48 cores, similar to the server. Test command line:
> wrk -c 4800 -t 48 -d 30s http://<IP:80>/128
>
> 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git checkout as of Oct 2nd).
> # haproxy-git -vv
> HA-Proxy version 1.9-dev3 2018/09/29
> Copyright 2000-2018 Willy Tarreau <[email protected]>
>
> Build options :
> TARGET = linux2628
> CPU = generic
> CC = gcc
> CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
> -fwrapv -fno-strict-overflow -Wno-unused-label -Wno-sign-compare
> -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers
> -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
> OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1
>
> Default settings :
> maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>
> Built with OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
> Running on OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
> OpenSSL library supports TLS extensions : yes
> OpenSSL library supports SNI : yes
> OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
> Built with transparent proxy support using: IP_TRANSPARENT
> IPV6_TRANSPARENT IP_FREEBIND
> Encrypted password support via crypt(3): yes
> Built with multi-threading support.
> Built with PCRE version : 8.38 2015-11-23
> Running on PCRE version : 8.38 2015-11-23
> PCRE library supports JIT : no (USE_PCRE_JIT not set)
> Built with zlib version : 1.2.8
> Running on zlib version : 1.2.8
> Compression algorithms supported : identity("identity"),
> deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
> Built with network namespace support.
>
> Available polling systems :
> epoll : pref=300, test result OK
> poll : pref=200, test result OK
> select : pref=150, test result OK
> Total: 3 (3 usable), will use epoll.
>
> Available multiplexer protocols :
> (protocols markes as <default> cannot be specified using 'proto' keyword)
> h2 : mode=HTTP side=FE
> <default> : mode=TCP|HTTP side=FE|BE
>
> Available filters :
> [SPOE] spoe
> [COMP] compression
> [TRACE] trace
>
> 4. HAProxy requests/sec (RPS) by number of threads vs processes:
> #Workers  Threads RPS  Procs RPS
>        1        20903      19280
>        2        46400      51045
>        4        96587     142801
>        8       172224     254720
>       12       210451     437488
>       16       173034     437375
>       24        79069     519367
>       32        55607     586367
>       48        31739     596148
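> To put the gap in perspective, at 48 workers the process model delivers
> roughly 18.8x the throughput of the thread model (numbers straight from the
> table above):
> awk 'BEGIN { printf "%.1fx\n", 596148 / 31739 }'   # prints 18.8x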
>
> 5. Lock stats for 1.9-dev3: Some write locks on average took a lot more time
> to acquire, e.g. "POOL" and "TASK_WQ". For 48 threads, I get:
> Stats about Lock FD:
> # write lock : 143933900
> # write unlock: 143933895 (-5)
> # wait time for write : 11370.245 msec
> # wait time for write/lock: 78.996 nsec
> # read lock : 0
> # read unlock : 0 (0)
> # wait time for read : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock TASK_RQ:
> # write lock : 2062874
> # write unlock: 2062875 (1)
> # wait time for write : 7820.234 msec
> # wait time for write/lock: 3790.941 nsec
> # read lock : 0
> # read unlock : 0 (0)
> # wait time for read : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock TASK_WQ:
> # write lock : 2601227
> # write unlock: 2601227 (0)
> # wait time for write : 5019.811 msec
> # wait time for write/lock: 1929.786 nsec
> # read lock : 0
> # read unlock : 0 (0)
> # wait time for read : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock POOL:
> # write lock : 2823393
> # write unlock: 2823393 (0)
> # wait time for write : 11984.706 msec
> # wait time for write/lock: 4244.788 nsec
> # read lock : 0
> # read unlock : 0 (0)
> # wait time for read : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock LISTENER:
> # write lock : 184
> # write unlock: 184 (0)
> # wait time for write : 0.011 msec
> # wait time for write/lock: 60.554 nsec
> # read lock : 0
> # read unlock : 0 (0)
> # wait time for read : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock PROXY:
> # write lock : 291557
> # write unlock: 291557 (0)
> # wait time for write : 109.694 msec
> # wait time for write/lock: 376.235 nsec
> # read lock : 0
> # read unlock : 0 (0)
> # wait time for read : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock SERVER:
> # write lock : 1188511
> # write unlock: 1188511 (0)
> # wait time for write : 854.171 msec
> # wait time for write/lock: 718.690 nsec
> # read lock : 0
> # read unlock : 0 (0)
> # wait time for read : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock LBPRM:
> # write lock : 1184709
> # write unlock: 1184709 (0)
> # wait time for write : 778.947 msec
> # wait time for write/lock: 657.501 nsec
> # read lock : 0
> # read unlock : 0 (0)
> # wait time for read : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock BUF_WQ:
> # write lock : 669247
> # write unlock: 669247 (0)
> # wait time for write : 252.265 msec
> # wait time for write/lock: 376.939 nsec
> # read lock : 0
> # read unlock : 0 (0)
> # wait time for read : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock STRMS:
> # write lock : 9335
> # write unlock: 9335 (0)
> # wait time for write : 0.910 msec
> # wait time for write/lock: 97.492 nsec
> # read lock : 0
> # read unlock : 0 (0)
> # wait time for read : 0.000 msec
> # wait time for read/lock : 0.000 nsec
> Stats about Lock VARS:
> # write lock : 901947
> # write unlock: 901947 (0)
> # wait time for write : 299.224 msec
> # wait time for write/lock: 331.753 nsec
> # read lock : 0
> # read unlock : 0 (0)
> # wait time for read : 0.000 msec
> # wait time for read/lock : 0.000 nsec
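> (For reference, the per-lock wait above is just the total write-wait time
> divided by the number of write locks, e.g. for POOL:
> awk 'BEGIN { printf "%.0f ns\n", 11984.706e6 / 2823393 }'   # ~4245 ns
> which matches the 4244.788 nsec reported.)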
>
> 6. CPU utilization after the test (processes vs threads): haproxy-1.9-dev3
> runs at 4800% (48 CPUs) for 30 seconds after the test is done. This behavior
> was not seen with 1.8.14. Ran the following command for both:
> "ss -tnp | awk '{print $1}' | sort | uniq -c | sort -n"
> 1.8.14 during test:
> 451 SYN-SENT
> 9166 ESTAB
> 1.8.14 after test:
> 2 ESTAB
>
> 1.9-dev3 during test:
> 109 SYN-SENT
> 9400 ESTAB
> 1.9-dev3 after test:
> 2185 CLOSE-WAIT
> 2187 ESTAB
> All connections in CLOSE-WAIT were from the clients, while all connections
> in the ESTAB state were towards the servers. This lasted for 30 seconds.
> On the client system, all sockets were in the FIN-WAIT-2 state:
> 2186 FIN-WAIT-2
> The matching counts (2185/2186) seem to imply that the clients closed their
> connections but haproxy did not close the sockets for 30 seconds. This also
> results in high CPU utilization on haproxy for some reason (100% for each
> process for 30 seconds), which is also unexpected since the remote side has
> already closed the socket.
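> In case it helps reproduce the CPU observation, the per-thread/per-process
> CPU usage after wrk exits can be watched with something like this (a sketch;
> it assumes sysstat's pidstat is available):
> # per-thread CPU of all running haproxy processes, sampled every second
> pidstat -t -p $(pidof haproxy | tr ' ' ,) 1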
>
> 7. Configuration file for process mode:
> global
> daemon
> maxconn 26000
> nbproc 48
> stats socket /var/run/ha-1-admin.sock mode 600 level admin process 1
> # (and so on for 48 processes).
>
> defaults
> option http-keep-alive
> balance leastconn
> retries 2
> option redispatch
> maxconn 25000
> option splice-response
> option tcp-smart-accept
> option tcp-smart-connect
> option splice-auto
> timeout connect 5000ms
> timeout client 30000ms
> timeout server 30000ms
> timeout client-fin 30000ms
> timeout http-request 10000ms
> timeout http-keep-alive 75000ms
> timeout queue 10000ms
> timeout tarpit 15000ms
>
> frontend fk-fe-upgrade-80
> mode http
> default_backend fk-be-upgrade
> bind <VIP>:80 process 1
> # (and so on for 48 processes).
>
> backend fk-be-upgrade
> mode http
> default-server maxconn 2000 slowstart
> # 58 server lines follow, e.g.: "server <name> <ip:80>"
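> As a sanity check that all 48 workers start with this config, the process
> count can be verified (a sketch):
> ps -C haproxy --no-headers | wc -l   # expected to print 48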
>
> 8. Configuration file for thread mode:
> global
> daemon
> maxconn 26000
> stats socket /var/run/ha-1-admin.sock mode 600 level admin
> nbproc 1
> nbthread 48
> # cpu-map auto:1/1-48 0-39
>
> defaults
> option http-keep-alive
> balance leastconn
> retries 2
> option redispatch
> maxconn 25000
> option splice-response
> option tcp-smart-accept
> option tcp-smart-connect
> option splice-auto
> timeout connect 5000ms
> timeout client 30000ms
> timeout server 30000ms
> timeout client-fin 30000ms
> timeout http-request 10000ms
> timeout http-keep-alive 75000ms
> timeout queue 10000ms
> timeout tarpit 15000ms
>
> frontend fk-fe-upgrade-80
> mode http
> bind <VIP>:80 process 1/1-48
> default_backend fk-be-upgrade
>
> backend fk-be-upgrade
> mode http
> default-server maxconn 2000 slowstart
> # 58 server lines follow, e.g.: "server <name> <ip:80>"
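> Similarly, as a sanity check that all 48 threads come up with this config,
> the daemon's thread count can be verified (a sketch, assuming a single
> haproxy process):
> ls /proc/$(pidof haproxy)/task | wc -l   # expected to print 48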
>
> I have also captured 'perf' output for the system for threads vs processes,
> and can send it later if required.
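> Roughly, it was a system-wide profile along these lines, and I can redo it
> with different options if that would be more useful:
> perf record -a -g -F 99 -- sleep 30
> perf report --stdio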
>
> Thanks,
> - Krishna
>