1. haproxy config: same as given above (both the process and thread
   configurations were included in the mail).
2. nginx: default, no changes.
3. sysctls: nothing set. All other changes are as described earlier
   (e.g. irqbalance killed, IRQ pinning, etc.).
4. nf_conntrack: disabled.
5. dmesg: no messages.
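[Editor's note: a minimal sketch, not from the original tests, of how the nf_conntrack answer above can be re-checked on a Linux host. It assumes the standard procfs path; nf_conntrack exposes its table limit there only when the module is loaded, so the file's absence is a reasonable "disabled" signal.]

```shell
# Sketch only: probe nf_conntrack state via the standard procfs path.
check_conntrack() {
    if [ -e /proc/sys/net/netfilter/nf_conntrack_max ]; then
        echo "conntrack enabled, max=$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
    else
        echo "conntrack disabled"
    fi
}
check_conntrack
```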
With the same system and settings, threads give 18x lower RPS than
processes, along with the other two issues described in my mail today.

On Thu, Oct 4, 2018 at 12:09 PM Илья Шипицин <[email protected]> wrote:

> haproxy config, nginx config
> non-default sysctls (if any)
>
> As a side note, can you have a look at "dmesg" output? Do you have
> nf_conntrack enabled? What are its limits?
>
> Thu, Oct 4, 2018 at 9:59, Krishna Kumar (Engineering)
> <[email protected]>:
>
>> Sure.
>>
>> 1. Client: use one of the following two setups:
>>    - A single baremetal (48-core, 40G) system.
>>      Run: "wrk -c 4800 -t 48 -d 30s http://<IP>:80/128", or
>>    - 100 2-core VMs.
>>      Run "wrk -c 16 -t 2 -d 30s http://<IP>:80/128" from each VM
>>      and summarize the results using some parallel-ssh setup.
>>
>> 2. HAProxy running on a single baremetal (same system config as the
>>    client: 48-core, 40G, 4.17.13 kernel, IRQs tuned to use different
>>    cores of the same NUMA node for each IRQ, irqbalance killed), with
>>    the haproxy configuration file as given in my first mail. Around
>>    60 backend servers are configured in haproxy.
>>
>> 3. Backend servers are 2-core VMs running nginx and serving a file
>>    called "/128", which is 128 bytes in size.
>>
>> Let me know if you need more information.
>>
>> Thanks,
>> - Krishna
>>
>> On Thu, Oct 4, 2018 at 10:21 AM Илья Шипицин <[email protected]> wrote:
>>
>>> Load testing is somewhat good. Can you describe the overall setup?
>>> (I want to reproduce and play with it.)
>>>
>>> Thu, Oct 4, 2018 at 8:16, Krishna Kumar (Engineering)
>>> <[email protected]>:
>>>
>>>> Re-sending in case this mail was missed. To summarise the 3 issues
>>>> seen:
>>>>
>>>> 1. Performance drops 18x with a higher number of nbthread as
>>>>    compared to nbproc.
>>>> 2. CPU utilisation remains at 100% for 30 seconds after wrk
>>>>    finishes (for 1.9-dev3, with both nbproc and nbthread).
>>>> 3. Sockets on the client remain in FIN-WAIT-2, while on HAProxy
>>>>    they remain in either CLOSE-WAIT (towards clients) or ESTAB
>>>>    (towards the backend servers), till the server/client timeout
>>>>    expires.
>>>>
>>>> The tests for threads and processes were done on the same systems,
>>>> so there is no difference in system parameters.
>>>>
>>>> Thanks,
>>>> - Krishna
>>>>
>>>> On Tue, Oct 2, 2018 at 9:18 PM Krishna Kumar (Engineering)
>>>> <[email protected]> wrote:
>>>>
>>>>> Hi Willy, and community developers,
>>>>>
>>>>> I am not sure if I am doing something wrong, but wanted to report
>>>>> some issues that I am seeing. Please let me know if this is a
>>>>> problem.
>>>>>
>>>>> 1. HAProxy system:
>>>>>    Kernel: 4.17.13
>>>>>    CPU: 48-core E5-2670 v3
>>>>>    Memory: 128GB
>>>>>    NIC: Mellanox 40g with IRQ pinning
>>>>>
>>>>> 2. Client: 48 cores, similar to the server. Test command line:
>>>>>    wrk -c 4800 -t 48 -d 30s http://<IP>:80/128
>>>>>
>>>>> 3. HAProxy version: I am testing both 1.8.14 and 1.9-dev3 (git
>>>>>    checkout as of Oct 2nd).
>>>>> # haproxy-git -vv
>>>>> HA-Proxy version 1.9-dev3 2018/09/29
>>>>> Copyright 2000-2018 Willy Tarreau <[email protected]>
>>>>>
>>>>> Build options :
>>>>>   TARGET  = linux2628
>>>>>   CPU     = generic
>>>>>   CC      = gcc
>>>>>   CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
>>>>>             -fwrapv -fno-strict-overflow -Wno-unused-label
>>>>>             -Wno-sign-compare -Wno-unused-parameter
>>>>>             -Wno-old-style-declaration -Wno-ignored-qualifiers
>>>>>             -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
>>>>>   OPTIONS = USE_ZLIB=yes USE_OPENSSL=1 USE_PCRE=1
>>>>>
>>>>> Default settings :
>>>>>   maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>>>>>
>>>>> Built with OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
>>>>> Running on OpenSSL version : OpenSSL 1.0.2g 1 Mar 2016
>>>>> OpenSSL library supports TLS extensions : yes
>>>>> OpenSSL library supports SNI : yes
>>>>> OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2
>>>>> Built with transparent proxy support using: IP_TRANSPARENT
>>>>>   IPV6_TRANSPARENT IP_FREEBIND
>>>>> Encrypted password support via crypt(3): yes
>>>>> Built with multi-threading support.
>>>>> Built with PCRE version : 8.38 2015-11-23
>>>>> Running on PCRE version : 8.38 2015-11-23
>>>>> PCRE library supports JIT : no (USE_PCRE_JIT not set)
>>>>> Built with zlib version : 1.2.8
>>>>> Running on zlib version : 1.2.8
>>>>> Compression algorithms supported : identity("identity"),
>>>>>   deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
>>>>> Built with network namespace support.
>>>>>
>>>>> Available polling systems :
>>>>>      epoll : pref=300, test result OK
>>>>>       poll : pref=200, test result OK
>>>>>     select : pref=150, test result OK
>>>>> Total: 3 (3 usable), will use epoll.
>>>>>
>>>>> Available multiplexer protocols :
>>>>> (protocols marked as <default> cannot be specified using 'proto' keyword)
>>>>>        h2 : mode=HTTP  side=FE
>>>>> <default> : mode=TCP|HTTP  side=FE|BE
>>>>>
>>>>> Available filters :
>>>>>     [SPOE] spoe
>>>>>     [COMP] compression
>>>>>     [TRACE] trace
>>>>>
>>>>> 4. HAProxy results for #processes and #threads:
>>>>>    #    Threads-RPS    Procs-RPS
>>>>>    1          20903        19280
>>>>>    2          46400        51045
>>>>>    4          96587       142801
>>>>>    8         172224       254720
>>>>>    12        210451       437488
>>>>>    16        173034       437375
>>>>>    24         79069       519367
>>>>>    32         55607       586367
>>>>>    48         31739       596148
>>>>>
>>>>> 5. Lock stats for 1.9-dev3: some write locks took much longer on
>>>>>    average to acquire, e.g. "POOL" and "TASK_WQ". For 48 threads,
>>>>>    I get:
>>>>>    Stats about Lock FD:
>>>>>        # write lock  : 143933900
>>>>>        # write unlock: 143933895 (-5)
>>>>>        # wait time for write     : 11370.245 msec
>>>>>        # wait time for write/lock: 78.996 nsec
>>>>>        # read lock   : 0
>>>>>        # read unlock : 0 (0)
>>>>>        # wait time for read      : 0.000 msec
>>>>>        # wait time for read/lock : 0.000 nsec
>>>>>    Stats about Lock TASK_RQ:
>>>>>        # write lock  : 2062874
>>>>>        # write unlock: 2062875 (1)
>>>>>        # wait time for write     : 7820.234 msec
>>>>>        # wait time for write/lock: 3790.941 nsec
>>>>>        # read lock   : 0
>>>>>        # read unlock : 0 (0)
>>>>>        # wait time for read      : 0.000 msec
>>>>>        # wait time for read/lock : 0.000 nsec
>>>>>    Stats about Lock TASK_WQ:
>>>>>        # write lock  : 2601227
>>>>>        # write unlock: 2601227 (0)
>>>>>        # wait time for write     : 5019.811 msec
>>>>>        # wait time for write/lock: 1929.786 nsec
>>>>>        # read lock   : 0
>>>>>        # read unlock : 0 (0)
>>>>>        # wait time for read      : 0.000 msec
>>>>>        # wait time for read/lock : 0.000 nsec
>>>>>    Stats about Lock POOL:
>>>>>        # write lock  : 2823393
>>>>>        # write unlock: 2823393 (0)
>>>>>        # wait time for write     : 11984.706 msec
>>>>>        # wait time for write/lock: 4244.788 nsec
>>>>>        # read lock   : 0
>>>>>        # read unlock : 0 (0)
>>>>>        # wait time for read      : 0.000 msec
>>>>>        # wait time for read/lock : 0.000 nsec
>>>>>    Stats about Lock LISTENER:
>>>>>        # write lock  : 184
>>>>>        # write unlock: 184 (0)
>>>>>        # wait time for write     : 0.011 msec
>>>>>        # wait time for write/lock: 60.554 nsec
>>>>>        # read lock   : 0
>>>>>        # read unlock : 0 (0)
>>>>>        # wait time for read      : 0.000 msec
>>>>>        # wait time for read/lock : 0.000 nsec
>>>>>    Stats about Lock PROXY:
>>>>>        # write lock  : 291557
>>>>>        # write unlock: 291557 (0)
>>>>>        # wait time for write     : 109.694 msec
>>>>>        # wait time for write/lock: 376.235 nsec
>>>>>        # read lock   : 0
>>>>>        # read unlock : 0 (0)
>>>>>        # wait time for read      : 0.000 msec
>>>>>        # wait time for read/lock : 0.000 nsec
>>>>>    Stats about Lock SERVER:
>>>>>        # write lock  : 1188511
>>>>>        # write unlock: 1188511 (0)
>>>>>        # wait time for write     : 854.171 msec
>>>>>        # wait time for write/lock: 718.690 nsec
>>>>>        # read lock   : 0
>>>>>        # read unlock : 0 (0)
>>>>>        # wait time for read      : 0.000 msec
>>>>>        # wait time for read/lock : 0.000 nsec
>>>>>    Stats about Lock LBPRM:
>>>>>        # write lock  : 1184709
>>>>>        # write unlock: 1184709 (0)
>>>>>        # wait time for write     : 778.947 msec
>>>>>        # wait time for write/lock: 657.501 nsec
>>>>>        # read lock   : 0
>>>>>        # read unlock : 0 (0)
>>>>>        # wait time for read      : 0.000 msec
>>>>>        # wait time for read/lock : 0.000 nsec
>>>>>    Stats about Lock BUF_WQ:
>>>>>        # write lock  : 669247
>>>>>        # write unlock: 669247 (0)
>>>>>        # wait time for write     : 252.265 msec
>>>>>        # wait time for write/lock: 376.939 nsec
>>>>>        # read lock   : 0
>>>>>        # read unlock : 0 (0)
>>>>>        # wait time for read      : 0.000 msec
>>>>>        # wait time for read/lock : 0.000 nsec
>>>>>    Stats about Lock STRMS:
>>>>>        # write lock  : 9335
>>>>>        # write unlock: 9335 (0)
>>>>>        # wait time for write     : 0.910 msec
>>>>>        # wait time for write/lock: 97.492 nsec
>>>>>        # read lock   : 0
>>>>>        # read unlock : 0 (0)
>>>>>        # wait time for read      : 0.000 msec
>>>>>        # wait time for read/lock : 0.000 nsec
>>>>>    Stats about Lock VARS:
>>>>>        # write lock  : 901947
>>>>>        # write unlock: 901947 (0)
>>>>>        # wait time for write     : 299.224 msec
>>>>>        # wait time for write/lock: 331.753 nsec
>>>>>        # read lock   : 0
>>>>>        # read unlock : 0 (0)
>>>>>        # wait time for read      : 0.000 msec
>>>>>        # wait time for read/lock : 0.000 nsec
>>>>>
>>>>> 6. CPU utilization after the test for processes/threads:
>>>>>    haproxy-1.9-dev3 runs at 4800% (48 cpus) for 30 seconds after
>>>>>    the test is done. For 1.8.14, this behavior was not seen. Ran
>>>>>    the following command for both:
>>>>>    "ss -tnp | awk '{print $1}' | sort | uniq -c | sort -n"
>>>>>    1.8.14 during test:
>>>>>        451 SYN-SENT
>>>>>        9166 ESTAB
>>>>>    1.8.14 after test:
>>>>>        2 ESTAB
>>>>>
>>>>>    1.9-dev3 during test:
>>>>>        109 SYN-SENT
>>>>>        9400 ESTAB
>>>>>    1.9-dev3 after test:
>>>>>        2185 CLOSE-WAIT
>>>>>        2187 ESTAB
>>>>>    All connections in CLOSE-WAIT were from the client, while all
>>>>>    connections in ESTAB state were to the server. This lasted for
>>>>>    30 seconds. On the client system, all sockets were in
>>>>>    FIN-WAIT-2 state:
>>>>>        2186 FIN-WAIT-2
>>>>>    This (2185/2186) seems to imply that the client closed the
>>>>>    connection but haproxy did not close the socket for 30 seconds.
>>>>>    This also results in high CPU utilization on haproxy for some
>>>>>    reason (100% per process for 30 seconds), which is also
>>>>>    unexpected as the remote side has closed the socket.
>>>>>
>>>>> 7. Configuration file for process mode:
>>>>>    global
>>>>>        daemon
>>>>>        maxconn 26000
>>>>>        nbproc 48
>>>>>        stats socket /var/run/ha-1-admin.sock mode 600 level admin process 1
>>>>>        # (and so on for 48 processes).
>>>>>
>>>>>    defaults
>>>>>        option http-keep-alive
>>>>>        balance leastconn
>>>>>        retries 2
>>>>>        option redispatch
>>>>>        maxconn 25000
>>>>>        option splice-response
>>>>>        option tcp-smart-accept
>>>>>        option tcp-smart-connect
>>>>>        option splice-auto
>>>>>        timeout connect 5000ms
>>>>>        timeout client 30000ms
>>>>>        timeout server 30000ms
>>>>>        timeout client-fin 30000ms
>>>>>        timeout http-request 10000ms
>>>>>        timeout http-keep-alive 75000ms
>>>>>        timeout queue 10000ms
>>>>>        timeout tarpit 15000ms
>>>>>
>>>>>    frontend fk-fe-upgrade-80
>>>>>        mode http
>>>>>        default_backend fk-be-upgrade
>>>>>        bind <VIP>:80 process 1
>>>>>        # (and so on for 48 processes).
>>>>>
>>>>>    backend fk-be-upgrade
>>>>>        mode http
>>>>>        default-server maxconn 2000 slowstart
>>>>>        # 58 server lines follow, e.g.: "server <name> <ip:80>"
>>>>>
>>>>> 8. Configuration file for thread mode:
>>>>>    global
>>>>>        daemon
>>>>>        maxconn 26000
>>>>>        stats socket /var/run/ha-1-admin.sock mode 600 level admin
>>>>>        nbproc 1
>>>>>        nbthread 48
>>>>>        # cpu-map auto:1/1-48 0-39
>>>>>
>>>>>    defaults
>>>>>        option http-keep-alive
>>>>>        balance leastconn
>>>>>        retries 2
>>>>>        option redispatch
>>>>>        maxconn 25000
>>>>>        option splice-response
>>>>>        option tcp-smart-accept
>>>>>        option tcp-smart-connect
>>>>>        option splice-auto
>>>>>        timeout connect 5000ms
>>>>>        timeout client 30000ms
>>>>>        timeout server 30000ms
>>>>>        timeout client-fin 30000ms
>>>>>        timeout http-request 10000ms
>>>>>        timeout http-keep-alive 75000ms
>>>>>        timeout queue 10000ms
>>>>>        timeout tarpit 15000ms
>>>>>
>>>>>    frontend fk-fe-upgrade-80
>>>>>        mode http
>>>>>        bind <VIP>:80 process 1/1-48
>>>>>        default_backend fk-be-upgrade
>>>>>
>>>>>    backend fk-be-upgrade
>>>>>        mode http
>>>>>        default-server maxconn 2000 slowstart
>>>>>        # 58 server lines follow, e.g.: "server <name> <ip:80>"
>>>>>
>>>>> I had also captured 'perf' output for the system for threads vs
>>>>> processes; I can send it later if required.
>>>>>
>>>>> Thanks,
>>>>> - Krishna
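[Editor's note: a quick sanity check on the lock statistics in section 5 of the report. The per-lock "wait time for write/lock" is simply total write wait divided by the number of write-lock acquisitions; the one-liner below reproduces the reported FD-lock average from the totals, as a sketch only.]

```shell
# FD lock: 11370.245 msec total write wait over 143933900 write locks.
# Converting msec -> nsec (x 1e6) and dividing reproduces the reported
# 78.996 nsec average acquisition wait.
awk 'BEGIN { printf "%.3f nsec\n", 11370.245 * 1e6 / 143933900 }'
```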

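[Editor's note: the socket-state summary command used in section 6 can be exercised on canned "ss" style output, so the counting stage is visible without a live system; a real run would replace the printf with "ss -tnp". The addresses below are made up.]

```shell
# Canned stand-in for "ss -tn" output: the TCP state is column 1.
# The pipeline counts sockets per state, mirroring the summaries in
# section 6 of the report.
printf '%s\n' \
    'ESTAB      0 0 10.0.0.1:80 10.0.0.2:40001' \
    'ESTAB      0 0 10.0.0.1:80 10.0.0.3:40002' \
    'CLOSE-WAIT 0 0 10.0.0.1:80 10.0.0.4:40003' \
    | awk '{print $1}' | sort | uniq -c | sort -n
```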

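[Editor's note: the 100-VM setup near the top of the thread summarizes per-VM wrk results "using some parallel-ssh setup". One plausible aggregation, sketched here on two canned lines, sums the "Requests/sec:" line that wrk prints per run; the exact collection mechanism is an assumption, not the author's script.]

```shell
# Sum the "Requests/sec" lines that 100 parallel wrk runs would emit
# (e.g. collected via parallel-ssh). Two canned lines stand in for
# real wrk output here.
printf 'Requests/sec:   1200.50\nRequests/sec:   1300.25\n' \
    | awk '/^Requests\/sec/ { sum += $2 } END { printf "%.2f\n", sum }'
```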