Hi,

We have been running HAProxy on OpenBSD for several years (currently
OpenBSD 7.2 / HAProxy 2.6.7) and everything has been working perfectly
until a recent event of higher than normal traffic. It was an unexpected
flood to one site, and above ~1100 current sessions we started to see
major impact on all sites behind haproxy, including those in different
frontends/backends than the one with heavy traffic. All sites started to
respond very slowly or failed to load (including requests that haproxy
denies before they reach a backend), and health checks in various other
backends started flapping. Once the heavy traffic stopped, everything
returned to normal. Total
bandwidth was less than 50Mbps on a 10G port (Intel 82599 ix fiber NIC).

After that issue we have been doing some load testing to try to gain
more info and make any possible tweaks. Running the ddosify load testing
tool (-d 60 -n 70000) from a single machine (different from haproxy)
reproduces the same issue we saw with real traffic.
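
As a rough sketch of what that test asks of the proxy: the two flags
imply an average offered load of a little under 1,200 requests per
second. The target URL below is a placeholder (the real one is not given
here), and the command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical target URL; the real one is not given in this message.
target="https://test.example.com/"
requests=70000   # ddosify -n: total number of requests
duration=60      # ddosify -d: test duration in seconds

# Average request rate implied by the two flags (integer division).
rate=$((requests / duration))
echo "offered load: about ${rate} req/s"

# Print the command instead of running it, so this sketch has no side effects.
echo "ddosify -t ${target} -n ${requests} -d ${duration}"
```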

When starting a load test HAProxy handles 400-500 requests per second
for about 3 seconds. After the first few seconds of heavy traffic, the
test error rate immediately starts to shoot up to 75%+ connection/read
timeouts and other haproxy sites/health checks start to be impacted. We
had to stop the test after only 11 seconds to restore responsiveness to
other sites. This is a physical server with 2x E5-2698 v4 20 core CPUs
(hyperthreading disabled) and the haproxy process uses about 1545% CPU
under this load. Overall CPU utilization is 21% user, 0% nice, 37% sys,
18% spin, 0.7% intr, 23.1% idle. There was no noticeable impact on
connections to other services that this box NATs via PF to other servers
outside of haproxy. 

After installing FreeBSD 13.1 on an identical machine and running the
same test against the same HAProxy 2.6.7 with the same config and backend
servers, the results are more what I would expect out of this hardware:
haproxy has no
problems handling over 10,000 req/sec and 40k connections without
impacting any traffic/health checks, and only about 30% overall CPU
usage at that traffic level. 100% success rate on the load tests with
plenty of headroom on the haproxy box to handle way more. The backend
servers were actually the bottleneck in the FreeBSD test.

I understand that raw performance on OpenBSD is sometimes not as high as
other OSes in some scenarios, but the difference of 500 vs 10,000+
req/sec and 1100 vs 40,000 connections here is very large, so I wanted to
ask whether there are any thoughts, known issues, or tunables that could
help improve HAProxy throughput on OpenBSD.

The usual OS tunables openfiles-cur/openfiles-max are raised to 200k,
kern.maxfiles=205000 (openfiles peaked at 15k), and haproxy stats
reports those as expected. PF state limit is raised to 1 million and
peaked at 72k in use. BIOS power profile is set to max performance.
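
For reference, a sketch of where those settings would typically live on
OpenBSD. The login class name "haproxy" and the tc=daemon inheritance
are assumptions, not copied from the running system; the values are the
ones quoted above.

```
# /etc/login.conf (class name "haproxy" is an assumption)
haproxy:\
	:openfiles-cur=200000:\
	:openfiles-max=200000:\
	:tc=daemon:

# /etc/sysctl.conf
kern.maxfiles=205000

# /etc/pf.conf
set limit states 1000000
```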

pid = 78180 (process #1, nbproc = 1, nbthread = 32)
uptime = 1d 19h10m11s
system limits: memmax = unlimited; ulimit-n = 200000
maxsock = 200000; maxconn = 99904; maxpipes = 0

No errors that I can see in logs about hitting any limits. There is no
change in results with http vs https, http/1.1 vs h2, with or without
httplog, or reducing nbthread on this 40 core machine. If there are any
other details I can provide please let me know.

Thanks in advance for any input!

----------------------------------------
global
  chroot  /var/haproxy
  daemon  
  log  127.0.0.1 local2
  nbthread  32
  pidfile  /var/run/haproxy.pid
  ssl-default-bind-ciphers  HIGH:!aNULL:!MD5
  ssl-default-bind-options  no-tlsv10
  ssl-default-server-ciphers  HIGH:!aNULL:!MD5
  stats  socket /var/haproxy/stats level admin mode 775 group sysadmin

defaults
  default-server  inter 5s
  default-server  fastinter 2s
  default-server  downinter 2s
  log  global
  mode  http
  option  httplog
  option  dontlognull
  option  redispatch
  option  log-health-checks
  retries  3
  source  0.0.0.0 usesrc clientip
  timeout  http-request 10s
  timeout  queue 1m
  timeout  connect 10s
  timeout  client 30m
  timeout  server 30m
  timeout  http-keep-alive 10s
  timeout  check 10s

listen test_https
  bind ip.ip.ip.ip:443 ssl crt /path/to/cert.pem no-tlsv11 alpn h2,http/1.1
  mode http
  balance roundrobin
  server 192.168.25.7:443 192.168.25.7:443 check ssl verify none alpn h2,http/1.1
  server 192.168.25.26:443 192.168.25.26:443 check ssl verify none alpn h2,http/1.1
  server maintenance 127.0.0.1:8081 source 0.0.0.0 backup
  
----------------------------------------
 
$ haproxy -vv
HAProxy version 2.6.7-c55bfdb 2022/12/02 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2027.
Known bugs: http://www.haproxy.org/bugs/bugs-2.6.7.html
Running on: OpenBSD 7.2 GENERIC.MP#0 amd64
Build options :
  TARGET  = openbsd
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -pipe -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wnull-dereference -fwrapv -Wno-unknown-warning-option -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment
  OPTIONS = USE_PCRE2=1 USE_OPENSSL=1 USE_ZLIB=1
  DEBUG   = -DDEBUG_STRICT -DDEBUG_MEMORY_POOLS

Feature list : -EPOLL +KQUEUE -NETFILTER -PCRE -PCRE_JIT +PCRE2 -PCRE2_JIT +POLL +THREAD -BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY -LINUX_TPROXY -LINUX_SPLICE +LIBCRYPT -CRYPT_H -ENGINE +GETADDRINFO +OPENSSL -LUA +ACCEPT4 +CLOSEFROM +ZLIB -SLZ -CPU_AFFINITY -TFO -NS -DL -RT -DEVICEATLAS -51DEGREES -WURFL -SYSTEMD -OBSOLETE_LINKER -PRCTL -PROCCTL -THREAD_DUMP -EVPORTS -OT -QUIC -PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : LibreSSL 3.6.0
Running on OpenSSL version : LibreSSL 3.6.0
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Support for malloc_trim() is enabled.
Built with zlib version : 1.2.12
Running on zlib version : 1.2.12
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: SO_BINDANY
Built with PCRE2 version : 10.37 2021-05-26
PCRE2 library supports JIT : no (USE_PCRE2_JIT not set)
Encrypted password support via crypt(3): yes
Built with clang compiler version 13.0.0 

Available polling systems :
     kqueue : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=

Available services : none

Available filters :
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace
$
