Hello,

Sorry for the lengthy post, but I wanted to give as much info upfront as 
possible, since that takes a lot of the guesswork out of it!

I've recently started testing a combo of HAProxy 3.1 and Varnish 7.6 for some 
content delivery / offloading, and I'm curious whether people have any data, 
suggestions or optimizations that could push things further in terms of 
performance.

I used the HAProxy PPA for Ubuntu 24.04 (noble)
(https://launchpad.net/~vbernat/+archive/ubuntu/haproxy-3.1); thanks Vincent 
for providing these!
The build from the PPA uses the OS distributed OpenSSL, which is 3.0.13 on 
Ubuntu 24.04.
I also have a custom build where I compiled in AWS-LC version 1.42.

PPA distribution:
Build options :
  TARGET  = linux-glibc
  CC      = x86_64-linux-gnu-gcc
  CFLAGS  = -O2 -g -fwrapv -g -O2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -flto=auto -ffat-lto-objects -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -fdebug-prefix-map=/build/haproxy-ScKxv0/haproxy-3.1.1=/usr/src/haproxy-3.1.1-1ppa1~noble -Wdate-time -D_FORTIFY_SOURCE=3
  OPTIONS = USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_OT=1 USE_QUIC=1 USE_PROMEX=1 USE_PCRE2=1 USE_PCRE2_JIT=1 USE_QUIC_OPENSSL_COMPAT=1
  DEBUG   =

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H 
-DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC 
+LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY +LUA +MATH -MEMORY_PROFILING 
+NETFILTER +NS -OBSOLETE_LINKER +OPENSSL -OPENSSL_AWSLC -OPENSSL_WOLFSSL +OT 
-PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX 
-PTHREAD_EMULATION +QUIC +QUIC_OPENSSL_COMPAT +RT +SHM_OPEN +SLZ +SSL 
-STATIC_PCRE -STATIC_PCRE2 +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

My own build:
Build options :
  TARGET  = linux-glibc
  CC      = cc
  CFLAGS  = -O2 -g -fwrapv
  OPTIONS = USE_OPENSSL_AWSLC=1 USE_SLZ=1 USE_QUIC=1 USE_PROMEX=1 USE_PCRE2=1 USE_PCRE2_JIT=1
  DEBUG   =

Feature list : -51DEGREES +ACCEPT4 +BACKTRACE -CLOSEFROM +CPU_AFFINITY +CRYPT_H 
-DEVICEATLAS +DL -ENGINE +EPOLL -EVPORTS +GETADDRINFO -KQUEUE -LIBATOMIC 
+LIBCRYPT +LINUX_CAP +LINUX_SPLICE +LINUX_TPROXY -LUA -MATH -MEMORY_PROFILING 
+NETFILTER +NS -OBSOLETE_LINKER +OPENSSL +OPENSSL_AWSLC -OPENSSL_WOLFSSL -OT 
-PCRE +PCRE2 +PCRE2_JIT -PCRE_JIT +POLL +PRCTL -PROCCTL +PROMEX 
-PTHREAD_EMULATION +QUIC -QUIC_OPENSSL_COMPAT +RT +SHM_OPEN +SLZ +SSL 
-STATIC_PCRE -STATIC_PCRE2 +TFO +THREAD +THREAD_DUMP +TPROXY -WURFL -ZLIB

I know there are quite a few differences in the CFLAGS between the two builds, 
but I went ahead with the comparison anyway!

Test System:
- E5-2698v4 (20 cores, 40 threads)
- 128GB of 2133MHz DDR4 RAM (16GB DIMMs, in the correct banks)
- Ubuntu 24.04 with generic kernel

HAProxy config:
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    maxconn 50000
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000
    timeout client  50000
    timeout server  50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend ft_web
    bind *:80,:::80 v6only
    bind *:443,:::443 v6only ssl crt /etc/haproxy/ssl/
    bind quic4@:443 ssl crt /etc/haproxy/ssl/ alpn h3
    bind quic6@:443 ssl crt /etc/haproxy/ssl/ alpn h3

    http-after-response add-header alt-svc 'h3=":443"; ma=60'

    option forwardfor
    no log

    http-request set-header X-Forwarded-Proto https if { ssl_fc }
    http-request set-header X-Forwarded-Proto http if !{ ssl_fc }

    default_backend bk_varnish

backend bk_varnish
    mode http
    no log
    server varnish1 127.0.0.1:8443 send-proxy


The SSL certificate being used is a P-256 ECC certificate.

The load test uses h2load with 14 threads, 100k requests, 100 clients and 10 
concurrent streams per client, requesting a 2 megabyte JPEG image.
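
For a sense of scale, the run described above can be sized with a quick 
back-of-the-envelope calculation. This is only a sketch: it assumes that 
"2 megabyte" means exactly 2 * 10^6 bytes, it ignores TLS and HTTP/2 framing 
overhead, and the function and constant names are made up for illustration.

```python
def total_payload_bytes(requests: int, object_bytes: int) -> int:
    """Total response body bytes transferred in one run."""
    return requests * object_bytes


def max_streams_in_flight(clients: int, streams_per_client: int) -> int:
    """Upper bound on concurrent HTTP/2 streams across all clients."""
    return clients * streams_per_client


# Values taken from the test description; the object size is an assumption
# (2 MB read as 2 * 10^6 bytes, not 2 MiB).
REQUESTS = 100_000
OBJECT_BYTES = 2 * 10**6
CLIENTS = 100
STREAMS_PER_CLIENT = 10

total = total_payload_bytes(REQUESTS, OBJECT_BYTES)
in_flight = max_streams_in_flight(CLIENTS, STREAMS_PER_CLIENT)

print(f"payload per run: {total / 1e9:.0f} GB")
print(f"max streams in flight: {in_flight}")
```

Under these assumptions each run moves roughly 200 GB of response bodies, with 
up to 1000 streams open at once.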

It's worth noting that with raw Varnish (127.0.0.1:6081) I am able to push the 
system to 175Gbps of traffic served out of Varnish via h2c. This obviously 
brings the challenge of having to run h2load on the same system, since I sadly 
do not have 2x 200G connected servers!

If I use HAProxy to terminate SSL with OpenSSL 3.0, I get 63.76Gbps of traffic 
over HTTP/2 (h2).
Doing the same test with the HAProxy build that uses AWS-LC v1.42, I get 
64.21Gbps of traffic over HTTP/2 (h2).

After repeating the tests and averaging the results, the two builds end up 
within margin of error of each other.
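
To put a number on how close the two results are, the gap can be expressed as a 
percentage of their mean. This is a rough sketch using only the two figures 
quoted above; the helper name is hypothetical, and it does not replace a proper 
look at the run-to-run variance.

```python
def relative_gap_percent(a: float, b: float) -> float:
    """Absolute difference between two measurements, as a percentage of their mean."""
    mean = (a + b) / 2
    return abs(a - b) / mean * 100


# Throughput figures quoted in this post (Gbps over h2).
OPENSSL_GBPS = 63.76
AWSLC_GBPS = 64.21

gap = relative_gap_percent(OPENSSL_GBPS, AWSLC_GBPS)
print(f"gap between the two builds: {gap:.2f}% of the mean")
```

That works out to about 0.7%, which is easily swallowed by run-to-run noise 
when the load generator shares the machine with HAProxy and Varnish.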

The question is whether this is to be expected, i.e. that they perform roughly 
the same on semi-large chunks of data (2MB), or whether there are some key 
CFLAGS options that I am missing that could push it a bit further?

From the various posts on the interwebs, and the HAProxy wiki on GitHub, it 
seems that people in general do not recommend OpenSSL 3.0 due to its bad 
performance, and AWS-LC should perform quite a bit better.
But is that largely for small files, or should it be an improvement overall?

Is there anything I could do to push things further? The target file size is 
2 megabytes, served straight out of memory from Varnish.

It's worth noting that I did the same test on an EPYC 7502 and got only 
marginally better performance, at around 68Gbps of SSL traffic (180Gbps of h2c 
via Varnish directly).

I know that 68Gbps of SSL traffic also means moving the same amount over 
localhost on top of that, because of the bk_varnish backend.

Another thing I noticed while testing QUIC/HTTP3 is that HAProxy seems to 
consume about 4x as much resources to serve the same amount of traffic 
(currently ~2Gbps of actual traffic) as it does for h2. Is that something 
that's likely to improve in the future?

Best Regards,
Lucas Rolff
