Hi,

We recently hit an issue where the haproxy_frontend_current_sessions metric reported by the Prometheus endpoint plateaued at 4095 and some requests started dropping. Increasing the global and listen maxconn from 4096 to something larger (as well as making the kernel TCP queues on our Ubuntu 22.04 OS slightly larger) fixed the issue.
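Concretely, the change was along these lines (the new limit, the listener name, and the exact sysctls/values here are illustrative, not our real numbers):

```
# haproxy.cfg -- illustrative values, not our exact production numbers
global
    maxconn 16384            # raised from 4096

listen fe_main               # hypothetical listener name
    maxconn 16384            # raised to match the global limit

# /etc/sysctl.d/99-haproxy.conf -- the "slightly larger" kernel TCP queues
net.core.somaxconn = 8192
net.ipv4.tcp_max_syn_backlog = 8192
```

As far as I understand, the listen backlog defaults to the frontend's maxconn unless a backlog value is set, which is why we bumped somaxconn alongside the maxconn change.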
The cause seems to have been a switch from http to https traffic due to a client-side config change, rather than an increase in the number of requests, so I started looking at CPU usage to see whether the SSL load was too much for our server CPUs.

However, on one of the modern 24-core machines running HAProxy, I noticed top was only reporting around 100% CPU usage in total, with both the user and system CPU distributed pretty evenly across all the cores (4-8% user per core, 0.5-2% system). The idle percentage was in the high nineties, both as reported by top and by the HAProxy socket's Idle_pct. This was just a quick gathering of info and may not be representative, since our Prometheus node exporter only shows overall CPU (which stayed at a low 5% of the total across all cores throughout). This is a bare-metal server that is just running HAProxy, processing around 200 SSL req/sec, and not doing much else.

I started wondering whether our global settings:

```
master-worker
nbthread 24
cpu-map auto:1/1-24 0-23
tune.ssl.cachesize 100000
```

were appropriate, or whether they had caused some inefficiency in using our machine's cores, which then caused this backlog.
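For reference, the only per-thread numbers I have came from the runtime API rather than from node exporter. This is roughly what I ran, assuming the stats socket lives at /var/run/haproxy.sock (the path is whatever your "stats socket" line in the global section says):

```
# assumed stats socket path; ours is set via "stats socket" in global
SOCK=/var/run/haproxy.sock

# the Idle_pct figure quoted above
echo "show info" | socat stdio "$SOCK" | grep Idle_pct

# per-thread counters (loops, tasks, polling) to check the load is even
echo "show activity" | socat stdio "$SOCK"
```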
Or it could be that what I am observing is completely normal, given that we are now spending more time on SSL decoding and so can expect more queuing (our backend servers are very fast, so we run them with a small maxconn, but they don't care whether a request arrived over SSL or not, so the overall request time should be the same apart from the SSL processing itself). We are running either the latest OpenSSL 1.1.1 or WolfSSL, both compiled sensibly (AES-NI etc.).

I turned to https://docs.haproxy.org/2.9/management.html#7, which had some very interesting advice about pinning HAProxy to one CPU core and the interrupts to another, but it also mentioned nbproc and the "process" option on bind lines for better SSL traffic processing. Given that this seems to be a bit out of date, I thought I would ask my question here instead.

Is there a way to use the CPU cores available on our HAProxy machines to handle SSL requests better than I have with the global config above? I realise this is a bit of an open-ended question, but for example I was wondering whether we could reduce the number of active sessions (so we don't hit maxconn) by increasing the number of threads beyond the number of CPU cores. Naively that seems like it might increase per-session latency but improve overall throughput, since we don't appear to be taxing any of the cores (and we have plenty of memory available on these machines).
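To make that idea concrete, this is the sort of variant of our global settings I was imagining; purely an untested sketch, and 48 is an arbitrary number for a 24-core box:

```
# hypothetical, untested variant of the global settings above
master-worker
nbthread 48    # 2x the 24 physical cores; the multiplier is a guess
# with more threads than CPUs a strict 1:1 cpu-map is no longer possible,
# so the idea would be to drop the cpu-map line and let the kernel
# scheduler place the threads
tune.ssl.cachesize 100000
```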
As I said, I am not even sure there is a problem, but I would like to understand a bit better whether there is anything we can do to help HAProxy use the CPU cores more effectively, since all the advice I can find is obsolete (nbproc etc.) and it is quite hard to experiment when I don't know what is good to measure.

Thanks for your time,
Miles