Hi,
HAProxy 3.2-dev8 was released on 2025/03/21. It added 119 new commits
after version 3.2-dev7.
As mentioned in the 3.1.6 announcement, a few bugs were addressed, but
nothing critical.
For the new stuff:
- automatic CPU binding (formerly known as "NUMA patches"): this work,
which started almost two years ago and which I had hoped to see merged
into each version since 2.9, was finally completed! It extends the
current CPU topology detection to better bind threads and thread
groups. First, by default, nothing will change in 3.2 compared to
previous versions. The new features consist in detecting the detailed
CPU topology (nodes, packages, CCX, L3 caches, cores, clusters, threads,
etc.) and doing the best to bind to them optimally and to arrange the
groups so as to limit the costly inter-CCX communications. It comes with
a "cpu-set" directive that allows one to bind only to, or to exclude,
certain CPUs based on their node/core/thread/cluster number. For
example, if one wants to bind only to odd or even threads to leave the
other ones for the NIC drivers, it is trivial to do with a single
directive. Second, another directive, "cpu-policy", describes how to
use the selected CPUs. The default one, "first-usable-node", behaves
exactly like today, i.e. it will only bind
to the first node with available CPUs and limit itself to a single
group and 64 threads max. Another policy, "group-by-cluster", creates
one thread group per CCX/L3 cache and configures as many threads as
there are enabled CPUs on them. It can also create multiple groups if
there are more than 64 CPUs in one of them. It's possible that it will
become the default policy starting with 3.3, as it can use the full
machine in an efficient way. Just using this one was sufficient to
multiply the performance by 3 on a 64-core EPYC, i.e. it was the same
as what can be achieved using precise "cpu-map" directives, which
become quite difficult to use with many-core systems.
A few other policies are available for CPUs with P+E cores to prefer
"Performance" cores or "Efficiency" cores.
We're interested in feedback from those dealing with large systems,
particularly multi-socket ones, as well as VMs and containers, to
make sure we haven't missed anything. Many tests were run on about
20-25 different systems, as well as emulations of about 10 other
ones based on /sys captures. For those who prefer, I have created
a discussion on GitHub; feel free to participate and share
feedback (successes, failures and suggestions):
https://github.com/orgs/haproxy/discussions/2901
- Prometheus and stats convergence: those using Prometheus have probably
noticed from time to time that it's difficult to keep the two
synchronized, so sometimes we add some new stats and forget to
do the same for Prometheus. Some changes were made to extend the
internal representation of stats so that Prometheus can rely on it.
This way there is now a single place to declare new metrics that
should be exposed in both places. If well done, it should not
change anything (actually the only thing is that the warnings
counter will finally be exported by Prometheus). Please give it
a try to confirm that everything runs as smoothly as expected.
- the log-forward sections now support an "option host" to decide
how to fill the host part of outgoing log messages (leave it as-is,
replace it, append), since different users expect different behaviors.
- some new converters are provided to support JWS signing and to verify
JSON Web Tokens (JWT). Please just bear with me, I have zero idea
about what JWS means nor what it's used for, but there is info in
the doc about it :-) Apparently it's related to authentication.
- some changes were made to the internal representation of certificates
that are not expected to have any visible effect. If you're using
complex setups, please give it a quick try to verify that you don't
face any error at load time.
- the "wait ... srv-removable" CLI command was optimised so that it
consumes much less CPU while waiting for a server to be removable.
It used to force thread isolation during the check but thanks to
some recent changes this is no longer necessary, so those with
many servers being constantly added and removed at run time and
who used to notice CPU spikes when a whole farm went down will see
a significant improvement.
- a small "show pools detailed" CLI command will now show all pools
registered behind a single entry. That's useless for normal users
but developers might ask about this in the future when chasing a
memory error.
- we found a case on a 128-thread EPYC where some watchdog warnings
could be emitted from time to time under extreme contention on the
mt_lists, indicating that some CPUs were blocked for at least 100ms.
We found it was caused by the upper bound of the exponential back-off,
which seems too high for these CPUs, so we lowered it. If you had
faced such warnings in the past, we're interested in knowing if they
disappeared. If you observe a higher CPU usage, we're interested as
well (this shouldn't be the case based on our tests).
- Lua's AppletTCP:receive() now supports an optional timeout,
making it easier to write interactive utilities supporting a
periodic refresh (think about a "top" equivalent for example).
For the record, this allowed us to write a dirty "tetris" game that
works as an applet. I have not committed it yet because it needs
some polishing but it illustrates some possibilities and showed
us some limitations and even two bugs. We hope to address such
small limitations before 3.2-final, so that they ease the writing
of convenient utilities, including sniffers, proxies etc, not just
arcade games ;-)
The rest is a few cleanups and doc updates.
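
To illustrate two of the items above (the CPU binding directives and the
log-forward "option host"), here is a minimal configuration sketch. The
directive, sub-directive and policy names "cpu-set", "only-thread",
"cpu-policy" and "group-by-cluster" come from this announcement and the
changelog below. Everything else is an assumption made for the example
and should be checked against the configuration manual: the argument
passed to "only-thread" (assumed to be the thread number within each
core), the "keep" keyword of "option host" (assumed to be the one that
leaves the host part as-is), the section name and the addresses.

```
global
    # Assumed syntax: only keep the first hardware thread of each core,
    # leaving the sibling threads available for the NIC drivers.
    cpu-set only-thread 0

    # Create one thread group per CCX/L3 cache instead of relying on
    # the default "first-usable-node" policy.
    cpu-policy group-by-cluster

# Hypothetical log-forward section (name and addresses are placeholders)
log-forward syslog-in
    dgram-bind 127.0.0.1:1514
    # Assumed keyword: leave the host part of forwarded messages as-is.
    option host keep
    log 192.0.2.10:514 local0
```

Since the default "first-usable-node" policy is described as behaving like
previous versions, removing the "cpu-policy" line should be enough to
compare both behaviors on the same machine.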
I really insist that sensitive changes be merged before dev9, which is
due for the first week of April. Past this point we'll declare the feature
freeze, which as usual will mainly mean "no more big change", so that we
can spend the rest of the time finishing what's already started and
polishing/fixing what's already merged. I know that there are some SSL
infrastructure updates in the pipe, and a rework of leastconn to address
the scalability issues on large systems.
We've identified a number of small cleanups that are worth doing before
3.2-final (e.g. minor changes to Lua mentioned above, merge of h2+h3
header validation etc). Also the doc updates (namely the resolvers
with init_addr that Lukas & Luke worked on) need to be decided on and
merged.
Overall I'm starting to like what 3.2 is becoming. It could also be the
moment to think about the more intense changes to perform in 3.3 (e.g.
if we need to anticipate deprecation warnings it's not too late), and
sometimes doing some preparatory work before the release eases the
backport of fixes later. Next week I'll be quite busy, so I may not
always be available to respond to discussions, but do not hesitate to
share anything you might have in mind ;-)
Ah and please if you have not yet started to play with 3.2-dev, really,
give it a try *NOW*. There's still time to fix issues, rename options
etc, and it's in good shape, close to what 3.2-final should be. And if
you're lucky you might even notice improvements which will make you want
to stick to it.
Please find the usual URLs below :
Site index : https://www.haproxy.org/
Documentation : https://docs.haproxy.org/
Wiki : https://github.com/haproxy/wiki/wiki
Discourse : https://discourse.haproxy.org/
Slack channel : https://slack.haproxy.org/
Issue tracker : https://github.com/haproxy/haproxy/issues
Sources : https://www.haproxy.org/download/3.2/src/
Git repository : https://git.haproxy.org/git/haproxy.git/
Git Web browsing : https://git.haproxy.org/?p=haproxy.git
Changelog : https://www.haproxy.org/download/3.2/src/CHANGELOG
Dataplane API : https://github.com/haproxytech/dataplaneapi/releases/latest
Pending bugs : https://www.haproxy.org/l/pending-bugs
Reviewed bugs : https://www.haproxy.org/l/reviewed-bugs
Code reports : https://www.haproxy.org/l/code-reports
Latest builds : https://www.haproxy.org/l/dev-packages
Willy
---
Complete changelog :
Amaury Denoyelle (2):
BUG/MEDIUM: mux-quic: fix crash on RS/SS emission if already close local
BUG/MINOR: mux-quic: remove extra BUG_ON() in _qcc_send_stream()
Aurelien DARRAGON (29):
CLEANUP: log-forward: remove useless options2 init
CLEANUP: log: add syslog_process_message() helper
MINOR: proxy: add proxy->options3
MINOR: log: migrate log-forward options from proxy->options2 to options3
MINOR: log: provide source address information in syslog_process_message()
MINOR: tools: only print address in sa2str() when port == -1
MINOR: log: add "option host" log-forward option
MINOR: log: handle log-forward "option host"
MEDIUM: log: change default "host" strategy for log-forward section
DOC: management: rename some last occurences from domain "dns" to
"resolvers"
BUG/MINOR: stats: fix capabilities and hide settings for some generic
metrics
BUG/MINOR: log: prevent saddr NULL deref in syslog_io_handler()
BUG/MINOR: hlua: fix optional timeout argument index for
AppletTCP:receive()
BUG/MEDIUM: hlua/cli: fix cli applet UAF in hlua_applet_wakeup()
MINOR: stats: add .generic explicit field in stat_col struct
MINOR: stats: STATS_PX_CAP___B_ macro
MINOR: stats: add .cap for some static metrics
MINOR: stats: use stat_col storage stat_cols_info
MEDIUM: promex: switch to using stat_cols_info for global metrics
MINOR: promex: expose ST_I_INF_WARNINGS (AKA total_warnings) metric
MEDIUM: promex: switch to using stat_cols_px for front/back/server metrics
MINOR: stats: explicitly add frontend cap for ST_I_PX_REQ_TOT
CLEANUP: promex: remove unused PROMEX_FL_{INFO,FRONT,BACK,LI,SRV} flags
MINOR: stats: add alt_name field to stat_col struct
MINOR: stats: add alt name info to stat_cols_info where relevant
MINOR: promex: get rid of promex_global_metric array
MINOR: stats-proxy: add alt_name field for ME_NEW_{FE,BE,PX} helpers
MINOR: stats-proxy: add alt name info to stat_cols_px where relevant
MINOR: promex: get rid of promex_st_metrics array
Christopher Faulet (1):
BUG/MINOR: mux-h2: Reset streams with NO_ERROR code if full response was
already sent
Olivier Houchard (1):
MEDIUM: mt_list: Reduce the max number of loops with exponential backoff
Valentine Krasnobaeva (3):
MINOR: cpu-topo: fix unused stack var 'cpu2' reported by coverity
BUG/MINOR: limits: compute_ideal_maxconn: don't cap remain if
fd_hard_limit=0
MINOR: limits: fix check_if_maxsock_permitted description
William Lallemand (7):
MINOR: jws: implement JWS signing
TESTS: jws: implement a test for JWS signing
CI: github: add "jose" to apt dependencies
MINOR: jws: add new functions in jws.h
MINOR: jws: use jwt_alg type instead of a char
MINOR: tools: path_base() concatenates a path with a base path
MEDIUM: ssl/ckch: make the ckch_conf more generic
Willy Tarreau (76):
BUG/MEDIUM: thread: use pthread_self() not ha_pthread[tid] in set_affinity
MINOR: compiler: add a simple macro to concatenate resolved strings
MINOR: compiler: add a new __decl_thread_var() macro to declare local
variables
BUILD: tools: silence a build warning when USE_THREAD=0
BUILD: backend: silence a build warning when threads are disabled
MINOR: cli: export cli_io_handler() to ease symbol resolution
MINOR: tools: improve symbol resolution without dl_addr
MINOR: tools: ease the declaration of known symbols in resolve_sym_name()
MINOR: tools: teach resolve_sym_name() a few more common symbols
BUILD: tools: avoid a build warning on gcc-4.8 in resolve_sym_name()
DEV: ncpu: also emulate sysconf() for _SC_NPROCESSORS_*
DOC: design-thoughts: commit numa-auto.txt
MINOR: cpuset: make the API support negative CPU IDs
MINOR: thread: rely on the cpuset functions to count bound CPUs
MINOR: cpu-topo: add ha_cpu_topo definition
MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array.
MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus
MINOR: cpu-topo: add a function to dump CPU topology
MINOR: cpu-topo: update CPU topology from excluded CPUs at boot
REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo
MINOR: cpu-topo: add detection of online CPUs on Linux
MINOR: cpu-topo: add detection of online CPUs on FreeBSD
MINOR: cpu-topo: try to detect offline cpus at boot
MINOR: cpu-topo: add CPU topology detection for linux
MINOR: cpu-topo: also store the sibling ID with SMT
MINOR: cpu-topo: add NUMA node identification to CPUs on Linux
MINOR: cpu-topo: add NUMA node identification to CPUs on FreeBSD
MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable
MINOR: cfgparse: move the binding detection into numa_detect_topology()
MINOR: cfgparse: use already known offline CPU information
MINOR: global: add a command-line option to enable CPU binding debugging
MINOR: cpu-topo: add a new "cpu-set" global directive to choose cpus
MINOR: cpu-topo: add "drop-cpu" and "only-cpu" to cpu-set
MEDIUM: thread: start to detect thread groups and threads min/max
MEDIUM: cpu-topo: make sure to properly assign CPUs to threads as a
fallback
MEDIUM: thread: reimplement first numa node detection
MEDIUM: cfgparse: remove now unused numa & thread-count detection
MINOR: cpu-topo: refine cpu dump output to better show kept/dropped CPUs
MINOR: cpu-topo: fall back to nominal_perf and scaling_max_freq for the
capacity
MINOR: cpu-topo: use cpufreq before acpi cppc
MINOR: cpu-topo: boost the capacity of performance cores with cpufreq
MINOR: cpu-topo: skip CPU detection when /sys/.../cpu does not exist
MINOR: cpu-topo: skip identification of non-existing CPUs
MINOR: cpu-topo: skip CPU properties that we've verified do not exist
MINOR: cpu-topo: implement a sorting mechanism for CPU index
MINOR: cpu-topo: implement a sorting mechanism by CPU locality
MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID
MINOR: cpu-topo: ignore single-core clusters
MINOR: cpu-topo: assign clusters to cores without and renumber them
MINOR: cpu-topo: make sure we don't leave unassigned IDs in the cpu_topo
MINOR: cpu-topo: assign an L3 cache if more than 2 L2 instances
MINOR: cpu-topo: renumber cores to avoid holes and make them contiguous
MINOR: cpu-topo: add a function to sort by cluster+capacity
MINOR: cpu-topo: consider capacity when forming clusters
MINOR: cpu-topo: create an array of the clusters
MINOR: cpu-topo: ignore excess of too small clusters
MINOR: cpu-topo: add "only-node" and "drop-node" to cpu-set
MINOR: cpu-topo: add "only-thread" and "drop-thread" to cpu-set
MINOR: cpu-topo: add "only-core" and "drop-core" to cpu-set
MINOR: cpu-topo: add "only-cluster" and "drop-cluster" to cpu-set
MINOR: cpu-topo: add a CPU policy setting to the global section
MINOR: cpu-topo: add a 'first-usable-node' cpu policy
MEDIUM: cpu-topo: use the "first-usable-node" cpu-policy by default
CLEANUP: thread: now remove the temporary CPU node binding code
MINOR: cpu-topo: add cpu-policy "group-by-cluster"
MEDIUM: cpu-topo: let the "group-by-cluster" split groups
MINOR: cpu-topo: add a new "performance" cpu-policy
MINOR: cpu-topo: add a new "efficiency" cpu-policy
MINOR: cpu-topo: add a new "resource" cpu-policy
MINOR: hlua: add an optional timeout to AppletTCP:receive()
MINOR: stream: decrement srv->served after detaching from the list
MINOR: server: simplify srv_has_streams()
CLEANUP: server: make it clear that srv_check_for_deletion() is
thread-safe
MINOR: cli/server: don't take thread isolation to check for srv-removable
MINOR: pools: rename the "by_what" field of the show pools context to
"how"
MINOR: cli/pools: record the list of pool registrations even when merging
them
---