Hi,
HAProxy 3.0.0 was released on 2024/05/29. It added 21 new commits
after version 3.0-dev13. I do appreciate that everything was only
cosmetic.
This release contains a total of 1108 patches, 850 of which do not
concern a bug, which makes it the smallest LTS release of all time
(2.6 and 2.4 still remain the largest ones, respectively 65% and 58%
larger). This is good news in terms of expected stability, which might
possibly break the old myth of "better avoid dot zero".
Let's try to summarize what's new in this release. It has been one of the
most difficult for me to summarize because I'm not seeing one big killer
feature; instead it's an LTS as we like them: mostly a nice polishing of
existing stuff and small improvements all over the place as permitted by
the previous version's architectural changes. I tried to classify this
into a few categories, depending on the intended benefits.
First, let's enumerate the new features, and improvements of existing ones:
- stats can finally be preserved across reloads for frontends,
listeners, backends and servers. When using this, the config objects
of the new process are preloaded with the relevant values from a dump
of the previous process. This essentially concerns counters, ages and
rates. Please have a look at "stats-file" and "dump stats-file" for
more information.
- the log outgoing load-balancing now relies on a regular backend,
meaning that the load balancing algorithms could finally be unified
with the ones used by other protocols, and servers now support
weights.
- log-format now supports JSON and CBOR output encoding. In such a case,
the field name is taken from a new naming scheme that is placed within
the log-format itself, allowing to assign a name to each field.
- the load balancing algorithm "sticky" that was initially reserved for
logs was generalized to other protocols.
- the HTTP/2 RST_STREAM reason code can finally be forwarded to the
server for client aborts. This addresses the problem a few users were
facing with gRPC where request cancellation appeared as communication
errors on the server side. For now this is purposely limited to only a
few reason codes that are relevant to gRPC so that we don't ruin the
possibility to later extend that to H3 and maybe H1.
- QUIC now supports the HyStart++ (RFC9406) alternative to slowstart
with the Cubic algorithm. It's supposed to show better recovery
patterns. It's not yet enabled by default.
- a new set of converters, map_*_key, will report the matching part of
the key itself instead of the associated pattern. The main target use
case for this is to know what address mask an address did match, or
what regex a pattern did match.
- the "uuid()" sample fetch function, which takes an optional version as
an argument, now also supports "7" for UUIDv7. These UUIDs combine many
properties found in ULID and other mechanisms, one of the most
interesting ones being time-based locality that, for example, eases the
archiving of old data, or the grouping of events on systems where
they'll be processed together.
- the name associated with servers in connection pools can now be
overridden by the expression in "pool-conn-name" when SNI is not
desired (useful with rhttp without SSL for example, but may also make
sense when reaching remote servers over SSL tunnels). It also allows
to entirely drop SSL from the server.
- the "namespace" argument now works for "bind" and "server" lines using
UNIX sockets.
- Linux capabilities: the use of namespaces on the server side used to
require capability "cap_sys_admin" but it was neither checked nor
reported on startup, so it would silently fail. The capability is now
supported and is being checked for. Similarly, the need for
capabilities for transparent proxying or QUIC is checked and reported
on startup. Finally, file-system capabilities set on the executable are
also supported now.
- the set-mark/set-tos actions were extended to support an expression in
addition to the constant, and were extended to also support the backend
side. This can for example be used to select an outgoing link from a
single IP address. The new backend actions are called "set-bc-mark" and
"set-bc-tos", and by analogy new frontend actions called "set-fc-mark"
and "set-fc-tos" were created, and the old actions are aliases of these
last ones.
- QUIC built with latest AWS-LC TLS library now correctly supports 0-RTT.
- a new global setting "ssl-security-level" makes it possible to adjust
OpenSSL's internal security level between 0 and 5. Previously it could
only be done in openssl.cnf.
- the key used by consistent hash to map to a server used to always be
the server's id (either explicit or implicit, position-based), but
that was not always convenient when dealing with servers that are
quickly added and removed within a large fleet of LBs. Now the
"hash-key" directive also makes it possible to use the server's address
or address+port for this so that the same key ends up on the same
server for all LBs.
- The HTTP client now has an option to use either origin or absolute
URIs. This should make it easier to configure it to talk to old
servers which are not spec-compliant and do not support absolute
URIs. The ocsp_update agent already exploits this ability via a new
setting "ocsp-update.httpproxy".
- it is now possible to suppress Content-Length and Transfer-Encoding
headers from HTTP/1 requests and responses. It must never be done of
course but there are rare situations where users dealing with bogus
clients or servers need to perform such cleanups. Most of the time
when done, this will mark a connection non-reusable and it will be
closed at the end of the transfer.
- the proxy protocol now also parses TLV for LOCAL mode and supports
sending them without a stream so that elements can be passed during
the preconnect phase of a reverse-HTTP instance to a next stage that
will no longer ignore them.
- the new sched_setaffinity() of FreeBSD 14 and newer is now supported.
- the new certificate selection callback for WolfSSL is now enabled
since it's finally available in the upstream project.
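
To illustrate the "hash-key" item above, here is a minimal Python sketch of
why keying a consistent hash on the server's address, rather than on a
position-based id, lets independent LBs agree on the target server. It is a
conceptual model only: the ring layout, the SHA-256 based hash, the number
of replicas and every function name are assumptions made for illustration,
not HAProxy's implementation.

```python
import bisect
import hashlib


def _h(value: str) -> int:
    """Stable 64-bit hash (illustrative choice, not HAProxy's hash)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def build_ring(servers, key_mode, replicas=64):
    """Build a sorted ring of (point, address) pairs.

    key_mode "id" derives the points from the 1-based position in the list,
    key_mode "addr" derives them from the server address itself.
    """
    ring = []
    for pos, addr in enumerate(servers, start=1):
        node_key = str(pos) if key_mode == "id" else addr
        for r in range(replicas):
            ring.append((_h(f"{node_key}#{r}"), addr))
    ring.sort()
    return ring


def pick(ring, request_key):
    """Return the address owning the first ring point at or after the key."""
    points = [p for p, _ in ring]
    idx = bisect.bisect_left(points, _h(request_key)) % len(ring)
    return ring[idx][1]


# Two LBs learned the same three servers in a different order.
lb1 = ["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"]
lb2 = ["10.0.0.3:80", "10.0.0.1:80", "10.0.0.2:80"]
keys = [f"user-{i}" for i in range(200)]

rings = {
    (name, mode): build_ring(servers, mode)
    for name, servers in (("lb1", lb1), ("lb2", lb2))
    for mode in ("addr", "id")
}
by_addr_1 = [pick(rings[("lb1", "addr")], k) for k in keys]
by_addr_2 = [pick(rings[("lb2", "addr")], k) for k in keys]
by_id_1 = [pick(rings[("lb1", "id")], k) for k in keys]
by_id_2 = [pick(rings[("lb2", "id")], k) for k in keys]

print("address-keyed LBs agree:", by_addr_1 == by_addr_2)
print("id-keyed LBs agree:", by_id_1 == by_id_2)
```

In this toy model the address-keyed rings are identical on both LBs, while
the id-keyed ones depend on the order in which each LB learned its servers,
which is the inconvenience the new directive is meant to remove.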
Second, there was a reasonable set of usability improvements, all the
small features that make config management and day-to-day operations
easier:
- maps are often used to operate at run time on some parts of the
configuration. When no initial value is desired, it was still needed
to have an empty file (/dev/null is not usable since a map is indexed
by its name). As such, some users have expressed their desire to have
virtual and/or optional maps. Both are brought by this version. When
a map is loaded from a file whose name begins with "opt@", the file
will only be loaded if it exists otherwise an empty map will be created
with this name. And maps whose name begins with "virt@" are exclusively
virtual and never backed by a file. They're always created empty at
boot, for use at run time.
- the default certificate selection method was improved: till now, the
default certificate was the first one mentioned on the bind line. This
causes issues with sites that want to support both RSA and ECDSA. A
new approach was brought, with an optional "default-crt" keyword that
designates the default certs on the bind line, and its equivalent in
the crt-list files designated by "*" in the name. This allows the right
cert to be picked based on the desired algorithm. Of course the default
behavior doesn't change.
- the list of status codes that increment the http_err_cnt and
http_fail_cnt counters can now be changed with the global directives
"http-err-codes" and "http-fail-codes". This has long been requested,
both by those whose applications randomly return 500 that are not
server failures, and those for whom 404s happen a lot and do not
necessarily indicate a URL scanner. All of the 1xx-5xx range is
permitted for both classes.
- cookies, both static and dynamic, are now permitted for dynamically
added servers.
- API clients will find the CLI more friendly when it comes to removing
a server. First, idle connections are now automatically closed when
trying to delete a server, so that it's no longer needed to wait for
them to vanish. Second, a new "wait" command pauses operations for at
most as long as specified, optionally waiting for a condition. A new
such condition is "srv-removable", which checks when a server may
safely be removed. This means that issuing this "wait" command before
a "del server" command will save the client from having to
periodically retry the operation.
- a new "crt-store" configuration section is supported. It makes it
possible to declare certificates by specifying the path for each
element. The aim is essentially to decorrelate the storage from the
instantiation, both of which are currently correlated in crt-lists, and
to allow easier specification of individual components. This section
supports "crt-base" and "key-base" to ease the splitting of certificates
and keys into distinct directories, as well as "ocsp-update" to indicate
which certificates need to have their OCSP part periodically updated.
The certificates also support aliases so that they can be referenced
from a bind line with a more convenient name than a file name.
crt-lists may now make use of these certificates to only decide which
ones to instantiate for a given listener, without having to deal with
deployment concerns such as paths and file names.
- the "thread-hard-limit" global parameter was added. It allows to only
set a hard limit on the number of threads without enforcing that value
as the thread count (like nbthread does). This is convenient to
prepare portable configs with no more than X threads when one knows
it's only a waste of resources to use more.
- certain warnings about the presence of HTTP rules in TCP frontends
that are going to be upgraded to HTTP when switching to a backend will
now no longer be reported when it is certain that they will work as
expected.
- a new "guid" keyword was added for servers, listeners and proxies.
The purpose will be to make it possible for external APIs to assign a
globally unique object identifier to each of them in stats dumps or
CLI accesses, and to later reliably recognize a server upon reloads.
One usage example right now is stats preservation across reloads where
this GUID uniquely identifies a server between two configs.
- it has become easier to pass extra CFLAGS / LDFLAGS to the Makefile,
just pass them into these variables (and a few other ones). Many were
removed as the result of the simplification. The removed ones will
trigger a build warning indicating what to use instead. A warning will
also be emitted when passing an unknown USE_* setting, and such
settings can now be set to zero to disable them.
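
The "wait" then "del server" sequence described in the list above could be
scripted roughly as follows. This is only a sketch under assumptions: the
backend and server names, the 2s delay and the helper names are
hypothetical, and the exact command syntax should be checked against the
management guide of the version in use.

```python
import socket


def removal_commands(backend: str, server: str, max_wait: str = "2s") -> str:
    """Build one CLI line that drains, waits for, then deletes a server.

    The commands are chained with ';' so that they travel as a single line.
    """
    target = f"{backend}/{server}"
    steps = [
        f"disable server {target}",
        f"shutdown sessions server {target}",
        f"wait {max_wait} srv-removable {target}",
        f"del server {target}",
    ]
    return "; ".join(steps) + "\n"


def send_to_cli(sock_path: str, line: str) -> str:
    """Send one command line to a UNIX stats socket and return the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(line.encode())
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode()


if __name__ == "__main__":
    # Only print the line here; pass it to send_to_cli() together with the
    # path of your own stats socket to actually run it.
    print(removal_commands("px", "srv1"), end="")
```

The point of the "wait" step is that the client no longer has to loop on
"del server" until it succeeds.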
In addition to this, some changes aim at improving the reliability:
- the draining of HTTP/1 request body was finally implemented. It is
needed when an early response is sent before the end of a POST
request, typically due to a redirect or authentication issue. It used
to cause difficulties due to the TCP stack emitting an RST that would
sometimes destroy the response before it had a chance to be sent, but
this is now something of the past.
- the buffer allocator's behavior on out-of-memory condition was finally
fixed. It had been flaky since version 1.7, with possibilities for all
requesters to deadlock if none had enough room to complete their work.
A new, more robust algorithm was finally implemented, making sure that
at least one requester has enough resources to make forward progress
and let the system recover by itself.
Other ones put a particular focus on robustness against various threats in
general:
- H2, H3 and QUIC now maintain a counter of per-connection glitches,
which are characterized by not strictly illegal but suspicious or
bogus protocol handling and behavior from a peer. Such counters are
reported at upper layers, are trackable in stick-tables, and can be
used to kill a misbehaving connection past a threshold. The goal here
is to significantly reduce the CPU impact and log pollution caused by
bots that blindly try to exploit various well-known vulnerabilities or
limitations of some implementations. Since this works on both sides it
can also be used to detect faulty applications that would need to be
fixed.
- H2 now supports forcefully closing connections after a configurable
number of streams. This can be used to accelerate the switchover during
reloads, as well as maintain an optimal balance between multiple front
nodes, and force the re-evaluation of sanity checks at the connection
level regarding tracked metrics to more easily get rid of abusers.
- two new global settings now make it possible to simply prevent HAProxy
from accepting traffic from privileged ports; one setting is for TCP
and the other one for QUIC. QUIC was configured by default to refuse
such traffic, because by relying on UDP it's particularly exposed to
DNS and NTP amplification attacks, and while it's more efficient to
filter such ports upstream, it's still very simple and cheap to just
drop such undesirable packets before processing them.
- the code no longer depends on libsystemd, so that we will not pull in
a myriad of questionable dependencies anymore. This also allows to
enable USE_SYSTEMD by default (it's only done on linux-glibc though),
thus reducing configuration combinations.
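
The glitch counter idea from the list above can be pictured with a small
Python model. It is purely conceptual: the class, its fields and the
threshold semantics (close once the count exceeds the limit) are
assumptions for illustration and do not reflect HAProxy's internal code or
its configuration keywords.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    """Toy connection counting suspicious-but-not-illegal protocol events."""
    peer: str
    glitch_threshold: int      # hypothetical knob: close once exceeded
    glitches: int = 0
    last_reason: str = ""
    closed: bool = False

    def report_glitch(self, reason: str) -> bool:
        """Record one glitch and close the connection past the threshold.

        Returns True when the connection is closed after this call.
        """
        if self.closed:
            return True
        self.glitches += 1
        self.last_reason = reason
        if self.glitches > self.glitch_threshold:
            self.closed = True
        return self.closed


conn = Connection(peer="192.0.2.10", glitch_threshold=3)
results = [conn.report_glitch("empty DATA frame") for _ in range(5)]
print(results, conn.glitches)
```

Here the first three glitches are tolerated, the fourth one closes the
connection, and later reports are ignored since the peer is already gone.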
As with every version comes a comprehensive collection of performance
improvements:
- quic: the fast-forwarding mechanism now considers the flow control
state, resulting in a reduction of the number of wakeups and better
filling of packets. The internal send API was reworked and simplified
and one buffer copy could be removed. Some minor fixes and cleanups
were done in the cubic congestion controller.
- a new QUIC setting, "tune.quic.reorder-ratio" was added to let the
user adjust the size of holes over the in-flight window before we
declare a loss. Normally QUIC users should observe much better
performance now, even with the default setting (50%), which was
sufficient for us to observe a 10x to 20x gain at 3% losses. The send
path was improved and cleaned up, by using exclusively sendmsg() and
avoiding some copies where possible. Some CPU savings are expected on
intense workloads.
- the H1 mux now also supports zero-copy forwarding for chunks of unknown
size (i.e. those larger than a buffer).
- the fast forward zero-copy mechanism is now supported by applets. This
will ultimately result in lower memory usage and higher performance
for some applets such as the cache by carefully avoiding queuing more
data than the mux can take without buffering. This can still be
disabled by unsetting tune.cache.zero-copy-forwarding.
- a few ebtree backports improved the performance on non-x86 machines
(typically ~2% faster string lookups were measured on ARM and ~3%
task switching rate was measured).
- some of the remaining server name lookups that were still linear moved
to use the tree instead, speeding up certain operations or config
parsing.
- ring: the ring internal API used to represent a bottleneck for traces
and TCP logs, especially on multi-threaded systems due to the initially
unplanned locking that resulted from the underlying buffer API. All of
this was entirely rewritten so that the code is almost lockfree and
waiting threads can prepare their work as groups in parallel. The
performance increased by a factor of 2.5 on NUMA systems and even by
20 on uniform systems, reaching up to around 7 million messages per
second. This is sufficient to enable traces at the "developer" level
even on moderately loaded systems. The "haring" utility was updated to
automatically detect the new, slightly different format and support
both the old and the new ones (the old haring tool will still read the
new format in repair mode).
- stick-tables are now sharded over multiple tree heads each with their
own locks. This significantly reduces locking contention on systems
with many threads (gains of ~6x measured on an 80-thread system). In
addition, the locking could be reduced even with low thread counts,
particularly when using peers, where the performance could be doubled.
This is particularly noticeable when using the bandwidth limiting
filter "bwlim".
- The Lua latency with single-threaded scripts (loaded by "lua-load")
running on multi-thread instances was improved a lot by reducing the
amount of consecutive instructions a thread may run when there are
many threads.
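
The sharding principle mentioned above for stick-tables can be sketched in
a few lines of Python. This is a simplified model, not the actual code:
plain dictionaries stand in for the tree heads, CRC32 is an arbitrary
choice of shard selector, and the class and method names are hypothetical.

```python
import threading
import zlib


class ShardedTable:
    """Counter table split into shards, each protected by its own lock."""

    def __init__(self, shards: int = 16):
        self._shards = [({}, threading.Lock()) for _ in range(shards)]

    def _shard_for(self, key: str):
        # The key alone decides the shard, so two threads only contend
        # when they touch keys that land in the same shard.
        return self._shards[zlib.crc32(key.encode()) % len(self._shards)]

    def incr(self, key: str, amount: int = 1) -> int:
        entries, lock = self._shard_for(key)
        with lock:
            entries[key] = entries.get(key, 0) + amount
            return entries[key]

    def get(self, key: str, default: int = 0) -> int:
        entries, lock = self._shard_for(key)
        with lock:
            return entries.get(key, default)


table = ShardedTable(shards=8)


def worker() -> None:
    for i in range(1000):
        table.incr(f"client-{i % 50}")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(table.get(f"client-{i}") for i in range(50))
print(total)
```

With a single lock, every update would serialize all threads; with one lock
per shard, only updates falling into the same shard wait for each other.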
A few changes that improve observability:
- a few more sample fetches corresponding to certain log-format aliases
were added (txn.redispatched, bc_be_queue, bc_srv_queue, etc).
- new sample fetch functions retrieve the number of concurrent streams
over the same connection for a frontend or a backend, as well as the
maximum number negotiated. This can be useful to sort out connection
performance from stream performance when looking at timings in logs.
- the Prometheus exporter now exposes a bunch of new metrics (resolvers,
more server stuff) and supports applying filters to limit the metrics
that have to be returned.
Some debugging aids to save experts time in the field, speed up recovery and
reduce the number of round trips in issues:
- stick-table operations over the CLI using commands like "show table",
"set table" and "clear table" now support a "ptr" argument to directly
use the pointer retrieved from a previous "show" command. This is
convenient to remove bogus entries manually for example.
- haproxy -dD will now report suspicious ACL pattern values which look
like known ACL/sample fetch keywords.
- the "insecure-fork-wanted" option now has an equivalent on the command
line, "-dI". It's convenient to obtain decoded ASAN outputs for
example, without having to edit a config.
- QUIC and HTTP/3 added some traces, refined some error reporting, and
improved the accuracy of the "show quic" output.
- the backend equivalent of the frontend keylog mechanism was
implemented, so that it is now possible to decipher TLS captures on
the backend side. The log-format to be used becomes a bit large,
please refer to the example in the doc.
- some internal large memory areas (file descriptor tables, HTTP and SSL
session caches, ring buffers etc) now have a name that is visible on
Linux >= 5.17 in /proc/$pid/maps or using pmap. This will help figure
out where the memory is being used and why.
- traces are way faster on multi-threaded systems thanks to the ring
locking changes, making them usable without risks on moderately loaded
systems.
Some possibly (but unlikely) breaking changes:
- an update of the DeviceAtlas addon was made to support the new version
of the library. It slightly changes the build system but so far no issue
was reported.
- a mistake I accidentally introduced two years ago with a bug fix had
the undesired side effect of randomly accepting chained commands on
the CLI in non-interactive mode, when delimited by line feeds. The
likelihood that it would work is essentially time-based, so a short
string of multiple commands had great chances of working while a large
one almost none. This started to cause side effects to other issues and
had to be fixed, so that we no longer accept multiple commands delimited
by '\n' in non-interactive mode, as documented. If you happen to have
such scripts sending multiple commands this way, you may have to fix
them (either use the semi-colon ';' to delimit the commands, or switch
to interactive mode via the "prompt" command). A warning is emitted when
this unreliable behavior is detected, to ease detection of faulty
scripts.
- the "enabled" server keyword used to be silently ignored when adding a
dynamic server. Now it's properly rejected to avoid confusing scripts.
- the way the memory limitation specified by "-m" on the command line
was handled on Linux using RLIMIT_AS got completely useless over time
due to much more fragmented memory spaces on 64-bit platforms, ASLR,
and the fact that it had been chosen exclusively to avoid
underestimating the allocated buffers' cost, which originally were
allocated all the time even when empty. Nowadays this is no longer
relevant since buffers are only allocated when used, and the current
state had the nasty effect of causing OOMs way below the configured
limit, rendering it pretty useless. The use of RLIMIT_AS has now been
dropped in favor of the more reliable RLIMIT_DATA like on other
operating systems.
- the "namespace" keyword used to be silently ignored on "bind" and
"server" lines using UNIX sockets. Now it is properly used and
checked, thus it may fail if it references an invalid value. If the
previous configuration used to work, it probably means the keyword was
not needed. In addition, the presence of the keyword on a "server"
line may also cause a boot failure that was previously only detected
at run time, if permissions are insufficient. There's no loss of
functionality here, only a check performed earlier to ensure the
process boots in a properly working state.
- the HTTP/1 URI parser no longer accepts invalid origin-form URIs that
start neither with a '/' nor a '*' (e.g. "index.html" without leading
slash). Even if some servers would still accept that, clients that
would be compatible with this disappeared well over a decade
ago, and continuing to support this for such broken applications would
probably lead to an abuse sooner or later, so better put an end to
this now.
- a workaround for an issue affecting QUIC on LibreSSL when running on
non-x86 machines was developed jointly with the LibreSSL team. There's
an issue with the CHACHA20_POLY1305 cipher when used in-place (for
QUIC) that has been well identified and will be fixed in version 4.0
of LibreSSL. The workaround consists in making the QUIC connection
fail fast so that the client can quickly retry using TCP. We'll
disable it once a stable LibreSSL version is out with the fix. A
config-based workaround consists in forcing the ciphers and excluding
this one.
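
For scripts affected by the CLI change listed above (multiple commands
delimited by line feeds in non-interactive mode), one possible fix is to
join the commands with semi-colons before sending them. The helper below is
a hypothetical sketch: it assumes that no command in the script contains a
';' itself or uses a multi-line payload, and refuses the input otherwise.

```python
def to_single_line(script: str) -> str:
    """Turn newline-delimited CLI commands into one ';'-delimited line."""
    commands = [line.strip() for line in script.splitlines()]
    commands = [c for c in commands if c]  # drop empty lines
    for c in commands:
        # Assumption: such commands need manual handling, so refuse them.
        if ";" in c or "<<" in c:
            raise ValueError(f"cannot safely chain command: {c!r}")
    return "; ".join(commands) + "\n"


legacy_script = """
show info
show stat
show servers state
"""

print(to_single_line(legacy_script), end="")
```

The other option mentioned above, switching to interactive mode via the
"prompt" command, keeps the one-command-per-line layout instead.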
And we even found some room to improve the code's maintainability and
clarity, which will hopefully further lower the barrier to contribution:
- applet: most of the internal API rework was done, which simplifies the
upper layers and the applet code as well (for those that were
converted). New applet code will have its own buffers and even less
stuff to care about. This is also true for the CLI keyword handlers
which can now be written in a more natural way and may now yield even
when not blocked.
- a significant part of the internal "shutdown" API was cleaned up so
that there is now only one function at each layer instead of one per
direction. Not only did this eliminate very old legacy code ported
over the years, it also made it possible to forward gRPC
cancellations.
- prometheus: a new registration mechanism was added to allow metrics to
be registered per module (e.g. stick-tables, resolvers etc). The
extra counters are also dumped if requested now (frontend, backend,
listener, server).
I'm fairly certain that I forgot a few things. As usual, I'm told that my
coworkers at HAProxyTech also went through this tedious task of enumerating
the changes, and it will be posted soon here:
https://www.haproxy.com/blog/announcing-haproxy-3-0
My understanding is that there will be some followups with a focus on
selected points. I'm not surprised by the difficulty of the exercise
this time ;-)
For this version, we got increased help from various testers who
accepted to run one (or a few) servers with the development version, and
who were able to report a few problems with accurate version ranges, as
well as traces and info that permitted to fix the issues quickly. It
worked amazingly well and allowed us to address some nasty bugs that are
fairly hard to reproduce and that were present for several versions
already. At the risk of repeating myself, thanks for that! I know that
operating a -dev version requires a bit more involvement than a stable one
but it's also a win-win: when something doesn't please you, it's not too
late to suggest a change, and you can benefit from the latest debugging
features and performance improvements. I sincerely hope that this success
will encourage other users into that direction. The nice benefit for the
user of facing a bug in -dev vs -stable is that we have no problem
developing new debugging extensions just for that issue, so a git pull is
enough to suddenly make the problem much more observable and require less
work to filter data than with a stable version. And something
that's human is that developers tend to be much more attracted by issues
affecting areas that are still fresh in their heads and will tend to treat
them with higher priority.
I also noticed more exchanges from various participants on the issues
and here on the list, so big thanks as well to those who take time to
review other users' problem reports and requests for help. Especially
for first-time reporters, it gives them a great experience of the
project and its community.
As usual with a new major release comes the death of an old one. This time
it's 2.0 that passed away after 5 years serving as a transition between
the old legacy versions and the newer HTX-enabled ones. I'm fairly sure
there are still some here and there, so please consider this as a reminder
that it's about time to upgrade. And 2.4 has moved to critical-fixes-only
status.
On a side note (not very funny but surprising), apparently there was a big
GitHub outage last night, and this morning we're getting a "Ooops 500" page
on the haproxy repository there: https://github.com/haproxy/haproxy
The issues seem to be working, the wiki and docs projects as well. So I
suspect that an error page got cached during the outage and continues to
be delivered for whatever reason. I opened a ticket with their support and
we'll see when we get a response. Fortunately we're not completely blocked,
but it feels strange to release on a day of outage. After all, that's a
form of resilience that also makes one use a load balancer, so there's
some logic there.
Speaking of resilience, I'm going to take a bit of vacation next week and
the week after (maybe I should have postponed given the heavy rain here),
but you're in good hands with the rest of the team, and Christopher is
back on Monday, fresh and in full force. Maybe you'll even manage to
convince him to emit -dev1 himself, who knows :-)
Please find the usual URLs below :
Site index : https://www.haproxy.org/
Documentation : https://docs.haproxy.org/
Wiki : https://github.com/haproxy/wiki/wiki
Discourse : https://discourse.haproxy.org/
Slack channel : https://slack.haproxy.org/
Issue tracker : https://github.com/haproxy/haproxy/issues
Sources : https://www.haproxy.org/download/3.0/src/
Git repository : https://git.haproxy.org/git/haproxy-3.0.git/
Git Web browsing : https://git.haproxy.org/?p=haproxy-3.0.git
Changelog : https://www.haproxy.org/download/3.0/src/CHANGELOG
Dataplane API :
https://github.com/haproxytech/dataplaneapi/releases/latest
Pending bugs : https://www.haproxy.org/l/pending-bugs
Reviewed bugs : https://www.haproxy.org/l/reviewed-bugs
Code reports : https://www.haproxy.org/l/code-reports
Latest builds : https://www.haproxy.org/l/dev-packages
I verified what I had in mind for 3.0 and 3.1-dev0 (that just opened),
and I think all is good (Tim already fixed an incorrect color on the
docs index). As usual, if (should I say when?) you detect a broken link,
just let me know so I can fix it.
Have fun!
Willy
---
Complete changelog from 3.0-dev13:
Amaury Denoyelle (2):
DOC: streamline http-reuse and connection naming definition
REGTESTS: complete http-reuse test with pool-conn-name
Aurelien DARRAGON (3):
MINOR: log: rename 'log-format tag' to 'log-format alias'
DOC: config: document logformat item naming and typecasting features
DOC: config: add %ID logformat alias alternative
Valentine Krasnobaeva (3):
CLEANUP: ssl/ocsp: readable ifdef in ssl_sock_load_ocsp
BUG/MINOR: ssl/ocsp: init callback func ptr as NULL
BUG/MINOR: activity: fix Delta_calls and Delta_bytes count
William Lallemand (2):
MINOR: sample: implement the uptime sample fetch
CI: github: upgrade the WolfSSL job to 5.7.0
Willy Tarreau (11):
CI: scripts: fix build of vtest regarding option -C
CI: scripts: build vtest using multiple CPUs
BUILD: makefile: yearly reordering of objects by build time
BUILD: fd: errno is also needed without poll()
DOC: config: fix two typos "RST_STEAM" vs "RST_STREAM"
DOC: config: refer to the non-deprecated keywords in ocsp-update on/off
CLEANUP: ssl_sock: move dirty openssl-1.0.2 wrapper to openssl-compat
DOC: install: update quick build reminders with some missing options
DOC: install: update the range of tested openssl version to cover 3.3
DEV: patchbot: prepare for new version 3.1-dev
MINOR: version: mention that it's 3.0 LTS now.
---