Hi,
HAProxy 3.1.4 was released on 2025/02/19. It added 50 new commits
after version 3.1.3.
There were 11 issues tagged MEDIUM and 21 tagged MINOR, in addition to a
few improvements.
Let's start with the medium-level issues:
- an API issue in the applets could have resulted in some shutdown or
error conditions being missed in the future, so it was fixed as a
preventive measure. It turns out that this fix uncovered a bug in the
CLI's "_getsocks" handler that was causing an infinite loop during
reloads, and another one in the SPOE applet where the applet would
never shut down (neither appeared in a released version), and these
bugs were also fixed.
- a check for improper resizing of the trash buffers could be triggered
when tune.memory.hot-size was used and tune.bufsize increased, causing
a panic at startup. In this case the check overlooked a valid case and
was relaxed, but it helped identify a case that had not initially been
considered and could otherwise have been missed.
- the shorter watchdog delay in 3.1, which allows a warning to be
printed, revealed that we could sometimes deadlock between a thread dump
(e.g. as called by a stuck warning) and a panic. That's not cool because
it could end up with a process that spins forever instead of dying.
- reloads that transfer listening sockets to the new worker process could
make the older worker consume a lot of CPU for no apparent reason for
the time it remained present. The cause was that these FDs were
registered in epoll and when a new connection arrived to the new
process, the old one would also be notified without being able to
unregister it since it was already closed (well-known epoll pitfall).
Now these FDs are properly unregistered after being transferred so it's
possible that some users with long-running old processes will observe
a lower CPU usage on these old processes.
- a BUG_ON() could be triggered when using filters with no http_payload
callback.
- a bug in htx_xfer_blks() could result in occasionally transferring more
blocks than requested on 32-bit platforms.
- the FCGI mux faced a similar issue as the H2 mux a while ago regarding
truncated frames, i.e. it could wait forever on a partial record when
a read shutdown was received. The same solution was applied as for H2.
- some TLSv1.3 signature algorithms were not recognized by the
ClientHello parser which was written before TLSv1.3. The ones that
were not correctly supported were based on RSA-PSS and would have
resulted in presenting a possibly wrong certificate when both RSA and
ECDSA ones were present for the same SNI.
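To illustrate the RSA-PSS item above, here is a minimal Python sketch of why
a parser written with TLSv1.2 in mind can miss these algorithms. It is not
HAProxy's ClientHello parser: the function names are hypothetical, and only
the code points (taken from the SignatureScheme registry in RFC 8446) are
real. A TLSv1.2-style parser reads each 16-bit value as a (hash, signature)
byte pair where signature 1 means RSA and 3 means ECDSA, so the RSA-PSS
values 0x0804..0x0806 match neither.

```python
# Illustrative sketch only, NOT HAProxy code. Function names are hypothetical.
# Code points are a subset of the RFC 8446 SignatureScheme registry.

RSA_PSS_RSAE = {0x0804, 0x0805, 0x0806}  # rsa_pss_rsae_sha256/384/512


def legacy_key_types(sigalgs):
    """TLSv1.2-style view: low byte is the signature type (1=RSA, 3=ECDSA)."""
    types = set()
    for code in sigalgs:
        sig = code & 0xFF
        if sig == 1:
            types.add("rsa")
        elif sig == 3:
            types.add("ecdsa")
    return types


def tls13_aware_key_types(sigalgs):
    """Same as above, but also maps the RSA-PSS (RSAE) schemes to RSA keys."""
    types = legacy_key_types(sigalgs)
    if any(code in RSA_PSS_RSAE for code in sigalgs):
        types.add("rsa")
    return types


# A client offering ECDSA plus RSA-PSS only (no PKCS#1 RSA):
offered = [0x0403, 0x0804, 0x0805, 0x0806]
print(sorted(legacy_key_types(offered)))
print(sorted(tls13_aware_key_types(offered)))
```

With the legacy view, such a client looks like it cannot verify RSA
signatures at all, which is the kind of misclassification that can lead to
picking the wrong certificate when both RSA and ECDSA ones exist.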
The smaller ones:
- a few minor memory leaks were found in error paths (auth, _getsocks,
flt-trace)
- only one "users" option in userlist "group" directive is supported,
but extraneous ones were still accepted and silently leaked, which
is no longer the case (an alert is now displayed when "users" is
repeated).
- FCGI would always force the status to 302 when seeing a Location
header, possibly overwriting another status code.
- http-checks could mistakenly add a "Content-Length: 0" to GET/HEAD/etc
requests, which was rejected by some servers. Now the header will only
be emitted when there is explicit content.
- H1 responses truncated after a chunk boundary (i.e. only missing the
0-sized chunk) forwarded to H2 could end up with a clean END_STREAM
flag instead of an RST_STREAM(CANCEL). The difference is subtle,
because the former states that the transfer was complete while the
latter says it was interrupted. In the first case, a client would
consider the object as complete (i.e. it could display a broken image)
while for the latter the client might possibly decide to try again.
- a few crashes could happen if the QUIC mux failed to initialize.
- since the mworker rework, a section declared after another section
involving a post-section parser would be silently skipped during
discovery by the master process. It's really not obvious to build a
configuration that triggers this problem and even harder to create
one that has an effect (e.g. "program" after "resolvers"), but it
could definitely cause some head scratching.
- some QUIC crypto frames could be 1 to 2 bytes smaller than permitted
by the MTU. Also, related to packet length, some packets can use a
long header, and some room could be missing in the buffer to store
their length field, resulting in errors.
- the signature algorithms were not listed on "show ssl crt-list". They
now are.
- cross-table lookups performed using sc_get_XXX(explicit_table) with
tables of different key types were lacking the proper type cast to
look up the key in the other table, generally resulting in its
equivalent one not being found (e.g. binary vs string etc).
- a pending close from the server could be forwarded to the client
despite a pending tcp-response content evaluation.
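As an illustration of the truncated H1 response item above, the following
Python sketch shows how a chunked body that stops exactly on a chunk
boundary differs from a complete one, and how that difference could map to
the H2 signal. It is a simplified model under stated assumptions (no
trailers, whole body in one buffer, hypothetical function names), not the
logic of HAProxy's muxes.

```python
# Illustrative sketch only, NOT HAProxy code. Names are hypothetical.
# Assumes no trailers and that the whole received body is in one buffer.

HEX_DIGITS = b"0123456789abcdefABCDEF"


def chunked_body_state(raw: bytes) -> str:
    """Return 'complete', 'truncated' or 'invalid' for a chunked H1 body."""
    pos = 0
    while True:
        eol = raw.find(b"\r\n", pos)
        if eol < 0:
            # No further chunk-size line: the 0-sized chunk never arrived.
            return "truncated"
        size_field = raw[pos:eol].split(b";", 1)[0].strip()
        if not size_field or any(c not in HEX_DIGITS for c in size_field):
            return "invalid"
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            # Last chunk: expect the final CRLF right after it.
            return "complete" if raw[pos:pos + 2] == b"\r\n" else "truncated"
        if len(raw) < pos + size + 2:
            return "truncated"
        if raw[pos + size:pos + size + 2] != b"\r\n":
            return "invalid"
        pos += size + 2


def h2_end_signal(raw: bytes) -> str:
    """Only a complete body may end the H2 stream cleanly."""
    if chunked_body_state(raw) == "complete":
        return "END_STREAM"
    return "RST_STREAM(CANCEL)"


print(h2_end_signal(b"5\r\nhello\r\n0\r\n\r\n"))
print(h2_end_signal(b"5\r\nhello\r\n"))
```

The second call is the case described above: every received chunk is
intact, only the 0-sized chunk is missing, so the transfer must still be
reported as interrupted.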
And a few improvements:
- QUIC: the "pacing" feature, which is mandatory for the BBR congestion
control and highly recommended for others, is still experimental (and
opt-in) in 3.1. Till now it would pace using too fine a granularity
(a nanosecond-based timer) that resulted in extreme CPU usage. Now
that all the required arrangements have been made to make it work fine
at the millisecond level, this code has been backported to 3.1. The parts
that are changed only concern what was covered by the experimental
directive, so if you don't have "expose-experimental-directives" in
your config, you won't notice anything, and if you're already using
it and have configured burst sizes on the congestion algorithms to
enable pacing, you will notice both a slightly higher bandwidth and
a significantly decreased CPU usage. The previous pacing burst value
is now ignored and only serves as a boolean to enable the feature (so
as not to break configs). Those who were using QUIC without pacing
(due to the CPU usage) are encouraged to turn it on again by passing
a non-zero argument to the algorithm. We've observed transfer gains
up to x20 thanks to avoiding losses and letting the window grow
enough to use the link more efficiently!
- we've had (very few) reports of epoll reporting errors on some FDs,
that we suspect are caused by races between threads when an FD is
passed between threads, closed and immediately reopened by the initial
thread, which could possibly then receive a late error report for the
previous one. Switching to poll always made the problem disappear. In
order to counter this we've first added a configurable mask of events
that we do not want reported, so that system calls encounter them on their
own. It *looks* like it has done the job, albeit possibly not
completely. As such we've added a more advanced mechanism that
implements a version number for each FD so that we can always reliably
compare the FD in the report with the currently active one. Those who
have been facing spurious 502 on the server side may be interested in
testing again with 3.1.4 to see if the problem persists (in which
case it will rule out an entire class of bugs). This will progressively be
backported to older stable releases so that we don't have to deal with
long tedious debugging sessions involving this possible case that is
often suspected first these days.
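The per-FD version number described in the last item can be sketched as
follows. This is a minimal Python model of the idea, not HAProxy's
implementation: the class, its method names and the packing of the
generation next to the FD number in a single integer are all assumptions
made for illustration. The point is that an event registered for a previous
incarnation of an FD carries an older generation and can be told apart from
one aimed at the FD that currently uses the same number.

```python
# Illustrative sketch only, NOT HAProxy code. All names are hypothetical,
# and packing (generation, fd) into one integer is an assumption.


class FdTable:
    def __init__(self):
        self._generation = {}  # fd number -> generation of its last open
        self._open = set()     # fd numbers currently in use

    def open(self, fd: int) -> int:
        """Mark fd as (re)opened and return the token to register with
        the poller: generation in the high bits, fd in the low 32 bits."""
        gen = self._generation.get(fd, 0) + 1
        self._generation[fd] = gen
        self._open.add(fd)
        return (gen << 32) | fd

    def close(self, fd: int) -> None:
        self._open.discard(fd)

    def is_current(self, token: int) -> bool:
        """True only if the reported event targets the live incarnation."""
        fd = token & 0xFFFFFFFF
        gen = token >> 32
        return fd in self._open and self._generation.get(fd) == gen


table = FdTable()
old_token = table.open(7)   # first connection on FD 7
table.close(7)
new_token = table.open(7)   # FD number reused by a new connection

# A late error report for the old connection is recognized as stale:
print(table.is_current(old_token), table.is_current(new_token))
```

A late report that still carries the old token is simply skipped instead of
being applied to the new connection that reuses the same FD number.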
And that's about all for this one I think. Some of these will be backported
to other versions soon (at least 3.0 I think). Let's switch to -dev now for
me, and for you let's update :-)
Please find the usual URLs below :
Site index : https://www.haproxy.org/
Documentation : https://docs.haproxy.org/
Wiki : https://github.com/haproxy/wiki/wiki
Discourse : https://discourse.haproxy.org/
Slack channel : https://slack.haproxy.org/
Issue tracker : https://github.com/haproxy/haproxy/issues
Sources : https://www.haproxy.org/download/3.1/src/
Git repository : https://git.haproxy.org/git/haproxy-3.1.git/
Git Web browsing : https://git.haproxy.org/?p=haproxy-3.1.git
Changelog : https://www.haproxy.org/download/3.1/src/CHANGELOG
Dataplane API : https://github.com/haproxytech/dataplaneapi/releases/latest
Pending bugs : https://www.haproxy.org/l/pending-bugs
Reviewed bugs : https://www.haproxy.org/l/reviewed-bugs
Code reports : https://www.haproxy.org/l/code-reports
Latest builds : https://www.haproxy.org/l/dev-packages
Willy
---
Complete changelog :
Amaury Denoyelle (11):
BUG/MINOR: quic: reserve length field for long header encoding
BUG/MINOR: quic: fix CRYPTO payload size calcul for encoding
BUG/MINOR: quic: prevent crash on conn access after MUX init failure
BUG/MINOR: mux-quic: prevent crash after MUX init failure
MINOR: quic: rename pacing_rate cb to pacing_inter
MINOR: mux-quic: increment pacing retry counter on expired
MEDIUM: quic: implement credit based pacing
MEDIUM: mux-quic: reduce pacing CPU usage with passive wait
MEDIUM: quic: use dynamic credit for pacing
MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path
MINOR: quic: adapt credit based pacing to BBR
Aurelien DARRAGON (1):
BUG/MINOR: stktable: invalid use of stkctr_set_entry() with mixed table
types
Christopher Faulet (22):
BUG/MEDIUM: cli: Be sure to drop all input data in END state
BUG/MINOR: cli: Wait for the last ACK when FDs are xferred from the old
worker
BUG/MEDIUM: filters: Handle filters registered on data with no payload
callback
BUG/MINOR: fcgi: Don't set the status to 302 if it is already set
REGTESTS: Fix truncated.vtc to send 0-CRLF
BUG/MINOR: mux-h2: Properly handle full or truncated HTX messages on shut
BUG/MEDIUM: mux-fcgi: Properly handle read0 on partial records
BUG/MINOR: tcp-rules: Don't forward close during tcp-response content
rules eval
BUG/MINOR: http-check: Don't pretend a C-L heeader is set before adding it
BUG/MEDIUM: flt-spoe: Set/test applet flags instead of SE flags from I/O
handler
BUG/MEDIUM: applet: Don't pretend to have more data to handle
EOI/EOS/ERROR
BUG/MEDIUM: flt-spoe: Properly handle end of stream from the SPOE applet
MINOR: flt-spoe: Report end of input immediately after applet init
MINOR: mux-spop: Report EOI on the SE when a ACK is received for a stream
MINOR: mux-spop: Set SPOP_CF_ERROR flag on connection error only
BUG/MINOR: cli: Don't set SE flags from the cli applet
BUG/MINOR: cli: Fix memory leak on error for _getsocks command
BUG/MINOR: cli: Fix a possible infinite loop in _getsocks()
BUG/MINOR: config/userlist: Support one 'users' option for 'group'
directive
BUG/MINOR: auth: Fix a leak on error path when parsing user's groups
BUG/MINOR: flt-trace: Support only one name option
BUG/MINOR: stats-json: Define JSON_INT_MAX as a signed integer
Lukas Tribus (1):
DOC: option redispatch should mention persist options
William Lallemand (7):
BUG/MEDIUM: ssl: chosing correct certificate using RSA-PSS with TLSv1.3
BUG/MINOR: mworker: section ignored in discovery after a
post_section_parser
BUG/MINOR: mworker: post_section_parser for the last section in discovery
BUG/MINOR: ssl/cli: "show ssl crt-list" lacks client-sigals
BUG/MINOR: ssl/cli: "show ssl crt-list" lacks sigals
BUG/MEDIUM: htx: wrong count computation in htx_xfer_blks()
DOC: htx: clarify <mark> parameter for htx_xfer_blks()
Willy Tarreau (8):
BUG/MEDIUM: debug: close a possible race between thread dump and panic()
BUG/MEDIUM: fd: mark FD transferred to another process as FD_CLONED
MINOR: epoll: permit to mask certain specific events
BUG/MEDIUM: chunk: make sure to flush the trash pool before resizing
DEBUG: fd: add a counter of takeovers of an FD since it was last opened
MINOR: fd: add a generation number to file descriptors
DEBUG: epoll: store and compare the FD's generation count with reported
event
MEDIUM: epoll: skip reports of stale file descriptors
---