Hi,
HAProxy 2.4.13 was released on 2022/02/16. It added 28 new commits
after version 2.4.12.
No need to read this if you already read the 2.5.2 announcement :-)
This is essentially the same as what changed from 2.5.1 to 2.5.2, except
master and httpclient, so I'm copy-pasting the relevant parts below.
This version addresses a few long-term bugs that have been keeping us
quite busy for far too long, but ultimately it's satisfying to know that
these ones are gone and that they won't be casting doubt over every
single bug report.
The main issues fixed in this version are:
- a tiny race condition in the scheduler affecting the rare multi-threaded
tasks. In some cases, a task could be finishing its run on one thread
while expiring on another one, and be requeued to a position that was
still being calculated by the thread finishing with it. The most likely case
was the peers task disabling the expiration while waiting for other
peers to be locked, causing such a non-expirable task to be queued
and to block all other timers from expiring (typically health checks,
peers and resolvers, but others were affected). This could only
happen at high peers traffic rate but it definitely did. When built
with the suitable options such as DEBUG_STRICT it would immediately
crash (which is how it was detected). This bug was present since 2.0.
- a bug in the Set-Cookie2 response parser may result in an infinite
loop triggering the watchdog if a server sends this header while it belongs
to a backend configured with cookie persistence. Usually cookie-based
persistence is not used with untrusted servers, but if that was the
case, the following rule would be usable as a workaround for the time
it takes to upgrade:
    http-response del-header Set-Cookie2
It reminded us that 2.5 years ago we were discussing completely
dropping Set-Cookie2, which never succeeded in the field. Tim has opened an
issue so that we don't forget to remove it after 2.6. This issue was
diagnosed, reported and fixed by Andrew McDermott and Grant Spence.
This bug was there since 1.9.
- a bug in the SPOE error handling. When a connection to an agent dies,
there may still be requests pending that are tied to this connection.
The list of such requests is scanned so that they can be aborted,
except that the condition to scan the list was incorrect, and when
these requests were finally aborted upon processing timeout, they were
updating the memory area they used to point to, which could have been
reused for anything, causing random crashes very commonly seen in
libc's malloc/free or openssl, or haproxy pools with corrupted pointers.
In short, anyone using SPOE must absolutely update to apply the fix,
otherwise any bug they face cannot be trusted as we know there's a rare
but real case of memory corruption there. This bug was present since
1.8.
- there was a possible race condition on the listeners where it was
sometimes possible to wake up a temporarily paused listener just after
it had failed to rebind upon a failed attempt to reload. This would
access fdtab[-1] causing memory corruption or crashes. It's been there
since 2.2 but really started to have an effect with 2.3.
- the master CLI could remain stuck forever if extra characters followed
by a shutdown were sent before the end of a response. In this case, each
such connection would remain unusable, and a script doing this would
face a connection failure after the 10th attempt (master's maxconn). A
few related issues could also cause it to loop forever (e.g. too long
pipelined requests, and empty buffers after wrapping).
- the connection stopping list introduced in 2.4 to deal with idle
frontend connections on reloads missed a deletion, and could leave link
elements in the list after their containing structure was freed, causing
occasional crashes of the old process upon reload.
- there is an ambiguity in the definition of dynamic table size updates
between the HTTP/2 spec (RFC7540) and the HPACK spec (RFC7541) which
can be read two ways. HAProxy and a few servers interpret it one way
and a few clients and other servers interpret it another way (and
generally clients win, as usual). One client, nghttp, enforces it
strictly, causing interoperability issues with haproxy and a few other
ones when the table size is set below 4096. We had a long discussion
with other participants of the HTTP working group to find the best
path forward that resulted in a nice update of the H2 spec that
preserves the best interoperability with existing components while
clarifying all points. This update is present in this version and
will be progressively backported to older ones after some time (I
managed to mess up the first attempt).
- there was an issue with the data transfer in the HTX layer. I'm not
very clear on the impact, but I think it can sometimes cause data
to be truncated or just blocked.
- the "set server ssl" CLI command introduced in 2.4 had the undesirable
side effect of modifying the data path and the check path at the same
time (by mimicking the configuration), which causes quite some trouble.
The doc was updated to clearly state that only the data path is
updated, and the code now does that (otherwise it is unusable anyway).
- there were still a number of other issues of lesser importance,
such as the CLI being extremely slow to parse pipelined requests because
it was looking for the line feed first, hence the larger the buffer, the
slower it was with batch updates like ACL/map updates.
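To give an idea of why looking for the line feed first gets so expensive
on pipelined commands, here is a small Python model. It is not HAProxy's
code, just a sketch: it assumes commands are separated by ";" on a single
line ended by a line feed, and it counts how many bytes each strategy has
to walk through. The function names, the map name and the "add map" payload
are only there for illustration.

```python
def parse_lf_first(buf: bytes):
    """Slow model: locate the end of line before isolating each command."""
    cmds, walked, pos = [], 0, 0
    while pos < len(buf):
        lf = buf.find(b"\n", pos)
        eol = lf if lf != -1 else len(buf)
        walked += eol - pos              # bytes walked to reach the line feed
        semi = buf.find(b";", pos, eol)
        end = semi if semi != -1 else eol
        cmds.append(buf[pos:end])
        pos = end + 1                    # skip the delimiter
    return cmds, walked


def parse_first_delim(buf: bytes, delims: bytes = b";\n"):
    """Faster model: stop at whichever delimiter comes first."""
    cmds, walked, pos = [], 0, 0
    while pos < len(buf):
        end = pos
        while end < len(buf) and buf[end] not in delims:
            end += 1
        walked += end - pos              # only this command's bytes are walked
        cmds.append(buf[pos:end])
        pos = end + 1                    # skip the delimiter
    return cmds, walked


def make_batch(count: int) -> bytes:
    """Build one pipelined line of illustrative 'add map' commands."""
    cmds = [f"add map demo.map key{i} val{i}".encode() for i in range(count)]
    return b";".join(cmds) + b"\n"


if __name__ == "__main__":
    batch = make_batch(1000)
    for parser in (parse_lf_first, parse_first_delim):
        cmds, walked = parser(batch)
        print(f"{parser.__name__}: {len(cmds)} commands, {walked} bytes walked")
```

In this model the first parser walks the rest of the buffer for every
single command, so its cost grows with both the number of commands and the
buffer size, while the second one only walks each byte once.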
Some debugging options were added and backported. One that recently helped
us is DEBUG_POOL_INTEGRITY combined with the existing DEBUG_DONT_SHARE_POOLS
and DEBUG_STRICT. The first two will provide a rough equivalent of
the use-after-free debug option that was not suitable for production, by
checking if released memory areas were tampered with between their last
free() and the next malloc(). This slightly increases CPU usage (1-2%
typically) but will catch most memory corruptions much earlier and much
more cleanly than what happened over the last weeks: instead of crashing at
random places that are victims of a change, the crash happens much closer
to the bad actor, and with more context to figure out what happened.
And quite frankly for all those who can afford it (i.e. all those not running
at more than 98% CPU), I would kindly ask them to add these 4 options to their
build command line so that their future bug reports are much more accurate:
    $ make ... DEBUG="-DDEBUG_STRICT -DDEBUG_MEMORY_POOLS \
                      -DDEBUG_DONT_SHARE_POOLS -DDEBUG_POOL_INTEGRITY"
This also allows developers to quickly rule out many potential causes and
provide responses faster. By the way we're always running with all debugging
turned full-throttle on haproxy.org and recently switched from DEBUG_UAF to
DEBUG_POOL_INTEGRITY.
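For those wondering what "checking if released memory areas were tampered
with" means in practice, the principle can be modeled in a few lines of
Python. This is only a sketch of the idea, not the actual pool code: the
IntegrityPool class, the 0x55 fill byte and the PoolCorruption exception
are all assumptions made for the example.

```python
class PoolCorruption(Exception):
    """Raised when a freed object was modified before being reused."""


class IntegrityPool:
    FILL = 0x55  # arbitrary fill byte chosen for this sketch

    def __init__(self, obj_size: int):
        self.obj_size = obj_size
        self.free_objs = []

    def alloc(self) -> bytearray:
        if not self.free_objs:
            return bytearray(self.obj_size)
        obj = self.free_objs.pop()
        # Any byte differing from the fill pattern means someone wrote to
        # the object after it was released.
        for offset, byte in enumerate(obj):
            if byte != self.FILL:
                raise PoolCorruption(f"freed object modified at offset {offset}")
        return obj

    def free(self, obj: bytearray) -> None:
        # Overwrite the whole area so that later writes become detectable.
        obj[:] = bytes([self.FILL]) * self.obj_size
        self.free_objs.append(obj)


if __name__ == "__main__":
    pool = IntegrityPool(16)
    obj = pool.alloc()
    pool.free(obj)
    obj[3] = 0xFF  # simulated use-after-free through a stale reference
    try:
        pool.alloc()
    except PoolCorruption as exc:
        print("caught:", exc)
```

The interesting part is that the problem is reported on the next allocation
from the same pool, which is much closer to the faulty write than a crash
in some unrelated place.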
Please find the usual URLs below :
   Site index       : http://www.haproxy.org/
   Discourse        : http://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Wiki             : https://github.com/haproxy/wiki/wiki
   Sources          : http://www.haproxy.org/download/2.4/src/
   Git repository   : http://git.haproxy.org/git/haproxy-2.4.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy-2.4.git
   Changelog        : http://www.haproxy.org/download/2.4/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/
Willy
---
Complete changelog :
Andrew McDermott (1):
      BUG/MAJOR: http/htx: prevent unbounded loop in http_manage_server_side_cookies

Christopher Faulet (2):
      BUG/MEDIUM: htx: Adjust length to add DATA block in an empty HTX buffer
      BUG/MEDIUM: cli: Never wait for more data on client shutdown

David Carlier (1):
      BUILD/MINOR: fix solaris build with clang.

William Dauchy (1):
      BUG/MEDIUM: server: avoid changing healthcheck ctx with set server ssl

William Lallemand (1):
      BUG/MINOR: mworker: does not erase the pidfile upon reload

Willy Tarreau (22):
      BUG/MEDIUM: connection: properly leave stopping list on error
      MEDIUM: cli: yield between each pipelined command
      MINOR: channel: add new function co_getdelim() to support multiple delimiters
      BUG/MINOR: cli: avoid O(bufsize) parsing cost on pipelined commands
      BUG/MEDIUM: mcli: do not try to parse empty buffers
      BUG/MEDIUM: mcli: always realign wrapping buffers before parsing them
      MEDIUM: h2/hpack: emit a Dynamic Table Size Update after settings change
      DEBUG: cli: add a new "debug dev fd" expert command
      BUILD: debug/cli: condition test of O_ASYNC to its existence
      DEBUG: pools: add new build option DEBUG_POOL_INTEGRITY
      BUG/MEDIUM: mworker: don't lose the stats socket on failed reload
      BUG/MINOR: pools: always flush pools about to be destroyed
      DEBUG: pools: add extra sanity checks when picking objects from a local cache
      DEBUG: pools: let's add reverse mapping from cache heads to thread and pool
      DEBUG: pools: replace the link pointer with the caller's address on pool_free()
      BUG/MAJOR: sched: prevent rare concurrent wakeup of multi-threaded tasks
      MINOR: listener: replace the listener's spinlock with an rwlock
      BUG/MEDIUM: listener: read-lock the listener during accept()
      BUG/MAJOR: spoe: properly detach all agents when releasing the applet
      REGTESTS: peers: leave a bit more time to peers to synchronize
      BUG/MEDIUM: h2/hpack: fix emission of HPACK DTSU after settings change
      BUG/MINOR: mux-h2: update the session's idle delay before creating the stream
---