[ANNOUNCE] haproxy-2.4.13

Willy Tarreau Wed, 16 Feb 2022 07:35:57 -0800

Hi,

HAProxy 2.4.13 was released on 2022/02/16. It added 28 new commits
after version 2.4.12.


No need to read this if you already read the 2.5.2 announcement :-)
This is essentially the same as what changed from 2.5.1 to 2.5.2, except
master and httpclient, so I'm copy-pasting the relevant parts below.

This version addresses a few long-term bugs that have been keeping us
quite busy for far too long, but ultimately it's satisfying to know that
these ones are gone and that they won't be casting a doubt over every
single bug report.

The main issues fixed in this version are:
  - a tiny race condition in the scheduler affecting the rare
    multi-threaded tasks. In some cases, a task could be finishing its
    run on one thread while expiring on another one, at the very moment
    it was being requeued to a position that was still being calculated
    by the thread finishing with it. The most likely case
    was the peers task disabling the expiration while waiting for other
    peers to be locked, causing such a non-expirable task to be queued
    and to block all other timers from expiring (typically health checks,
    peers and resolvers, but others were affected). This could only
    happen at high peers traffic rate but it definitely did. When built
    with the suitable options such as DEBUG_STRICT it would immediately
    crash (which is how it was detected). This bug was present since 2.0.

  - a bug in the Set-Cookie2 response parser may result in an infinite
    loop triggering the watchdog if a server sends this while it belongs
    to a backend configured with cookie persistence. Usually cookie-based
    persistence is not used with untrusted servers, but if that was the
    case, the following rule would be usable as a workaround for the time
    it takes to upgrade:

         http-response del-header Set-Cookie2 

    It reminded us that 2.5 years ago we were discussing completely
    dropping Set-Cookie2, which never succeeded in the field. Tim has
    opened an issue so that we don't forget to remove it after 2.6. This
    issue was diagnosed, reported and fixed by Andrew McDermott and Grant
    Spence. This bug was there since 1.9.

  - a bug in the SPOE error handling. When a connection to an agent dies,
    there may still be requests pending that are tied to this connection.
    The list of such requests is scanned so that they can be aborted,
    except that the condition to scan the list was incorrect, and when
    these requests were finally aborted upon processing timeout, they were
    updating the memory area they used to point to, which could have been
    reused for anything, causing random crashes very commonly seen in
    libc's malloc/free via openssl, or haproxy pools with corrupted pointers.
    In short, anyone using SPOE must absolutely update to apply the fix
    otherwise any bug they face cannot be trusted as we know there's a rare
    but real case of memory corruption there. This bug was present since
    1.8.

  - there was a possible race condition on the listeners where it was
    sometimes possible to wake up a temporarily paused listener just after
    it had failed to rebind upon a failed attempt to reload. This would
    access fdtab[-1] causing memory corruption or crashes. It's been there
    since 2.2 but really started to have an effect with 2.3.

  - the master CLI could remain stuck forever if extra characters followed
    by a shutdown were sent before the end of a response. In this case, each
    such connection would remain unusable, and a script doing this would
    face a connection failure after the 10th attempt (master's maxconn). A
    few related issues could also cause it to loop forever (e.g. too long
    pipelined requests, and empty buffers after wrapping).

  - the connection stopping list introduced in 2.4 to deal with idle
    frontend connection on reloads missed a deletion, and could leave link
    elements in the list after their containing structure was freed, causing
    occasional crashes of the old process upon reload.

  - there is an ambiguity in the definition of dynamic table size updates
    between the HTTP/2 spec (RFC7540) and the HPACK spec (RFC7541) which
    can be read two ways. HAProxy and a few servers interpret it one way
    and a few clients and other servers interpret it another way (and
    generally clients win, as usual). One client, nghttp, enforces it
    strictly, causing interoperability issues with haproxy and a few other
    ones when the table size is set below 4096. We had a long discussion
    with other participants of the HTTP working group to find the best
    path forward that resulted in a nice update of the H2 spec that 
    preserves the best interoperability with existing components while
    clarifying all points. This update is present in this version and
    will be progressively backported to older ones after some time (I
    managed to mess up with the first attempt).

  - there was an issue with the data transfer in the HTX layer. I'm not
    very clear on the impact, but I think it can sometimes cause data
    to be truncated or just blocked.

  - the "set server ssl" CLI command introduced in 2.4 had the undesirable
    side effect of modifying the data path and the check path at the same
    time (by mimicking the configuration), which causes quite some trouble.
    Now the doc was updated to clearly state that only the data path is
    updated, and the code does that (otherwise it is unusable anyway).

  - there were still a number of other issues of lower level of importance,
    such as the CLI being extremely slow to parse pipelined requests because
    it was looking for the line feed first, hence the larger the buffer, the
    slower it was with batch updates like ACL/map updates.
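
To illustrate the last point, here is a minimal sketch of the idea behind
scanning for several delimiters at once, so that the cost of extracting one
command depends on the command's length and not on the buffer's size. It is
not the actual co_getdelim() code: the function name first_command_len() is
hypothetical, the use of ';' as the separator between pipelined commands is
an assumption of this example, and escaping is ignored.

```c
#include <stddef.h>

/* Hypothetical helper, NOT HAProxy's co_getdelim().
 * Returns the length of the first command found in buf[0..len), including
 * its terminating delimiter, or 0 if none of the characters listed in
 * <delims> is present yet. The scan stops at the first delimiter met, so
 * a large buffer full of pipelined commands is not walked to its end just
 * to extract the first one.
 */
size_t first_command_len(const char *buf, size_t len, const char *delims)
{
    for (size_t i = 0; i < len; i++) {
        for (const char *d = delims; *d; d++) {
            if (buf[i] == *d)
                return i + 1;
        }
    }
    return 0; /* incomplete command, more data needed */
}
```

Called with ";\n" as the delimiter set on a buffer holding many map updates,
this returns after the first update instead of first searching the whole
buffer for a line feed.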

Some debugging options were added and backported. One that recently helped
us is DEBUG_POOL_INTEGRITY combined with the existing DEBUG_DONT_SHARE_POOLS
and DEBUG_STRICT. The first two ones will provide sort of an equivalent of
the use-after-free debug option that was not suitable for production, by
checking if released memory areas were tampered with between their last
free() and the next malloc(). This slightly increases CPU usage (1-2%
typically) but will catch most memory corruptions much earlier and much
cleaner than what happened over the last weeks: instead of crashing at
random places that are victims of a change, the crash happens much closer
to the bad actor, and with more context to figure what happened.
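
For those wondering what such a check can look like, here is a toy sketch
of the general technique: fill an area with a known pattern when it is
released, and verify the pattern when the area is handed out again. This is
only an illustration under assumptions, not the pool code itself: the
one-slot toy_pool structure, the 0xA5 pattern and all function names are
made up for the example, and where this sketch returns NULL a real debug
build would be expected to crash on the spot.

```c
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 0xA5  /* arbitrary pattern chosen for this sketch */
#define OBJ_SIZE    64

/* Toy pool holding a single object slot (hypothetical structure). */
struct toy_pool {
    unsigned char slot[OBJ_SIZE];
    int released;         /* 1 when the slot was freed and poisoned */
};

/* Release: overwrite the whole area so later writes become visible. */
void toy_free(struct toy_pool *p)
{
    memset(p->slot, POISON_BYTE, OBJ_SIZE);
    p->released = 1;
}

/* Allocate: hand the area out only if the poison is intact. */
void *toy_alloc(struct toy_pool *p)
{
    if (p->released) {
        for (size_t i = 0; i < OBJ_SIZE; i++) {
            if (p->slot[i] != POISON_BYTE)
                return NULL; /* tampered with between free and alloc */
        }
        p->released = 0;
    }
    return p->slot;
}

/* Usage demo: returns 1 if the next allocation reports corruption. */
int demo_corruption_detected(int write_after_free)
{
    struct toy_pool pool = { {0}, 0 };
    unsigned char *obj = (unsigned char *)toy_alloc(&pool);

    toy_free(&pool);
    if (write_after_free)
        obj[3] = 0;          /* bug: write through a stale pointer */
    return toy_alloc(&pool) == NULL;
}
```

The point of the design is that the failure is reported at the very next
allocation of the damaged area, which is usually much closer in time to the
faulty write than a crash in an unrelated victim.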

And quite frankly for all those who can afford it (i.e. all those not running
at more than 98% CPU), I would kindly ask to add these 4 options to their
build command line so that their future bug reports are much more accurate:

  $ make ... DEBUG="-DDEBUG_STRICT -DDEBUG_MEMORY_POOLS \
                    -DDEBUG_DONT_SHARE_POOLS -DDEBUG_POOL_INTEGRITY"

This also allows developers to quickly rule out many potential causes and
provide responses faster. By the way we're always running with all debugging
turned full-throttle on haproxy.org and recently switched from DEBUG_UAF to
DEBUG_POOL_INTEGRITY.

Please find the usual URLs below :
   Site index       : http://www.haproxy.org/
   Discourse        : http://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Wiki             : https://github.com/haproxy/wiki/wiki
   Sources          : http://www.haproxy.org/download/2.4/src/
   Git repository   : http://git.haproxy.org/git/haproxy-2.4.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy-2.4.git
   Changelog        : http://www.haproxy.org/download/2.4/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/

Willy
---
Complete changelog :
Andrew McDermott (1):
      BUG/MAJOR: http/htx: prevent unbounded loop in http_manage_server_side_cookies

Christopher Faulet (2):
      BUG/MEDIUM: htx: Adjust length to add DATA block in an empty HTX buffer
      BUG/MEDIUM: cli: Never wait for more data on client shutdown

David Carlier (1):
      BUILD/MINOR: fix solaris build with clang.

William Dauchy (1):
      BUG/MEDIUM: server: avoid changing healthcheck ctx with set server ssl

William Lallemand (1):
      BUG/MINOR: mworker: does not erase the pidfile upon reload

Willy Tarreau (22):
      BUG/MEDIUM: connection: properly leave stopping list on error
      MEDIUM: cli: yield between each pipelined command
      MINOR: channel: add new function co_getdelim() to support multiple delimiters
      BUG/MINOR: cli: avoid O(bufsize) parsing cost on pipelined commands
      BUG/MEDIUM: mcli: do not try to parse empty buffers
      BUG/MEDIUM: mcli: always realign wrapping buffers before parsing them
      MEDIUM: h2/hpack: emit a Dynamic Table Size Update after settings change
      DEBUG: cli: add a new "debug dev fd" expert command
      BUILD: debug/cli: condition test of O_ASYNC to its existence
      DEBUG: pools: add new build option DEBUG_POOL_INTEGRITY
      BUG/MEDIUM: mworker: don't lose the stats socket on failed reload
      BUG/MINOR: pools: always flush pools about to be destroyed
      DEBUG: pools: add extra sanity checks when picking objects from a local cache
      DEBUG: pools: let's add reverse mapping from cache heads to thread and pool
      DEBUG: pools: replace the link pointer with the caller's address on pool_free()
      BUG/MAJOR: sched: prevent rare concurrent wakeup of multi-threaded tasks
      MINOR: listener: replace the listener's spinlock with an rwlock
      BUG/MEDIUM: listener: read-lock the listener during accept()
      BUG/MAJOR: spoe: properly detach all agents when releasing the applet
      REGTESTS: peers: leave a bit more time to peers to synchronize
      BUG/MEDIUM: h2/hpack: fix emission of HPACK DTSU after settings change
      BUG/MINOR: mux-h2: update the session's idle delay before creating the stream
