Hi,

HAProxy 3.0.4 was released on 2024/09/03. It added 42 new commits
after version 3.0.3.

This version addresses two issues affecting how the H2 mux deals with
incomplete frames:
  - in one case, certain errors happening while processing an incomplete
    frame did not lead to the termination of the connection, and would
    cause endless wakeups to try to handle the error, preventing the
    process from sleeping, thus eating CPU.

  - another case, much harder to reproduce but also observed as actively
    exploited in one case, can cause an endless loop in the h2_send()
    function. It requires a processing error requiring a GOAWAY to be
    reported with an almost full output buffer, at a moment when no more
    progress can be made on the input buffer due to an incomplete frame,
    and while many streams are transmitting data in parallel in
    zero-copy mode. What happens in this
    case is that the output buffer is cleared (due to the error) while
    still leaving the full indication that prevents output data from being
    considered, and no condition to exit the loop is met. In this case the
    loop will be interrupted by the watchdog which will kill the process
    after two seconds. A work-around consists in simply disabling
    zero-copy forwarding for HTTP/2:  "tune.h2.zero-copy-fwd-send off".
    This issue was assigned CVE-2024-45506.
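
For those who cannot upgrade immediately, the work-around above could be
applied as sketched below. This is only an illustration: the directive is
the one quoted above, and the "global" section is assumed to be where the
tune.* settings already live in the configuration.

```haproxy
global
    # Work-around for CVE-2024-45506: disable zero-copy forwarding on the
    # HTTP/2 send path until the process is upgraded to 3.0.4.
    tune.h2.zero-copy-fwd-send off
```

Once running 3.0.4 the line should no longer be needed for this issue.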

Other than that, the following issues were fixed:
  - CLI: the "show threads" command would crash if issued with fewer than
    16 threads (due to an area shared for two different things, it would
    start to dump threads from the 17th).

  - JWT: the SSL library functions used to validate a token would leave
    an error in the SSL error queue, which would later be mistaken for an
    error on another connection and cause it to be closed.

  - a time-of-check/time-of-use (TOCTOU) issue in the queue processing
    made it rare but possible for a server to be left with no connection
    yet not take any traffic. It's more likely to happen with maxconn 1,
    very hard at 2 and almost impossible at 3 or above.

  - QUIC: when the accept queue is full and the incoming connection cannot
    be migrated to another thread, it needed to be re-migrated to the local
    thread, which was not supported because the migration had already
    started, and would crash. Also there was a case which could produce
    crashes when built with the aws-lc TLS library.

  - OCSP: a memory allocation error while loading OCSP parameters could
    leave the tree locked and freeze subsequent operations.

  - some uploads to H2 servers could freeze due to the zero-copy
    forwarding not always setting the END_STREAM flag on the last DATA
    frame (GH #2665).

  - it was possible to crash the process when performing an implicit
    protocol upgrade (TCP to HTTP due to a transition from a TCP front
    to an HTTP back) if an error happened on the connection just before
    the transition.

  - a crash could happen in mux-pt if an error happened on the connection
    just before an abort that is going to emit a shutdown, and with a
    pending wakeup that completes some work on a connection having no
    transport layer anymore. This only affects TCP (e.g. peers and master
    CLI; GH #2656).

  - mux-h1 could repeat a 408 error multiple times in logs when failing
    to send an empty message on a full output buffer. In this case, it
    would attempt to close again every client timeout and produce a log
    each time despite no data leaving.

Finally, the following changes are not exactly bug fixes but address
problems encountered in some setups:

  - the hard limit on the number of file descriptors now defaults to about
    1 million, in order to match what has been the default for a very long
    time on many distros. Some of them recently changed it to 1 billion,
    causing a huge startup time (or even a watchdog at boot) and a
    massive memory usage.

  - a new global directive "h1-do-not-close-on-insecure-transfer-encoding"
    was added to explicitly allow keeping a connection alive when it
    uses both Content-Length and Transfer-Encoding. This goes against the
    latest version of the HTTP specification but may be needed with some
    clients or servers which cause too many TLS reconnections because of
    this despite being in well controlled environments.

  - The log-format parser became stricter in 3.0 as a side effect of some
    of the log processing improvements. A fatal error was emitted when an
    alias or expression was used in an incompatible context (e.g. HTTP info
    in TCP logs, but there are more subtle ones), preventing some working
    configs from starting. This has now been relaxed and is only produced
    in diagnostic mode (-dD). This was GitHub issue #2642.

  - support for "retry-on 429" was added (GH #2687).
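
To give an idea of how the settings above fit in a configuration, here is a
minimal sketch. The directive names are the ones mentioned above plus the
existing "fd-hard-limit" and "retries" keywords; the backend name, the
server address and all numeric values are made up for illustration and need
to be adapted.

```haproxy
global
    # Optional explicit cap on file descriptors. When unset, 3.0.4 now
    # defaults to about 1 million. The value below is an arbitrary example.
    fd-hard-limit 100000

    # Only for well controlled environments: keeps connections alive when
    # a message carries both Content-Length and Transfer-Encoding, which
    # goes against the latest HTTP specification.
    h1-do-not-close-on-insecure-transfer-encoding

backend app                          # hypothetical backend name
    mode http
    retries 3                        # example value
    retry-on 429 503                 # 429 is accepted as of this release
    server srv1 192.0.2.10:8080      # documentation address, not a real host
```

Retrying on 429 only makes sense if another attempt has a chance to succeed,
so it is probably best kept to backends where that is known to be the case.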

And the rest is pretty minor. For the more problematic issues above, a
2.9 release will follow shortly.

Thanks to all those who kindly shared traces, dumps and backtraces,
because many of the issues above were particularly hard to figure out,
let alone reproduce, but the overall increasing level of detail provided
in bug reports helps a lot! If some have suggestions about what can make
their lives easier when providing detailed traces, feel free to share
them!

Note that at this point this flushes the queue of pending bugs for 3.0,
which is good news. There remains one exception, a QUIC patchset recently
introduced into 3.1 to implement NEW_TOKEN on 0-RTT that we'd like to
backport since it addresses some bad corner cases. But the backport is
non-trivial and the patches need to be exposed a bit longer in 3.1 first,
so this might come in 3.0.5.

Please find the usual URLs below:
   Site index       : https://www.haproxy.org/
   Documentation    : https://docs.haproxy.org/
   Wiki             : https://github.com/haproxy/wiki/wiki
   Discourse        : https://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : https://www.haproxy.org/download/3.0/src/
   Git repository   : https://git.haproxy.org/git/haproxy-3.0.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy-3.0.git
   Changelog        : https://www.haproxy.org/download/3.0/src/CHANGELOG
   Dataplane API    : https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs     : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs    : https://www.haproxy.org/l/reviewed-bugs
   Code reports     : https://www.haproxy.org/l/code-reports
   Latest builds    : https://www.haproxy.org/l/dev-packages

Willy
---
Complete changelog:
Amaury Denoyelle (6):
      MINOR: proto: extend connection thread rebind API
      BUG/MEDIUM: quic: prevent crash on accept queue full
      CLEANUP: proto: rename TID affinity callbacks
      CLEANUP: quic: rename TID affinity elements
      BUG/MINOR: stick-table: fix crash for src_inc_gpc() without stkcounter
      DOC: quic: fix default minimal value for max window size

Aurelien DARRAGON (2):
      MEDIUM: sink: don't set NOLINGER flag on the outgoing stream interface
      MEDIUM: log: relax some checks and emit diag warnings instead in lf_expr_postcheck()

Christopher Faulet (12):
      BUG/MINOR: session: Eval L4/L5 rules defined in the default section
      BUG/MINOR: server: Don't warn fallback IP is used during init-addr resolution
      BUG/MINOR: cli: Atomically inc the global request counter between CLI commands
      BUG/MEDIUM: jwt: Clear SSL error queue on error when checking the signature
      MINOR: proxy: Add support of 429-Too-Many-Requests in retry-on status
      BUG/MEDIUM: mux-h2: Set ES flag when necessary on 0-copy data forwarding
      BUG/MEDIUM: stream: Prevent mux upgrades if client connection is no longer ready
      BUG/MINIR: proxy: Match on 429 status when trying to perform a L7 retry
      BUG/MEDIUM: mux-pt: Never fully close the connection on shutdown
      BUG/MEDIUM: cli: Always release back endpoint between two commands on the mcli
      BUG/MEDIUM: mux-h1: Properly handle empty message when an error is triggered
      BUG/MEDIUM: mux-pt: Fix condition to perform a shutdown for writes in mux_pt_shut()

Frederic Lecaille (7):
      BUG/MINOR: quic: Non optimal first datagram.
      BUG/MINOR: quic: Lack of precision when computing K (cubic only cc)
      MINOR: quic: Dump TX in flight bytes vs window values ratio.
      MINOR: quic: Add information to "show quic" for CUBIC cc.
      BUG/MINOR: quic: unexploited retransmission cases for Initial pktns.
      BUG/MINOR: quic: Too shord datagram during O-RTT handshakes (aws-lc only)
      BUG/MINOR: Crash on O-RTT RX packet after dropping Initial pktns

Lukas Tribus (1):
      DOC: install: don't reference removed CPU arg

Valentine Krasnobaeva (3):
      BUG/MEDIUM: ssl_sock: fix deadlock in ssl_sock_load_ocsp() on error path
      MEDIUM: init: set default for fd_hard_limit via DEFAULT_MAXFD (take #2)
      BUG/MEDIUM: init: fix fd_hard_limit default in compute_ideal_maxconn

William Lallemand (1):
      DOC: configuration: issuers-chain-path not compatible with OCSP

Willy Tarreau (10):
      BUILD: listener: silence a build warning about unused value without threads
      BUG/MEDIUM: debug/cli: fix "show threads" crashing with low thread counts
      BUG/MAJOR: mux-h2: force a hard error upon short read with pending error
      DOC: config: improve the http-keep-alive section
      MEDIUM: h1: allow to preserve keep-alive on T-E + C-L
      MINOR: queue: add a function to check for TOCTOU after queueing
      BUG/MEDIUM: queue: deal with a rare TOCTOU in assign_server_and_queue()
      Revert "MEDIUM: sink: don't set NOLINGER flag on the outgoing stream interface"
      MINOR: mux-h2: try to clear DEM_MROOM and MUX_MFULL at more places
      BUG/MAJOR: mux-h2: always clear MUX_MFULL and DEM_MROOM when clearing the mbuf

---

