Hi,
HAProxy 2.9-dev3 was released on 2023/08/12. It added 105 new commits
after version 2.9-dev2. It got a bit delayed by the last-minute bugs that
all had to be fixed at the same time last week.
A number of bugs were addressed, essentially the same as those that landed
in 2.8.2 and others. A new one was just fixed; it affects configurations
mixing "lua-load" and "lua-load-per-thread", which can crash if some HTTP
rules reference sample fetches or actions from both of them, because the
stack is not reset between calls.
This version starts to include more sensitive changes, so please test it
with care:
- QUIC: unused parts of connections are now released ASAP. This is
important at the end of a connection, when there's nothing left to
send but the connection may still hang around for a while: it's now
possible to release multiple kB of RAM and only keep what's really
essential. This is comparable to the TCP TIME_WAIT state, which only
keeps a minimal state. This should improve memory usage for those
dealing with many QUIC connections.
- stick-tables and peers: the locking was still particularly heavy, had
barely been revisited since 1.8, and didn't scale at all with threads.
Worse, the performance would quickly collapse, to the point that I had
trouble deciding whether or not to enable more threads when they are
detected. In my tests with 80 threads, the request rate was divided by
10 as soon as one peer was declared in the configuration (even if not
used, such as the local one to deal with reloads). The locking was
carefully refined by disentangling lookups from peer updates and fixing
some undesired cache line sharing, which resulted in the performance
rising from 277k to 4.4M req/s over several patches. There is still
room for improvement for small systems due to the way the updates are
managed, which could be significantly cheaper, but for now that was
off the radar given that the goal was to regain scalability.
- pools: on large systems we've seen several times H2 and H3 underperform
due to extreme contention on the shared pools and/or on the allocation
counter. Here the approach is different because it's related to some
common information that is accessed by many cores at once, sometimes
with a huge inter-core latency (yes, I'm looking at you, EPYC). The
approach taken to resolve this contention consists in splitting the
counters into multiple shards. They're not indexed on the threads (since
objects can migrate between threads) but on a hash of the object
pointers themselves. The results are particularly good, with the H2
request rate jumping from 1.5M to 6.7M requests/s on 80 threads without
the shared cache, and from 368k to 4.7M requests/s with the shared cache
(that one can even reach 5.5M with a 32-wide shard, but I consider that
it's not worth doing it by default as it consumes a bit more memory).
Here again it seems like the cluster management that unlocked pools in
the past could be improved to preserve locality, but this is probably
for later.
- threads in general: the upgraded locking code now applies exponential
backoff during lock upgrades and when a writer is waiting for all
readers to leave, and makes use of some refined barriers on ARM
systems, bringing a 2-4% gain on x86 and 14-33% on ARM.
- samples: we'd like to have all of the log-format tags available as
samples but we've found that some of them are so specific or specially
formatted that it's unlikely all of them will have a perfect match. But
at least we're trying to make sure that information available as log
tags is also available as samples that can be used to build conditions
or serve in calculations. So far, the following ones have been added:
- the various timers available as T* tags in log-format now have an
equivalent sample fetch function. This makes it possible, for example,
to pass some info to the server, to perform some calculations such as
the average download rate, or to change the log level when certain
thresholds are met.
- accept_date/request_date return respectively the accept date and
the request receipt date, in seconds/milliseconds/microseconds
depending on the argument.
- a few other ones such as "pid" which returns the process' ID, and
"act_conn" returning the number of active connections on the process.
- new converters "ms_utime", "ms_ltime", "us_utime", "us_ltime" that
take as input a timestamp in milliseconds or microseconds respectively,
and format it according to an strftime-compatible format string that
also supports a milli and/or microsecond tag (%3N/%6N). To be used with
accept_date and request_date.
- a new sample fetch called "acl" is used to evaluate ACLs and return
their combined result. The principle is that it performs a logical AND
between several of them, possibly inverted, and delivers this result
(possibly to be used in turn in ACLs). This helps reduce the number of
ACLs by using exclusion. For example it becomes possible to create an
ACL of the source addresses of all corporate offices, and an ACL of the
guest networks, and use "acl(offices,!guest)" as a check for only
offices but not guest networks, and possibly create an ACL
"trusted_networks" from it that can be used everywhere else in the
config.
- and the usual CI, doc, cleanups
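To make the combination rule of the new "acl" sample fetch from the list
above more concrete, here is a small Python model of it. It is only a
sketch of the semantics as described in this message (a logical AND over
the listed ACLs, each possibly inverted with "!"), not HAProxy's actual
code. The function name acl_fetch and the per-client result dictionaries
are made up for the illustration.

```python
def acl_fetch(expr, acl_results):
    """Model of acl(a,!b,...): true only if every listed term holds.

    expr        -- comma-separated ACL names, each optionally prefixed by '!'
    acl_results -- hypothetical mapping of ACL name -> whether it matched
    """
    for term in expr.split(","):
        term = term.strip()
        negate = term.startswith("!")
        name = term[1:] if negate else term
        # A plain term fails when the ACL did not match; an inverted
        # term fails when the ACL did match.
        if acl_results[name] == negate:
            return False
    return True


# Hypothetical outcomes of the "offices" and "guest" ACLs for three clients.
office_user = {"offices": True, "guest": False}
guest_in_office = {"offices": True, "guest": True}
outsider = {"offices": False, "guest": False}

trusted = [acl_fetch("offices,!guest", client)
           for client in (office_user, guest_in_office, outsider)]
print(trusted)  # [True, False, False]
```

Only the first client would end up in "trusted_networks": it matches the
offices ACL and is not excluded by the guest one.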
I have no doubt that those interested in saving CPU cycles on large
machines or saving memory with QUIC will want to give it a try (and
they'd be right). It should work (haproxy.org currently runs on it),
just don't deploy this on a Friday before going on vacation :-)
Please find the usual URLs below :
Site index : https://www.haproxy.org/
Documentation : https://docs.haproxy.org/
Wiki : https://github.com/haproxy/wiki/wiki
Discourse : https://discourse.haproxy.org/
Slack channel : https://slack.haproxy.org/
Issue tracker : https://github.com/haproxy/haproxy/issues
Sources : https://www.haproxy.org/download/2.9/src/
Git repository : https://git.haproxy.org/git/haproxy.git/
Git Web browsing : https://git.haproxy.org/?p=haproxy.git
Changelog : https://www.haproxy.org/download/2.9/src/CHANGELOG
Dataplane API :
https://github.com/haproxytech/dataplaneapi/releases/latest
Pending bugs : https://www.haproxy.org/l/pending-bugs
Reviewed bugs : https://www.haproxy.org/l/reviewed-bugs
Code reports : https://www.haproxy.org/l/code-reports
Latest builds : https://www.haproxy.org/l/dev-packages
Willy
---
Complete changelog :
Amaury Denoyelle (5):
BUG/MEDIUM: quic: consume contig space on requeue datagram
BUG/MINOR: quic: reappend rxbuf buffer on fake dgram alloc error
BUILD: quic: fix wrong potential NULL dereference
MINOR: h3: abort request if not completed before full response
BUG/MEDIUM: quic: fix tasklet_wakeup loop on connection closing
Aurelien DARRAGON (3):
BUG/MINOR: hlua: fix invalid use of lua_pop on error paths
MINOR: hlua: add hlua_stream_ctx_prepare helper function
BUG/MEDIUM: hlua: streams don't support mixing lua-load with
lua-load-per-thread
Christopher Faulet (10):
BUG/MEDIUM: h3: Properly report a C-L header was found to the HTX
start-line
BUG/MEDIUM: h3: Be sure to handle fin bit on the last DATA frame
BUG/MEDIUM: bwlim: Reset analyse expiration date when then channel
analyse ends
MEDIUM: stream: Reset response analyse expiration date if there is no
analyzer
BUG/MINOR: htx/mux-h1: Properly handle bodyless responses when splicing
is used
BUG/MINOR: http-client: Don't forget to commit changes on HTX message
CLEANUP: stconn: Move comment about sedesc fields on the field line
REGTESTS: http: Create a dedicated script to test spliced bodyless
responses
REGTESTS: Test SPLICE feature is enabled to execute script about splicing
BUG/MAJOR: http-ana: Get a fresh trash buffer for each header value
replacement
Dragan Dosen (1):
BUG/MINOR: chunk: fix chunk_appendf() to not write a zero if buffer is
full
Frédéric Lécaille (25):
BUG/MINOR: quic: Possible crash when acknowledging Initial v2 packets
MINOR: quic: Export QUIC traces code from quic_conn.c
MINOR: quic: Export QUIC CLI code from quic_conn.c
MINOR: quic: Move TLS related code to quic_tls.c
MINOR: quic: Add new "QUIC over SSL" C module.
MINOR: quic: Add a new quic_ack.c C module for QUIC acknowledgements
CLEANUP: quic: Defined but no more used function
(quic_get_tls_enc_levels())
MINOR: quic: Split QUIC connection code into three parts
CLEANUP: quic: quic_conn struct cleanup
MINOR: quic; Move the QUIC frame pool to its proper location
BUG/MINOR: quic+openssl_compat: Non initialized TLS encryption levels
CLEANUP: quic: Remove quic_path_room().
MINOR: quic: Amplification limit handling sanitization.
MINOR: quic: Move some counters from [rt]x quic_conn anonymous struct
MEDIUM: quic: Send CONNECTION_CLOSE packets from a dedicated buffer.
MINOR: quic: Use a pool for the connection ID tree.
MEDIUM: quic: Allow the quic_conn memory to be asap released.
MINOR: quic: Release asap quic_conn memory (application level)
MINOR: quic: Release asap quic_conn memory from ->close() xprt callback.
MINOR: quic: Warning for OpenSSL wrapper QUIC bindings without
"limited-quic"
BUG/MINOR: quic: mux started when releasing quic_conn
BUG/MINOR: quic: Possible crash in quic_cc_conn_io_cb() traces.
MINOR: quic: Add a trace for QUIC conn fd ready for receive
BUG/MINOR: quic: Possible crash when issuing "show fd/sess" CLI commands
BUG/MINOR: quic: Missing tasklet (quic_cc_conn_io_cb) memory release
(leak)
Ilya Shipitsin (2):
CI: do not use "groupinstall" for Fedora Rawhide builds
CI: get rid of travis-ci wrapper for Coverity scan
Patrick Hemmer (3):
CLEANUP: acl: remove cache_idx from acl struct
REORG: cfgparse: extract curproxy as a global variable
MINOR: acl: add acl() sample fetch
Remi Tricot-Le Breton (1):
BUG/MINOR: ssl: OCSP callback only registered for first SSL_CTX
William Lallemand (9):
MINOR: sample: add pid sample
MINOR: sample: implement act_conn sample fetch
MINOR: sample: accept_date / request_date return %Ts / %tr timestamp
values
MEDIUM: sample: implement us and ms variant of utime and ltime
BUG/MINOR: sample: check alloc_trash_chunk() in conv_time_common()
DOC: configuration: describe Td in Timing events
MINOR: sample: implement the T* timer tags from the log-format as fetches
DOC: configuration: add sample fetches for timing events
DOC: configuration: rework the custom log format table
Willy Tarreau (46):
BUILD: cfgparse: keep a single "curproxy"
REORG: http: move has_forbidden_char() from h2.c to http.h
BUG/MAJOR: h3: reject header values containing invalid chars
MINOR: mux-h2/traces: also suggest invalid header upon parsing error
MINOR: ist: add new function ist_find_range() to find a character range
MINOR: http: add new function http_path_has_forbidden_char()
MINOR: h2: pass accept-invalid-http-request down the request parser
REGTESTS: http-rules: add accept-invalid-http-request for normalize-uri
tests
BUG/MINOR: h1: do not accept '#' as part of the URI component
BUG/MINOR: h2: reject more chars from the :path pseudo header
BUG/MINOR: h3: reject more chars from the :path pseudo header
REGTESTS: http-rules: verify that we block '#' by default for
normalize-uri
DOC: clarify the handling of URL fragments in requests
BUG/MAJOR: http: reject any empty content-length header value
BUG/MINOR: http: skip leading zeroes in content-length values
BUG/MEDIUM: mux-h1: fix incorrect state checking in h1_process_mux()
BUG/MEDIUM: mux-h1: do not forget EOH even when no header is sent
BUILD: mux-h1: shut a build warning on clang from previous commit
DEV: makefile: add a new "range" target to iteratively build all commits
MAJOR: threads/plock: update the embedded library again
MINOR: stick-table: move the task_queue() call outside of the lock
MINOR: stick-table: move the task_wakeup() call outside of the lock
MEDIUM: stick-table: change the ref_cnt atomically
MINOR: stick-table: better organize the struct stktable
MEDIUM: peers: update ->commitupdate out of the lock using a CAS
MEDIUM: peers: drop then re-acquire the wrlock in peer_send_teachmsgs()
MEDIUM: peers: only read-lock peer_send_teachmsgs()
MEDIUM: stick-table: use a distinct lock for the updates tree
MEDIUM: stick-table: touch updates under an upgradable read lock
MEDIUM: peers: drop the stick-table lock before entering
peer_send_teachmsgs()
MINOR: stick-table: move the update lock into its own cache line
CLEANUP: stick-table: slightly reorder the stktable struct
BUILD: defaults: use __WORDSIZE not LONGBITS for MAX_THREADS_PER_GROUP
MINOR: tools: make ptr_hash() support 0-bit outputs
MINOR: tools: improve ptr hash distribution on 64 bits
OPTIM: tools: improve hash distribution using a better prime seed
OPTIM: pools: use exponential back-off on shared pool allocation/release
OPTIM: pools: make pool_get_from_os() / pool_put_to_os() not update
->allocated
MINOR: pools: introduce the use of multiple buckets
MEDIUM: pools: spread the allocated counter over a few buckets
MEDIUM: pools: move the used counter over a few buckets
MEDIUM: pools: move the needed_avg counter over a few buckets
MINOR: pools: move the failed allocation counter over a few buckets
MAJOR: pools: move the shared pool's free_list over multiple buckets
MINOR: pools: make pool_evict_last_items() use pool_put_to_os_no_dec()
BUILD: pools: fix build error on clang with inline vs forceinline
---