Hi,
HAProxy 2.8-dev8 was released on 2023/04/23. It added 199 new commits
after version 2.8-dev7.
We're reaching the end of the planned sensitive changes and that's great,
so I'm addressing my thanks to all those who went through the effort of
polishing and testing their code before the last minute. We'll now engage
in a stabilization period with a bit more serenity.
Once we put the 49 bug fixes aside, among the visible changes in this
release are the following:
- Lua: the execution timeout was addressed. It did not always work for
sample fetches for example, so that infinite loops were still possible.
The reason was that it was relying on the internal timer that doesn't
make any progress in this case, so infinite loops in sample fetches
were still able to trigger the watchdog. Now a monotonic time is
periodically retrieved to make sure the task is still allowed to run,
otherwise it ends on failure. The maximum delay for this is currently
set to one second (which is already huge but matches previous usages)
and is configurable via tune.lua.burst-timeout. In addition to all this,
some minor improvements to the server events API to pass the proxy's ID
and a few elements needed to make the API more durable and extensible
were merged. It now sounds like it should be easy enough to write action
scripts in Lua that react to server up/down/add/del conditions.
- channels/stream connectors: the remaining harmless changes were merged
so as to clarify the internal API. The SC_FL_SHUTR flag, which was
ambiguous since it could indicate either a close from the bottom layers
or an abort from the upper ones, has now been split in two flags that are
both tested. For 2.9 we'll audit their usage to figure which ones to keep
at each place, but at least having them in good shape now will make
troubleshooting much easier and ease backporting fixes in the future.
It's interesting to note that the purpose of this refactoring was to
make the internal API more logical and clearer, and that the changes
needed to get there allowed us to spot several long-standing bugs. It
has also already happened recently that a few loop/timeout bugs that
affected 2.7 and older no longer affected 2.8. Keeping fingers
crossed!
- QUIC: there was still a limitation with the way the incoming connection
load balancing was handled, with the lowest bits of the connection ID
serving to identify the thread ID. It was good enough but did not allow
the load to be properly redistributed to available threads. In order to
address this limitation, QUIC connections have now learned to migrate
between threads after their handshake, so that they effectively work exactly
like TCP sockets and file descriptors. As such they are now assigned to
a thread upon accept() using the same mechanism that was already available
for TCP connections. This results in a smoother and more controllable load
distribution. And the thread ID is no longer derived directly from the
connection ID but is located on the connection element itself, so that
there's no longer the risk that poorly written clients cause an imbalance
of the threads. Aside from this, traces and debugging were improved, as usual.
There is still one rare bug that Tristan is facing, that I hope will be
found and fixed next week. Reaching such difficult bugs is a really good
indication that the stack is getting more mature now.
- "bind" lines are now fully compatible with thread groups and now support
more than 64 threads. A new "shards" setting, "by-group", allows creating
a new listener for each thread group instead of having a single one for
the whole process or one per thread. For socket families (or OSes) that
do not support sharding, a transparent fallback is performed which will
simply dup() the existing socket so that threads from multiple groups can
listen to the same socket. This also allowed us to remove the hard-coded
restriction to group 1 for the stats sockets, and means that when using
more than one shard on a unix socket, we won't be seeing the last thread
take all the traffic after having removed and replaced the predecessor's
socket on the file system. Finally it was found that using the "by-group"
shard setting was the best compromise in general: when running with a
single group, it doesn't change anything, and when running with several
groups, it will as much as possible try to create several sockets with
the exact same number of file descriptors. I.e. it never costs more FDs
than the default setting while significantly reducing the kernel-side
locking. A test on a 24-core AMD EPYC 74F3 showed that the performance
simply doubled from 112k to 214k conn/s by reducing kernel locking
overhead from 71% to 55%. So that convinced me to enable "by-group" as
the new default *right now*. However, the change of default value is a
single commit, so if it were found to cause any problem, it's trivial to
revert. Regarding this, the algorithm used to assign a thread to a
multi-queue listener gained a new variant, "fair", which may suit
those dealing with lots of short-lived connections. It's just a round
robin instead of a least-conn like variant. It results in an even
smoother load distribution between threads when all requests
statistically have an equivalent cost. Also, while testing this it was
found that SO_REUSEPORT doesn't always work well to distribute the load
on FreeBSD and that very often a single socket takes the load. FreeBSD
12 and above implement SO_REUSEPORT_LB to load-balance connections like
on Linux, though it is limited to 256 listeners per port. It was enabled
as it also contributes to significantly improving performance without
having to deal with tricky settings. Testers are welcome, especially on
non-Linux/FreeBSD/MacOS systems (since those are confirmed to work well).
- H2/H3: as discussed this week, the default ALPN for HTTPS frontends is
now automatically preset to "h2,http/1.1" for TCP listeners, and "h3"
for QUIC listeners, so that it's no longer needed to specify alpn on the
"bind" lines. Of course if specified, the value will prevail. A new
"no-alpn" keyword was added to disable ALPN. The global settings for the
initial window (which limits upload bitrate) and number of concurrent
streams (that can cause issues on backend) were finally split to be
adjustable per side. I'm thinking that we could possibly take this
opportunity to significantly increase the frontend's default window size
to 1 MB for example (~80 Mbps per stream at 100ms latency) to increase
the default POST upload bandwidth since it will no longer affect the
backend nor cause head-of-line blocking issues. Technically speaking,
it will cause latency to concurrent streams from the browser uploading
contents, but when uploading a large file, I strongly doubt anyone does
anything over the same connection in parallel, though some users might
have a different experience. Opinions welcome as usual.
- various code cleanups, build fixes and CI improvements
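
To make the listener, thread and Lua points above a bit more concrete,
here is a minimal configuration sketch. The keyword names are taken from
this announcement and from the commit subjects in the changelog, and
"tune.listener.multi-queue" is assumed to be the directive that carries
the new "fair" mode. Thread counts, ports, addresses, socket paths and
section names are purely hypothetical, and the exact syntax should be
checked against the 2.8 documentation before relying on it.

```
global
    # Illustrative values only: 8 threads spread over 2 thread groups
    nbthread 8
    thread-groups 2

    # "by-group" is the new default in this release; shown explicitly here
    tune.listener.default-shards by-group

    # Round-robin thread assignment, meant for many short-lived connections
    # (assumption: this is where the "fair" dispatch mode is selected)
    tune.listener.multi-queue fair

    # Lua burst execution timeout; the announced default is one second
    # (assumption: a bare number is read as milliseconds)
    tune.lua.burst-timeout 1000

    # The stats socket is no longer restricted to thread group 1
    stats socket /var/run/haproxy.sock mode 600 level admin

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend fe_main
    # One listening socket per thread group, requested per bind line
    bind :8080 shards by-group
    default_backend be_app

backend be_app
    # Hypothetical server using a documentation-range address
    server app1 192.0.2.10:8080
```

With a single thread group this should behave like the previous default,
so the explicit "shards by-group" only matters once several groups are
configured.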
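
As an illustration of the ALPN defaults and the per-side HTTP/2 settings
described above, a configuration along the following lines could be used.
This is only a sketch: the "tune.h2.fe.*" and "tune.h2.be.*" names are
assumed from the "configurable per side" commits, the numeric values are
examples and not recommendations, the certificate path and ports are
hypothetical, and the QUIC bind line assumes a build with QUIC support.

```
global
    # Per-side HTTP/2 settings (example values).
    # 1 MB of window per 100 ms round trip is about 10 MB/s,
    # i.e. roughly 80 Mbps per stream on the frontend side.
    tune.h2.fe.initial-window-size 1048576
    tune.h2.be.initial-window-size 65536
    tune.h2.fe.max-concurrent-streams 100
    tune.h2.be.max-concurrent-streams 20

defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend fe_https
    # No "alpn" needed anymore: defaults to "h2,http/1.1" on TCP
    bind :443 ssl crt /etc/haproxy/site.pem

    # Defaults to "h3" on QUIC listeners
    bind quic4@:443 ssl crt /etc/haproxy/site.pem

    # An explicit value still prevails over the default
    bind :8443 ssl crt /etc/haproxy/site.pem alpn http/1.1

    # Disables ALPN on this listener
    bind :9443 ssl crt /etc/haproxy/site.pem no-alpn

    default_backend be_https_app

backend be_https_app
    # Hypothetical server using a documentation-range address
    server app1 192.0.2.20:8080
```

Keeping the backend window and stream limits separate is what makes a
larger frontend window possible without changing what servers see.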
Now we still have 4-5 weeks for cleanups, fixes and doc. There are still
a few low hanging fruits I'd like to get for 2.8 but with low importance:
- enlarge the panic dump buffers (may use one trash per thread), because
crashes happening on more than 50-60 threads are truncated.
- make "show activity" yieldable because many threads and large values also
result in truncated output
- clean up and merge the ring locking improvements (x1.5-2). That's not
good enough in my opinion but the traces are used a lot for QUIC and
performance is a limiting factor there, so if we can raise the bar it
will still help.
- get rid of the historic now.tv_sec that encourages incorrect usage and
causes bugs that take a lot of time to be detected.
- fix the way output bytes are accounted for (it's done too low, and for
QUIC, it also counts retransmits).
- indication of bytes in flight for QUIC and probably for each mux stream
so that we can log them.
- renaming of various fields, variables or arguments that regularly cause
confusion when analysing the code.
- review various error messages or indications to make sure they're still
relevant (e.g. we recently found a suggestion to use OpenSSL >= 1.0.2).
The journey till there has been a little bit chaotic but I think we're now
on a good track for the last mile, so that's a good thing and for now I'm
rather positive on the upcoming 2.8. We'll try to emit one version per week
till the release from now on, to ease testing and bug reporting. If you're
interested in what is coming for 2.8, please try to find some time to give
it a try. And if you can't deploy in production, at least use it to check
your configs and report any anomalies or unexpected warning/error you'd
face.
Please find the usual URLs below :
Site index : https://www.haproxy.org/
Documentation : https://docs.haproxy.org/
Wiki : https://github.com/haproxy/wiki/wiki
Discourse : https://discourse.haproxy.org/
Slack channel : https://slack.haproxy.org/
Issue tracker : https://github.com/haproxy/haproxy/issues
Sources : https://www.haproxy.org/download/2.8/src/
Git repository : https://git.haproxy.org/git/haproxy.git/
Git Web browsing : https://git.haproxy.org/?p=haproxy.git
Changelog : https://www.haproxy.org/download/2.8/src/CHANGELOG
Dataplane API : https://github.com/haproxytech/dataplaneapi/releases/latest
Pending bugs : https://www.haproxy.org/l/pending-bugs
Reviewed bugs : https://www.haproxy.org/l/reviewed-bugs
Code reports : https://www.haproxy.org/l/code-reports
Latest builds : https://www.haproxy.org/l/dev-packages
Willy
---
Complete changelog :
Amaury Denoyelle (27):
MINOR: fd: implement fd_migrate_on() to migrate on a non-local thread
BUG/MINOR: task: allow to use tasklet_wakeup_after with tid -1
CLEANUP: quic: remove unused QUIC_LOCK label
CLEANUP: quic: remove unused scid_node
CLEANUP: quic: remove unused qc param on stateless reset token
CLEANUP: quic: rename quic_connection_id vars
MINOR: quic: remove uneeded tasklet_wakeup after accept
MINOR: quic: adjust Rx packet type parsing
MINOR: quic: adjust quic CID derive API
MINOR: quic: remove TID ref from quic_conn
MEDIUM: quic: use a global CID trees list
MINOR: quic: remove TID encoding in CID
MEDIUM: quic: handle conn bootstrap/handshake on a random thread
MINOR: quic: do not proceed to accept for closing conn
MINOR: protocol: define new callback set_affinity
MINOR: quic: delay post handshake frames after accept
MEDIUM: quic: implement thread affinity rebinding
BUG/MINOR: quic: transform qc_set_timer() as a reentrant function
MINOR: quic: properly finalize thread rebinding
MAJOR: quic: support thread balancing on accept
MINOR: listener: remove unneeded local accept flag
BUG/MEDIUM: quic: prevent crash on Retry sending
BUG/MINOR: mux-quic: fix crash with app ops install failure
BUG/MINOR: mux-quic: properly handle STREAM frame alloc failure
BUG/MINOR: h3: fix crash on h3s alloc failure
BUG/MINOR: quic: prevent crash on qc_new_conn() failure
BUG/MINOR: quic: consume Rx datagram even on error
Aurelien DARRAGON (37):
MINOR: clock: add now_mono_time_fast() function
MINOR: clock: add now_cpu_time_fast() function
MEDIUM: hlua: reliable timeout detection
MEDIUM: hlua: introduce tune.lua.burst-timeout
CLEANUP: hlua: avoid confusion between internal timers and tick based timers
MINOR: hlua: hook yield on known lua state
MINOR: hlua: safe coroutine.create()
CLEANUP: errors: fix obsolete function comments
CLEANUP: server: fix update_status() function comment
MINOR: server/event_hdl: add proxy_uuid to event_hdl_cb_data_server
MINOR: hlua/event_hdl: rely on proxy_uuid instead of proxy_name for lookups
MINOR: hlua/event_hdl: expose proxy_uuid variable in server events
MINOR: hlua/event_hdl: fix return type for hlua_event_hdl_cb_data_push_args
MINOR: server/event_hdl: prepare for upcoming refactors
BUG/MINOR: event_hdl: don't waste 1 event subtype slot
CLEANUP: event_hdl: updating obsolete comment for EVENT_HDL_CB_DATA
CLEANUP: event_hdl: fix comment typo about _sync assertion
MINOR: event_hdl: dynamically allocated event data members
MINOR: event_hdl: provide event->when for advanced handlers
MINOR: hlua/event_hdl: timestamp for events
DOC: lua: restore 80 char limitation
BUG/MINOR: server: incorrect report for tracking servers leaving drain
MINOR: server: explicitly commit state change in srv_update_status()
BUG/MINOR: server: don't miss proxy stats update on server state transitions
BUG/MINOR: server: don't miss server stats update on server state transitions
BUG/MINOR: server: don't use date when restoring last_change from state file
MINOR: server: central update for server counters on state change
MINOR: server: propagate server state change to lb through single function
MINOR: server: propagate lb changes through srv_lb_propagate()
MINOR: server: change adm_st_chg_cause storage type
MINOR: server: srv_append_status refacto
MINOR: server: change srv_op_st_chg_cause storage type
CLEANUP: server: remove unused variables in srv_update_status()
CLEANUP: server: fix srv_set_{running, stopping, stopped} function comment
MINOR: server: pass adm and op cause to srv_update_status()
MEDIUM: server: split srv_update_status() in two functions
MINOR: server/event_hdl: prepare for server event data wrapper
Christopher Faulet (52):
BUG/MEDIUM: cli: Set SE_FL_EOI flag for '_getsocks' and 'quit' commands
BUG/MEDIUM: cli: Eat output data when waiting for appctx shutdown
BUG/MEDIUM: http-client: Eat output data when waiting for appctx shutdown
BUG/MEDIUM: stats: Eat output data when waiting for appctx shutdown
BUG/MEDIUM: log: Eat output data when waiting for appctx shutdown
BUG/MEDIUM: dns: Kill idle DNS sessions during stopping stage
BUG/MINOR: resolvers: Wakeup DNS idle task on stopping
BUG/MEDIUM: resolvers: Force the connect timeout for DNS resolutions
MINOR: hlua: Stop to check the SC state when executing a hlua cli command
BUG/MEDIUM: mux-h1: Report EOI when a TCP connection is upgraded to H2
BUG/MEDIUM: mux-h2: Never set SE_FL_EOS without SE_FL_EOI or SE_FL_ERROR
BUG/MINOR: stream: Fix test on SE_FL_ERROR on the wrong entity
BUG/MEDIUM: stream: Report write timeouts before testing the flags
BUG/MEDIUM: stconn: Do nothing in sc_conn_recv() when the SC needs more room
MINOR: stream: Uninline and export sess_set_term_flags() function
MINOR: filters: Review and simplify errors handling
REGTESTS: fix the race conditions in log_uri.vtc
MINOR: channel: Forwad close to other side on abort
MINOR: stream: Introduce stream_abort() to abort on both sides in same time
MINOR: stconn: Rename SC_FL_SHUTR_NOW in SC_FL_ABRT_WANTED
MINOR: channel/stconn: Replace channel_shutr_now() by sc_schedule_abort()
MINOR: stconn: Rename SC_FL_SHUTW_NOW in SC_FL_SHUT_WANTED
MINOR: channel/stconn: Replace channel_shutw_now() by sc_schedule_shutdown()
MINOR: stconn: Rename SC_FL_SHUTR in SC_FL_ABRT_DONE
MINOR: channel/stconn: Replace sc_shutr() by sc_abort()
MINOR: stconn: Rename SC_FL_SHUTW in SC_FL_SHUT_DONE
MINOR: channel/stconn: Replace sc_shutw() by sc_shutdown()
MINOR: tree-wide: Replace several chn_cons() by the corresponding SC
MINOR: tree-wide: Replace several chn_prod() by the corresponding SC
BUG/MINOR: cli: Don't close when SE_FL_ERR_PENDING is set in cli analyzer
MINOR: stconn: Stop to set SE_FL_ERROR on sending path
MEDIUM: stconn: Forbid applets with more to deliver if EOI was reached
MINOR: stconn: Don't clear SE_FL_ERROR when endpoint is reset
MINOR: stconn: Add a flag to ack endpoint errors at SC level
MINOR: backend: Set SC_FL_ERROR on connection error
MINOR: stream: Set SC_FL_ERROR on channels' buffer allocation error
MINOR: tree-wide: Test SC_FL_ERROR with SE_FL_ERROR from upper layer
MEDIUM: tree-wide: Stop to set SE_FL_ERROR from upper layer
MEDIUM: backend: Stop to use SE flags to detect connection errors
MEDIUM: stream: Stop to use SE flags to detect read errors from analyzers
MEDIUM: stream: Stop to use SE flags to detect endpoint errors
MEDIUM: stconn: Rely on SC flags to handle errors instead of SE flags
BUG/MINOR: stconn: Don't set SE_FL_ERROR at the end of sc_conn_send()
BUG/MEDIUM: http-ana: Properly switch the request in tunnel mode on upgrade
BUG/MEDIUM: log: Properly handle client aborts in syslog applet
MINOR: stconn: Add a flag to report EOS at the stream-connector level
MINOR: stconn: Propagate EOS from a mux to the attached stream-connector
MINOR: stconn: Propagate EOS from an applet to the attached stream-connector
BUG/MINOR: http-ana: Update analyzers on both sides when switching in TUNNEL mode
CLEANUP: backend: Remove useless debug message in assign_server()
CLEANUP: cli: Remove useless debug message in cli_io_handler()
BUG/MEDIUM: stconn: Propagate error on the SC on sending path
Frédéric Lécaille (19):
MINOR: quic: Trace fix in quic_pto_pktns() (handshaske status)
BUG/MINOR: quic: Wrong packet number space probing before confirmed handshake
MINOR: quic: Modify qc_try_rm_hp() traces
MINOR: quic: Dump more information at proto level when building packets
MINOR: quic: Add a trace for packet with an ACK frame
MINOR: quic: Add packet loss and maximum cc window to "show quic"
BUG/MINOR: quic: Ignored less than 1ms RTTs
MINOR: quic: Add connection flags to traces
BUG/MEDIUM: quic: Code sanitization about acknowledgements requirements
BUG/MINOR: quic: Possible wrapped values used as ACK tree purging limit.
BUG/MINOR: quic: SIGFPE in quic_cubic_update()
MINOR: quic: Display the packet number space flags in traces
MINOR: quic: Remove a useless test about probing in qc_prep_pkts()
BUG/MINOR: quic: Wrong Application encryption level selection when probing
BUG/MINOR: quic: Do not use ack delay during the handshakes
BUG/MINOR: quic: Stop removing ACK ranges when building packets
MINOR: quic: Do not allocate too much ack ranges
BUG/MINOR: quic: Unchecked buffer length when building the token
BUG/MINOR: quic: Wrong Retry token generation timestamp computing
Ilya Shipitsin (7):
CI: bump "actions/checkout" to v3 for cross zoo matrix
CI: enable monthly test on Fedora Rawhide
CLEANUP: use "offsetof" where appropriate
CI: cirrus-ci: bump FreeBSD image to 13-1
REGTESTS: remove unsupported "stats bind-process" keyword
CI: extend spellchecker whitelist, add "clen" as well
CLEANUP: assorted typo fixes in the code and comments
Olivier Houchard (1):
BUG/MEDIUM: fd: don't wait for tmask to stabilize if we're not in it.
Tim Duesterhus (5):
MINOR: Make `tasklet_free()` safe to be called with `NULL`
CLEANUP: Stop checking the pointer before calling `tasklet_free()`
CLEANUP: Stop checking the pointer before calling `pool_free()`
CLEANUP: Stop checking the pointer before calling `task_free()`
CLEANUP: Stop checking the pointer before calling `ring_free()`
William Lallemand (2):
BUG/MINOR: stick_table: alert when type len has incorrect characters
MINOR: ssl: remove OpenSSL 1.0.2 mention into certificate loading error
Willy Tarreau (49):
MINOR: activity: add a line reporting the average CPU usage to "show activity"
MINOR: thread: keep a bitmask of enabled groups in thread_set
MINOR: fd: optimize fd_claim_tgid() for use in fd_insert()
MINOR: fd: add a lock bit with the tgid
MINOR: receiver: reserve special values for "shards"
MINOR: bind-conf: support a new shards value: "by-group"
MINOR: mux-h2: make the initial window size configurable per side
MINOR: mux-h2: make the max number of concurrent streams configurable per side
MINOR: config: add "no-alpn" support for bind lines
REGTESTS: add a new "ssl_alpn" test to test ALPN negotiation
DOC: add missing documentation for "no-alpn" on bind lines
MINOR: ssl: do not set ALPN callback with the empty string
MINOR: ssl_crtlist: dump "no-alpn" on "show crtlist" when "no-alpn" was set
MEDIUM: config: set useful ALPN defaults for HTTPS and QUIC
BUG/MINOR: cfgparse: make sure to include openssl-compat
MINOR: quic: support migrating the listener as well
MINOR: quic_sock: index li->per_thr[] on local thread id, not global one
MINOR: listener: support another thread dispatch mode: "fair"
MINOR: receiver: add a struct shard_info to store info about each shard
MINOR: receiver: add RX_F_MUST_DUP to indicate that an rx must be duped
MEDIUM: proto: duplicate receivers marked RX_F_MUST_DUP
MINOR: proto: skip socket setup for duped FDs
MEDIUM: config: permit to start a bind on multiple groups at once
MINOR: listener: make accept_queue index atomic
MEDIUM: listener: rework thread assignment to consider all groups
MINOR: listener: use a common thr_idx from the reference listener
MINOR: listener: resync with the thread index before heavy calculations
MINOR: listener: make sure to avoid ABA updates in per-thread index
MINOR: listener: always compare the local thread as well
BUG/MINOR: cli: clarify error message about stats bind-process
BUG/MINOR: sock_inet: use SO_REUSEPORT_LB where available
BUG/MINOR: tools: check libssl and libcrypto separately
BUG/MINOR: config: fix NUMA topology detection on FreeBSD
BUILD: sock_inet: forward-declare struct receiver
BUILD: proto_tcp: export the correct names for proto_tcpv[46]
CLEANUP: protocol: move the l3_addrlen to plug a hole in proto_fam
CLEANUP: protocol: move the nb_receivers to plug a hole in protocol
REORG: listener: move the bind_conf's thread setup code to listener.c
MINOR: proxy: make proxy_type_str() recognize peers sections
MEDIUM: peers: call bind_complete_thread_setup() to finish the config
MINOR: protocol: add a flags field to store info about protocols
MINOR: protocol: move the global reuseport flag to the protocols
MINOR: listener: automatically adjust shards based on support for SO_REUSEPORT
MINOR: protocol: add a function to check if some features are supported
MINOR: sock: add a function to check for SO_REUSEPORT support at runtime
MINOR: protocol: perform a live check for SO_REUSEPORT support
MINOR: listener: do not restrict CLI to first group anymore
MINOR: listener: add a new global tune.listener.default-shards setting
MEDIUM: listener: switch the default sharding to by-group
---