Hi,

HAProxy 2.8-dev8 was released on 2023/04/23. It added 199 new commits
after version 2.8-dev7.

We're reaching the end of the planned sensitive changes and that's great,
so I'm addressing my thanks to all those who went through the effort of
polishing and testing their code before the last minute. We'll now engage
in a stabilization period with a bit more serenity.

Once we put the 49 bug fixes aside, among the visible changes in this
release are the following:

  - Lua: the execution timeout was addressed. It did not always work, for
    example in sample fetches, so infinite loops were still possible. The
    reason was that it relied on the internal timer, which doesn't make
    any progress in this case, so infinite loops in sample fetches were
    still able to trigger the watchdog. Now a monotonic time is
    periodically retrieved to make sure the task is still allowed to run,
    otherwise it ends on failure. The maximum delay for this is currently
    set to one second (which is already huge but matches previous usages)
    and is configurable via tune.lua.burst-timeout.

    In addition to all this, some minor improvements to the server events
    API were merged, to pass the proxy's ID and a few elements needed to
    make the API more durable and extensible. It now sounds like it should
    be easy enough to write action scripts in Lua that react to server
    up/down/add/del conditions.

  - channels/stream connectors: the remaining harmless changes were merged
    so as to clarify the internal API. The SC_FL_SHUTR flag, which was
    ambiguous since it could indicate either a close from the bottom
    layers or an abort from the upper ones, has now been split into two
    flags that are both tested. For 2.9 we'll audit their usage to figure
    out which ones to keep at each place, but at least having them in good
    shape now will make troubleshooting much easier and ease backporting
    fixes in the future.

    It's interesting to note that the purpose of this refactoring was to
    make the internal API more logical and clearer, that the changes
    needed to get there allowed us to spot several long-standing bugs, and
    that a few loop/timeout bugs that affected 2.7 and older have recently
    turned out not to affect 2.8 anymore. Keeping fingers crossed!

  - QUIC: there was still a limitation with the way the incoming
    connection load balancing was handled, with the lowest bits of the
    connection ID serving to identify the thread ID. It was good enough
    but did not allow the load to be properly redistributed to available
    threads. In order to address this limitation, QUIC connections have
    now learned to migrate between threads after their handshake, so that
    they effectively work exactly like TCP sockets and file descriptors.
    As such they are now assigned to a thread upon accept() using the same
    mechanism that was already available for TCP connections. This results
    in a smoother and more controllable load distribution. The thread ID
    is also no longer derived directly from the connection ID but is
    located on the connection element itself, so that there's no longer
    the risk that poorly written clients cause an imbalance of the
    threads.

    Aside from this, traces and debugging were improved, as usual. There
    is still one rare bug that Tristan is facing, that I hope will be
    found and fixed next week. Reaching such difficult bugs is a really
    good indication that the stack is getting more mature now.

  - "bind" lines are now fully compatible with thread groups and now
    support more than 64 threads. A new "shards" setting, "by-group",
    allows creating a new listener for each thread group instead of having
    a single one for the whole process or one per thread. For socket
    families (or OSes) that do not support sharding, a transparent
    fallback is performed which will simply dup() the existing socket so
    that threads from multiple groups can listen to the same socket.

    This also allowed us to remove the hard-coded restriction to group 1
    for the stats sockets, and means that when using more than one shard
    on a unix socket, we won't be seeing the last thread take all the
    traffic after having removed and replaced the predecessor's socket on
    the file system.

    Finally it was found that using the "by-group" shard setting was the
    best compromise in general: when running with a single group, it
    doesn't change anything, and when running with several groups, it will
    as much as possible try to create several sockets with the exact same
    number of file descriptors. I.e. it never costs more FDs than the
    previous default setting while significantly reducing the kernel-side
    locking. A test on a 24-core AMD EPYC 74F3 showed that the performance
    almost doubled from 112k to 214k conn/s by reducing kernel locking
    overhead from 71% to 55%. So that convinced me to enable "by-group" as
    the new default *right now*. However, the change of default value is a
    single commit, so if it were found to cause any problem, it's trivial
    to revert.

    Regarding this, the algorithm used to assign a thread to a multi-queue
    listener adopted a new variant, "fair", which can be suited for those
    dealing with lots of short-lived connections. It's just a round robin
    instead of a least-conn like variant. It results in an even smoother
    load distribution between threads when all requests statistically have
    an equivalent cost.

    Also, while testing this it was found that SO_REUSEPORT doesn't always
    work well to distribute the load on FreeBSD and that very often a
    single socket takes the load. FreeBSD 12 and above implement
    SO_REUSEPORT_LB to load-balance connections like on Linux, though it
    is limited to 256 listeners per port. It was enabled as it also
    contributes to significantly improving performance without having to
    deal with tricky settings. Testers are welcome, especially on
    non-Linux/FreeBSD/MacOS systems (since those are confirmed to work
    well).

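As a rough illustration of the sharding and dispatch settings described
above, a configuration could look like the sketch below. The thread
counts, the section names and the server address are made up for the
example. The "tune.listener.multi-queue fair" spelling used to select the
"fair" variant is an assumption here and should be checked against the
documentation of the version being tested. Since "by-group" is now the
default, the explicit "shards by-group" on the bind line is only there to
make the intent visible.

```
global
    # hypothetical sizing: 16 threads spread over 2 thread groups
    nbthread 16
    thread-groups 2

    # default sharding for all bind lines (already the default in
    # this release, shown only for clarity)
    tune.listener.default-shards by-group

    # assumed keyword for the round-robin "fair" dispatch variant
    tune.listener.multi-queue fair

frontend fe_example
    mode http
    # one listening socket per thread group
    bind :8080 shards by-group
    default_backend be_example

backend be_example
    mode http
    # placeholder address from the documentation range
    server s1 192.0.2.10:8080
```

Running "haproxy -c -f <file>" on such a file is a cheap way to confirm
that the keywords are accepted by the build under test before trying it
under load.
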
  - H2/H3: as discussed this week, the default ALPN for HTTPS frontends is
    now automatically preset to "h2,http/1.1" for TCP listeners, and "h3"
    for QUIC listeners, so that it's no longer needed to specify alpn on
    the "bind" lines. Of course if specified, the value will prevail. A
    new "no-alpn" keyword was added to disable ALPN.

    The global settings for the initial window (which limits upload
    bitrate) and number of concurrent streams (that can cause issues on
    backend) were finally split to be adjustable per side. I'm thinking
    that we could possibly take this opportunity to significantly increase
    the frontend's default window size to 1 MB for example (~80 Mbps per
    stream at 100ms latency) to increase the default POST upload bandwidth
    since it will no longer affect the backend nor cause head-of-line
    blocking issues. Technically speaking, it will add latency to
    concurrent streams from the browser uploading contents, but when
    uploading a large file, I strongly doubt anyone does anything over the
    same connection in parallel, though some users might have a different
    experience. Opinions welcome as usual.

  - various code cleanups, build fixes and CI improvements

Now we still have 4-5 weeks for cleanups, fixes and doc. There are still a
few low-hanging fruits I'd like to get for 2.8 but with low importance:

  - enlarge the panic dump buffers (may use one trash per thread), because
    crashes happening on more than 50-60 threads are truncated.

  - make "show activity" yieldable because many threads and large values
    also result in truncated output

  - clean up and merge the ring locking improvements (x1.5-2). That's not
    good enough in my opinion but the traces are used a lot for QUIC and
    performance is a limiting factor there, so if we can raise the bar it
    will still help.

  - get rid of the historic now.tv_sec that encourages incorrect usage and
    causes bugs that take a lot of time to be detected.

  - fix the way output bytes are accounted for (it's done at too low a
    level, and for QUIC, it also counts retransmits).

  - indication of bytes in flight for QUIC and probably for each mux
    stream so that we can log them.

  - renaming of various fields, variables or arguments that regularly
    cause confusion when analysing the code.

  - review various error messages or indications to make sure they're
    still relevant (e.g. we recently found a suggestion to use
    OpenSSL >= 1.0.2).

The journey up to here has been a little bit chaotic but I think we're now
on a good track for the last mile, so for now I'm rather positive about
the upcoming 2.8. We'll try to emit one version per week from now on until
the release, to ease testing and bug reporting.

If you're interested in what is coming for 2.8, please try to find some
time to give it a try. And if you can't deploy in production, at least use
it to check your configs and report any anomalies or unexpected
warnings/errors you'd face.

Please find the usual URLs below :
   Site index       : https://www.haproxy.org/
   Documentation    : https://docs.haproxy.org/
   Wiki             : https://github.com/haproxy/wiki/wiki
   Discourse        : https://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : https://www.haproxy.org/download/2.8/src/
   Git repository   : https://git.haproxy.org/git/haproxy.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy.git
   Changelog        : https://www.haproxy.org/download/2.8/src/CHANGELOG
   Dataplane API    : https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs     : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs    : https://www.haproxy.org/l/reviewed-bugs
   Code reports     : https://www.haproxy.org/l/code-reports
   Latest builds    : https://www.haproxy.org/l/dev-packages

Willy
---
Complete changelog :
Amaury Denoyelle (27):
      MINOR: fd: implement fd_migrate_on() to migrate on a non-local thread
      BUG/MINOR: task: allow to use tasklet_wakeup_after with tid -1
      CLEANUP: quic: remove unused QUIC_LOCK label
      CLEANUP: quic: remove unused scid_node
      CLEANUP: quic: remove unused qc param on stateless reset token
      CLEANUP: quic: rename quic_connection_id vars
      MINOR: quic: remove uneeded tasklet_wakeup after accept
      MINOR: quic: adjust Rx packet type parsing
      MINOR: quic: adjust quic CID derive API
      MINOR: quic: remove TID ref from quic_conn
      MEDIUM: quic: use a global CID trees list
      MINOR: quic: remove TID encoding in CID
      MEDIUM: quic: handle conn bootstrap/handshake on a random thread
      MINOR: quic: do not proceed to accept for closing conn
      MINOR: protocol: define new callback set_affinity
      MINOR: quic: delay post handshake frames after accept
      MEDIUM: quic: implement thread affinity rebinding
      BUG/MINOR: quic: transform qc_set_timer() as a reentrant function
      MINOR: quic: properly finalize thread rebinding
      MAJOR: quic: support thread balancing on accept
      MINOR: listener: remove unneeded local accept flag
      BUG/MEDIUM: quic: prevent crash on Retry sending
      BUG/MINOR: mux-quic: fix crash with app ops install failure
      BUG/MINOR: mux-quic: properly handle STREAM frame alloc failure
      BUG/MINOR: h3: fix crash on h3s alloc failure
      BUG/MINOR: quic: prevent crash on qc_new_conn() failure
      BUG/MINOR: quic: consume Rx datagram even on error

Aurelien DARRAGON (37):
      MINOR: clock: add now_mono_time_fast() function
      MINOR: clock: add now_cpu_time_fast() function
      MEDIUM: hlua: reliable timeout detection
      MEDIUM: hlua: introduce tune.lua.burst-timeout
      CLEANUP: hlua: avoid confusion between internal timers and tick based timers
      MINOR: hlua: hook yield on known lua state
      MINOR: hlua: safe coroutine.create()
      CLEANUP: errors: fix obsolete function comments
      CLEANUP: server: fix update_status() function comment
      MINOR: server/event_hdl: add proxy_uuid to event_hdl_cb_data_server
      MINOR: hlua/event_hdl: rely on proxy_uuid instead of proxy_name for lookups
      MINOR: hlua/event_hdl: expose proxy_uuid variable in server events
      MINOR: hlua/event_hdl: fix return type for hlua_event_hdl_cb_data_push_args
      MINOR: server/event_hdl: prepare for upcoming refactors
      BUG/MINOR: event_hdl: don't waste 1 event subtype slot
      CLEANUP: event_hdl: updating obsolete comment for EVENT_HDL_CB_DATA
      CLEANUP: event_hdl: fix comment typo about _sync assertion
      MINOR: event_hdl: dynamically allocated event data members
      MINOR: event_hdl: provide event->when for advanced handlers
      MINOR: hlua/event_hdl: timestamp for events
      DOC: lua: restore 80 char limitation
      BUG/MINOR: server: incorrect report for tracking servers leaving drain
      MINOR: server: explicitly commit state change in srv_update_status()
      BUG/MINOR: server: don't miss proxy stats update on server state transitions
      BUG/MINOR: server: don't miss server stats update on server state transitions
      BUG/MINOR: server: don't use date when restoring last_change from state file
      MINOR: server: central update for server counters on state change
      MINOR: server: propagate server state change to lb through single function
      MINOR: server: propagate lb changes through srv_lb_propagate()
      MINOR: server: change adm_st_chg_cause storage type
      MINOR: server: srv_append_status refacto
      MINOR: server: change srv_op_st_chg_cause storage type
      CLEANUP: server: remove unused variables in srv_update_status()
      CLEANUP: server: fix srv_set_{running, stopping, stopped} function comment
      MINOR: server: pass adm and op cause to srv_update_status()
      MEDIUM: server: split srv_update_status() in two functions
      MINOR: server/event_hdl: prepare for server event data wrapper

Christopher Faulet (52):
      BUG/MEDIUM: cli: Set SE_FL_EOI flag for '_getsocks' and 'quit' commands
      BUG/MEDIUM: cli: Eat output data when waiting for appctx shutdown
      BUG/MEDIUM: http-client: Eat output data when waiting for appctx shutdown
      BUG/MEDIUM: stats: Eat output data when waiting for appctx shutdown
      BUG/MEDIUM: log: Eat output data when waiting for appctx shutdown
      BUG/MEDIUM: dns: Kill idle DNS sessions during stopping stage
      BUG/MINOR: resolvers: Wakeup DNS idle task on stopping
      BUG/MEDIUM: resolvers: Force the connect timeout for DNS resolutions
      MINOR: hlua: Stop to check the SC state when executing a hlua cli command
      BUG/MEDIUM: mux-h1: Report EOI when a TCP connection is upgraded to H2
      BUG/MEDIUM: mux-h2: Never set SE_FL_EOS without SE_FL_EOI or SE_FL_ERROR
      BUG/MINOR: stream: Fix test on SE_FL_ERROR on the wrong entity
      BUG/MEDIUM: stream: Report write timeouts before testing the flags
      BUG/MEDIUM: stconn: Do nothing in sc_conn_recv() when the SC needs more room
      MINOR: stream: Uninline and export sess_set_term_flags() function
      MINOR: filters: Review and simplify errors handling
      REGTESTS: fix the race conditions in log_uri.vtc
      MINOR: channel: Forwad close to other side on abort
      MINOR: stream: Introduce stream_abort() to abort on both sides in same time
      MINOR: stconn: Rename SC_FL_SHUTR_NOW in SC_FL_ABRT_WANTED
      MINOR: channel/stconn: Replace channel_shutr_now() by sc_schedule_abort()
      MINOR: stconn: Rename SC_FL_SHUTW_NOW in SC_FL_SHUT_WANTED
      MINOR: channel/stconn: Replace channel_shutw_now() by sc_schedule_shutdown()
      MINOR: stconn: Rename SC_FL_SHUTR in SC_FL_ABRT_DONE
      MINOR: channel/stconn: Replace sc_shutr() by sc_abort()
      MINOR: stconn: Rename SC_FL_SHUTW in SC_FL_SHUT_DONE
      MINOR: channel/stconn: Replace sc_shutw() by sc_shutdown()
      MINOR: tree-wide: Replace several chn_cons() by the corresponding SC
      MINOR: tree-wide: Replace several chn_prod() by the corresponding SC
      BUG/MINOR: cli: Don't close when SE_FL_ERR_PENDING is set in cli analyzer
      MINOR: stconn: Stop to set SE_FL_ERROR on sending path
      MEDIUM: stconn: Forbid applets with more to deliver if EOI was reached
      MINOR: stconn: Don't clear SE_FL_ERROR when endpoint is reset
      MINOR: stconn: Add a flag to ack endpoint errors at SC level
      MINOR: backend: Set SC_FL_ERROR on connection error
      MINOR: stream: Set SC_FL_ERROR on channels' buffer allocation error
      MINOR: tree-wide: Test SC_FL_ERROR with SE_FL_ERROR from upper layer
      MEDIUM: tree-wide: Stop to set SE_FL_ERROR from upper layer
      MEDIUM: backend: Stop to use SE flags to detect connection errors
      MEDIUM: stream: Stop to use SE flags to detect read errors from analyzers
      MEDIUM: stream: Stop to use SE flags to detect endpoint errors
      MEDIUM: stconn: Rely on SC flags to handle errors instead of SE flags
      BUG/MINOR: stconn: Don't set SE_FL_ERROR at the end of sc_conn_send()
      BUG/MEDIUM: http-ana: Properly switch the request in tunnel mode on upgrade
      BUG/MEDIUM: log: Properly handle client aborts in syslog applet
      MINOR: stconn: Add a flag to report EOS at the stream-connector level
      MINOR: stconn: Propagate EOS from a mux to the attached stream-connector
      MINOR: stconn: Propagate EOS from an applet to the attached stream-connector
      BUG/MINOR: http-ana: Update analyzers on both sides when switching in TUNNEL mode
      CLEANUP: backend: Remove useless debug message in assign_server()
      CLEANUP: cli: Remove useless debug message in cli_io_handler()
      BUG/MEDIUM: stconn: Propagate error on the SC on sending path

Frédéric Lécaille (19):
      MINOR: quic: Trace fix in quic_pto_pktns() (handshaske status)
      BUG/MINOR: quic: Wrong packet number space probing before confirmed handshake
      MINOR: quic: Modify qc_try_rm_hp() traces
      MINOR: quic: Dump more information at proto level when building packets
      MINOR: quic: Add a trace for packet with an ACK frame
      MINOR: quic: Add packet loss and maximum cc window to "show quic"
      BUG/MINOR: quic: Ignored less than 1ms RTTs
      MINOR: quic: Add connection flags to traces
      BUG/MEDIUM: quic: Code sanitization about acknowledgements requirements
      BUG/MINOR: quic: Possible wrapped values used as ACK tree purging limit.
      BUG/MINOR: quic: SIGFPE in quic_cubic_update()
      MINOR: quic: Display the packet number space flags in traces
      MINOR: quic: Remove a useless test about probing in qc_prep_pkts()
      BUG/MINOR: quic: Wrong Application encryption level selection when probing
      BUG/MINOR: quic: Do not use ack delay during the handshakes
      BUG/MINOR: quic: Stop removing ACK ranges when building packets
      MINOR: quic: Do not allocate too much ack ranges
      BUG/MINOR: quic: Unchecked buffer length when building the token
      BUG/MINOR: quic: Wrong Retry token generation timestamp computing

Ilya Shipitsin (7):
      CI: bump "actions/checkout" to v3 for cross zoo matrix
      CI: enable monthly test on Fedora Rawhide
      CLEANUP: use "offsetof" where appropriate
      CI: cirrus-ci: bump FreeBSD image to 13-1
      REGTESTS: remove unsupported "stats bind-process" keyword
      CI: extend spellchecker whitelist, add "clen" as well
      CLEANUP: assorted typo fixes in the code and comments

Olivier Houchard (1):
      BUG/MEDIUM: fd: don't wait for tmask to stabilize if we're not in it.

Tim Duesterhus (5):
      MINOR: Make `tasklet_free()` safe to be called with `NULL`
      CLEANUP: Stop checking the pointer before calling `tasklet_free()`
      CLEANUP: Stop checking the pointer before calling `pool_free()`
      CLEANUP: Stop checking the pointer before calling `task_free()`
      CLEANUP: Stop checking the pointer before calling `ring_free()`

William Lallemand (2):
      BUG/MINOR: stick_table: alert when type len has incorrect characters
      MINOR: ssl: remove OpenSSL 1.0.2 mention into certificate loading error

Willy Tarreau (49):
      MINOR: activity: add a line reporting the average CPU usage to "show activity"
      MINOR: thread: keep a bitmask of enabled groups in thread_set
      MINOR: fd: optimize fd_claim_tgid() for use in fd_insert()
      MINOR: fd: add a lock bit with the tgid
      MINOR: receiver: reserve special values for "shards"
      MINOR: bind-conf: support a new shards value: "by-group"
      MINOR: mux-h2: make the initial window size configurable per side
      MINOR: mux-h2: make the max number of concurrent streams configurable per side
      MINOR: config: add "no-alpn" support for bind lines
      REGTESTS: add a new "ssl_alpn" test to test ALPN negotiation
      DOC: add missing documentation for "no-alpn" on bind lines
      MINOR: ssl: do not set ALPN callback with the empty string
      MINOR: ssl_crtlist: dump "no-alpn" on "show crtlist" when "no-alpn" was set
      MEDIUM: config: set useful ALPN defaults for HTTPS and QUIC
      BUG/MINOR: cfgparse: make sure to include openssl-compat
      MINOR: quic: support migrating the listener as well
      MINOR: quic_sock: index li->per_thr[] on local thread id, not global one
      MINOR: listener: support another thread dispatch mode: "fair"
      MINOR: receiver: add a struct shard_info to store info about each shard
      MINOR: receiver: add RX_F_MUST_DUP to indicate that an rx must be duped
      MEDIUM: proto: duplicate receivers marked RX_F_MUST_DUP
      MINOR: proto: skip socket setup for duped FDs
      MEDIUM: config: permit to start a bind on multiple groups at once
      MINOR: listener: make accept_queue index atomic
      MEDIUM: listener: rework thread assignment to consider all groups
      MINOR: listener: use a common thr_idx from the reference listener
      MINOR: listener: resync with the thread index before heavy calculations
      MINOR: listener: make sure to avoid ABA updates in per-thread index
      MINOR: listener: always compare the local thread as well
      BUG/MINOR: cli: clarify error message about stats bind-process
      BUG/MINOR: sock_inet: use SO_REUSEPORT_LB where available
      BUG/MINOR: tools: check libssl and libcrypto separately
      BUG/MINOR: config: fix NUMA topology detection on FreeBSD
      BUILD: sock_inet: forward-declare struct receiver
      BUILD: proto_tcp: export the correct names for proto_tcpv[46]
      CLEANUP: protocol: move the l3_addrlen to plug a hole in proto_fam
      CLEANUP: protocol: move the nb_receivers to plug a hole in protocol
      REORG: listener: move the bind_conf's thread setup code to listener.c
      MINOR: proxy: make proxy_type_str() recognize peers sections
      MEDIUM: peers: call bind_complete_thread_setup() to finish the config
      MINOR: protocol: add a flags field to store info about protocols
      MINOR: protocol: move the global reuseport flag to the protocols
      MINOR: listener: automatically adjust shards based on support for SO_REUSEPORT
      MINOR: protocol: add a function to check if some features are supported
      MINOR: sock: add a function to check for SO_REUSEPORT support at runtime
      MINOR: protocol: perform a live check for SO_REUSEPORT support
      MINOR: listener: do not restrict CLI to first group anymore
      MINOR: listener: add a new global tune.listener.default-shards setting
      MEDIUM: listener: switch the default sharding to by-group

---