Hi,

HAProxy 2.9.0 was released on 2023/12/05. It added 25 new commits after version 2.9-dev12.

Since the end of last week, a few extra fixes were merged, half of which are also for pre-2.9 versions. The recently enabled zero-copy forwarding was finally disabled by default for QUIC since Tristan noticed that it triggered crashes for him. This was a good opportunity for making it possible to configure it on a per-protocol basis. We'll fix that during 3.0-dev and evaluate the relevance of re-enabling it in a future 2.9 once fixed. Finally, some doc updates were merged, as planned (the various action keywords are expected to be easier to find).

Let's now have a high-level overview of the changes since 2.8.

After passing over the list of changes and trying to classify them, first I recalled a lot of small stuff and figured that what I estimated to be a small release definitely isn't one (we have roughly the same number of commits as last year at the same period, about 1050). Second, I found that a great part of the work in this version was aimed at scalability and performance improvements in general (which is not surprising for a non-LTS version), then ease of configuration, then a better integration into existing setups and ecosystems. And there are two major new features that do not fall into any such categories.

As with every version now, my coworkers at HAProxyTech have spent quite some time reviewing the -dev announce messages and the commit logs to provide a more in-depth review of the changes that will soon appear here, if not already there by the time I'm typing, so please check there for the details:

    https://www.haproxy.com/blog/announcing-haproxy-2-9

New features
------------

  - the first new feature is the reverse-http mechanism.
    It turns out that our design is pretty close to a draft that was published a few hours after our design meeting, and for this reason we decided that we'll work with its authors to share our observations, and that our feature will remain tagged experimental while the draft evolves:

        https://datatracker.ietf.org/doc/draft-bt-httpbis-reverse-http/

    I'm not going to rehash here everything that was said about this feature in the 2.9-dev4 announcement, but in short, this mechanism makes it easy for a developer to get incoming requests from the internet to their laptop after having established a secure connection to a public gateway. It's convenient for mobile application development, and for support teams and sales engineers who need to exchange files with customers without being allowed to connect their PC to the network (just use WiFi/5G and access it over the net). Another use case we've thought about is to allow DMZ services to only take inbound traffic to limit the risk of bouncing in case of intrusion, and the last one was a way to ease registration of an application server against an edge load-balancer: it connects and it's instantly part of the load balancing farm, without having to rely on external registries.

    More details were available in the dev4 announcement:

        https://www.mail-archive.com/haproxy@formilux.org/msg43924.html

    The feature remains marked experimental, which means we may adjust it a bit over time and possibly even backport changes. For now, we prefer to consider that only two identical versions of haproxy are supposed to work together. There are still a few limitations such as not being compatible with the use of thread groups on the gateway (it shouldn't be an issue for anyone).

  - log servers: the long-criticized limited load-balancing mechanism for outgoing logs, with its complexity and lack of health checks, is now a thing of the past.
    Using the "sample" keyword was a hack to distribute the logs to multiple servers but was never meant to be a true load balancing system. Now in 2.9, the first bullet point of the 4-year-old issue #401 is addressed: log server health checking. This is achieved by the introduction of a new "log" mode for backends, which allows declaring log servers that are used according to the configured load balancing algorithm and checked using the existing check methods. The backend is then used on the "log" line and you can get rid of the multiple "log" statements with their "sample" keywords.

    There are some limitations which are inherent to the unidirectional and ephemeral aspect of logs. For example there is no "leastconn" algorithm, and algos in general had to be simplified to limit the overhead per message (e.g. weights are ignored). A new "sticky" algorithm was implemented to deliver to the same server until it disappears, which helps collect logs in large pieces and avoids flapping between servers. We figured that this one might be interesting to port to other backends (thinking about database users here), we'll see later.

    Issue 401 above is long and covers other aspects such as naming fields for hashing/masking and for JSON encoding, these are planned for next steps.

Performance improvements
------------------------

They were spread over about all areas, so I'll try to be concise:

  - zero-copy forwarding: it was already possible to some extent by swapping buffer pointers, but on congested links it would still leave lots of data in full buffers, which were using a lot of memory and causing CPU L3 cache eviction. Here the approach is different: it consists in directly reading from a socket into the other end's output buffer, for no more than it can hold and the protocol will accept to send. This guarantees that any byte read from the network stack will be sent almost immediately and will not stay there in memory.
    Not only does this manage to significantly reduce memory usage in some cases, but it also improves performance at the same time thanks to better CPU cache efficiency. On various tests with many clients we've seen some memory usage gains of up to 30% compared to 2.8, and network traffic gains of up to 40%. That's about the completion of some work started between 2.4 and 2.5 that we're pretty proud of because it required a huge amount of changes over the years to reach this point. We have identified an issue with QUIC so it is temporarily disabled there (i.e. we won't save memory there). Some changes to also make the cache benefit from it are already in queue for 3.0.

  - thread scalability: we've arranged a significant reduction of the locking contention on many-core machines thanks to a generalized use of exponential backoff (up to ~15% perf increase on an 80-core machine just for this).

    The memory pools were the next ones in perf reports due to some counters, especially with QUIC which stresses them more than other parts due to the fact that it works at the datagram level, so they were reworked to be arranged in shards, which eliminated this contention.

    ACL lookups were done under a lock, even for empty ones, and we discovered that plenty of users actually have various block lists that are empty and waiting to be filled at any moment, and that avoiding taking the lock when the list is empty brings nice improvements for them.

    Similarly, the log-forwarding code used to live under a big lock that made it scale quite poorly, and that lock was significantly reduced.

    SSL on the backend side stores one session per thread, and till now they were not shared, so upon restart, the more threads you had, the more handshakes were necessary to access the servers. This was particularly visible with health checks where SSL health checks would consume a lot of CPU calculating many handshakes.
    Now the sessions are shared so that a thread can reuse a session already set up for another thread, and on average only one handshake will be performed to a server after a restart.

    The stick-tables lookup and peer updates didn't like each other well apparently and intensive updates of synchronized stick tables wouldn't scale well on threads. After a few weeks spent on this, the performance on an 80-core machine was multiplied by 15!

    Finally, the HTTP cache also significantly reduced its locking scope. More threads can read from and write into the cache in parallel thanks to careful refining, more atomic operations and a level of sharding of the lookup trees. More updates are still in queue for the cache.

  - general scalability: runtime acl/map updates would still require finding and removing what are called the "references" (the text form of an entry). For memory saving reasons it was decided long ago that this would only be indexed using a linked list, long before it was possible to update them at runtime. Some users with very large maps and set-map() actions were really hitting a wall with updates going down to a few tens per second sometimes, so this was changed to a binary tree that now maintains almost constant performance, and which, thanks to careful field rearrangement, finally does not consume more memory. Again, more updates are in sight on this part in the future to further reduce the per-item memory footprint.

    QUIC now releases most of a connection's memory after it enters the closed state. What HAProxy is doing with QUIC connections really resembles what a kernel does with TCP TIME_WAIT ones. This will save a lot of memory for sites mainly running over QUIC (several gigabytes per million connections).

Flexibility
-----------

  - a few capabilities are now supported on Linux to avoid leaving a process running as root for limited reasons.
    Actually this was made necessary for QUIC where we bind to a local port for each new incoming connection, and if the port is 443, the bind() operation could fail and fall back to the default single-socket mode. The new cap_net_bind_service capability is currently supported for this, as well as cap_net_raw, which can be used for a few rare cases such as binding to interfaces or to a foreign address. These are set using "setcap" in the global section.

  - most of the remaining log-format tags (e.g. %Tt) which didn't have an equivalent sample fetch function were converted to have one. Some users indeed want to perform operations between them and that was extremely difficult. Timers, status codes, condition flags etc. now exist. Others also now have multiple variants, such as the HTTP status where it's possible to differentiate the one received from the server from the one sent to the client. Some converters were also improved to support variables in addition to numeric fields (e.g. "bytes()"), and a few more time manipulation functions were added to support milli/micro/nano seconds. It's also possible to list cookie names found in a request or a response, and a new "acl()" sample fetch function evaluates all of the designated ACLs at once (convenient to create combo ACLs).

  - a few minor improvements also add flexibility, such as being able to set the QUIC socket binding mode per bind line and not just for the whole process, setting client handshake timeouts, support for also specifying TLS signature algorithms on the server side, support for Origin in the Vary cache header, or the ability to preserve environment variables for external checks.

Integration with other components
---------------------------------

  - QUIC: a limited compatibility layer allowing the use of OpenSSL despite its lack of QUIC support was implemented and backported to 2.8.4. It does not support 0-RTT and I think everyone agrees that we should not have to hack around this.
    But for a while, users didn't have the choice but to use OpenSSL, so at least these ones can have some QUIC support now. The best solution of course, is to get rid of OpenSSL which is now the last SSL stack not supporting QUIC, and with horrible performance since 3.x.

  - Speaking of getting rid of OpenSSL, new serious contenders are now available, that anyone not afraid of making their own packages, who is concerned about performance so as not to pay 10 vCPUs when only 2 should be needed, and who doesn't depend on OpenSSL-centric features, should really evaluate. Don't get me wrong, we don't have much feedback yet on these options, so there may still be some rough edges, but sometimes energy savings and cost cutting can deserve living on the bleeding edge.

    The first option, wolfSSL, continued to make progress and their version 5.6.4 integrates pretty well with HAProxy now. Please don't use any older version.

    The second option, AWS-LC, is AWS's libcrypto. It's between BoringSSL and OpenSSL, is more similar to OpenSSL than wolfSSL but is lacking certain algos used by QUIC. It is particularly fast on ARM machines such as Graviton3 instances (AWS c7gn etc), where it can even be 15% faster than wolfSSL on RSA!

    The support status of such alternatives is regularly updated on the HAProxy wiki page at:

        https://github.com/haproxy/wiki/wiki/SSL-Libraries-Support-Status

    I'd like us to find enough time to write an in-depth article comparing all these alternatives on the grounds of features and performance, as I feel like it's really needed, even if it's a lot of work. Stay tuned.

  - it is now possible to extract arbitrary fields from incoming PROXY protocol headers, and to set arbitrary ones in outgoing headers. This can make haproxy transparently pass certain application specific info between two components, or even fill some in-house security fields for example.
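To make the PROXY protocol point above a bit more concrete, here is a minimal Python sketch of what "extracting an arbitrary field" means on the wire. It assumes the version 2 binary header layout described in the PROXY protocol specification (12-byte signature, version/command byte, family/transport byte, 16-bit length, address block, then type-length-value fields). This is not HAProxy's code: the names parse_pp2_tlvs and build_example are hypothetical, and the TLV type 0xE1 is only an example picked from the range the specification reserves for custom use.

```python
import struct
from typing import Dict

# 12-byte signature that opens every PROXY protocol v2 header.
PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"

# Address block size per address family (high nibble of the 14th byte).
ADDR_BLOCK_LEN = {0x1: 12, 0x2: 36, 0x3: 216}  # AF_INET, AF_INET6, AF_UNIX


def parse_pp2_tlvs(header: bytes) -> Dict[int, bytes]:
    """Return {tlv_type: value} for the TLVs carried in a PROXY v2 header."""
    if len(header) < 16 or header[:12] != PP2_SIGNATURE:
        raise ValueError("not a PROXY protocol v2 header")
    ver_cmd, fam_proto, payload_len = struct.unpack("!BBH", header[12:16])
    if ver_cmd >> 4 != 2:
        raise ValueError("unsupported PROXY protocol version")
    payload = header[16:16 + payload_len]
    if len(payload) < payload_len:
        raise ValueError("truncated header")
    family = fam_proto >> 4
    if family not in ADDR_BLOCK_LEN:
        # Simplification of this sketch: unspecified families are rejected.
        raise ValueError("address family not handled by this sketch")
    pos = ADDR_BLOCK_LEN[family]  # TLVs start right after the addresses
    tlvs: Dict[int, bytes] = {}
    while pos + 3 <= len(payload):
        tlv_type, tlv_len = struct.unpack("!BH", payload[pos:pos + 3])
        value = payload[pos + 3:pos + 3 + tlv_len]
        if len(value) < tlv_len:
            raise ValueError("truncated TLV")
        tlvs[tlv_type] = value
        pos += 3 + tlv_len
    return tlvs


def build_example() -> bytes:
    """Build a TCP/IPv4 header carrying one custom TLV (type 0xE1)."""
    addrs = bytes([192, 0, 2, 1]) + bytes([192, 0, 2, 2])
    addrs += struct.pack("!HH", 40000, 443)  # source and destination ports
    tlv = struct.pack("!BH", 0xE1, 5) + b"hello"
    payload = addrs + tlv
    # 0x21 = version 2 + PROXY command, 0x11 = AF_INET + STREAM
    return PP2_SIGNATURE + struct.pack("!BBH", 0x21, 0x11, len(payload)) + payload


if __name__ == "__main__":
    print(parse_pp2_tlvs(build_example()))
```

Setting a field in an outgoing header is the mirror operation: append one more type/length/value triplet to the payload and adjust the 16-bit length accordingly, as build_example does.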
Possibly breaking changes
-------------------------

  - fragments in URLs are now rejected by default as they are invalid in HTTP and can cause trouble to some backend components. In the very unlikely case any application would rely on them, this can be reverted using the usual "option accept-invalid-http-request".

  - some blatantly invalid CPU bindings and/or thread counts are detected and issue a warning. This concerns cases where more threads than CPUs are configured, and when multiple threads are bound to a smaller set of CPUs. I was myself caught on this by having entered invalid cpu-map directives in a test config, so I wouldn't be surprised if others face it as well ;-)

Misc
----

  - we now crossed the 200 reg-tests symbolic barrier, totaling more than 3500 expect rules! I think it has become visible over the last versions that we need to emit less after a .0 release, showing some progress on the quality front.

  - more debugging improvements, trying to expose more info in crash dumps and even suggest configuration hints when facing certain well-known problematic situations.

  - plenty of other small goodies there's not enough room here to enumerate, please check the blog article as most if not all of them will be listed there.

As usual I've already created the 3.0-dev0 release to start the new development cycle. And for once I forgot to change the status in the INSTALL and version files to mention it's no longer in development, so I'll do that soon. Sorry about this, I noticed it too late, still too many steps to care about.

And I couldn't finish this announce without addressing a huge Thank You to all those who contributed by testing, reporting issues, helping others, participating in discussions and bug chasing, reviewing issues and code, and of course, contributing new features!
Let's not change a good tradition, I'm almost done updating the site, and will watch for Tim's email telling me where I messed up :-)

Please find the usual URLs below :
   Site index       : https://www.haproxy.org/
   Documentation    : https://docs.haproxy.org/
   Wiki             : https://github.com/haproxy/wiki/wiki
   Discourse        : https://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : https://www.haproxy.org/download/2.9/src/
   Git repository   : https://git.haproxy.org/git/haproxy-2.9.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy-2.9.git
   Changelog        : https://www.haproxy.org/download/2.9/src/CHANGELOG
   Dataplane API    : https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs     : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs    : https://www.haproxy.org/l/reviewed-bugs
   Code reports     : https://www.haproxy.org/l/code-reports
   Latest builds    : https://www.haproxy.org/l/dev-packages

Willy
---
Complete changelog :
Aurelien DARRAGON (5):
      BUG/MINOR: cfgparse-listen: fix warning being reported as an alert
      DOC: config: add matrix entry for "max-session-srv-conns"
      DOC: config: fix monitor-fail typo
      DOC: config: add context hint for proxy keywords
      BUG/MINOR: server/event_hdl: properly handle AF_UNSPEC for INETADDR event

Christopher Faulet (8):
      DEBUG: stream: Report lra/fsb values for front end back SC in stream dump
      MINOR: global: Use a dedicated bitfield to customize zero-copy fast-forwarding
      MINOR: mux-pt: Add global option to enable/disable zero-copy forwarding
      MINOR: mux-h1: Add global option to enable/disable zero-copy forwarding
      MINOR: mux-h2: Add global option to enable/disable zero-copy forwarding
      MINOR: mux-quic: Add global option to enable/disable zero-copy forwarding
      MINOR: mux-quic: Disable zero-copy forwarding for send by default
      BUG/MEDIUM: peers: fix partial message decoding

Tim Duesterhus (4):
      DOC: config: add missing colon to "bytes_out" sample fetch keyword (2)
      REGTESTS: sample: Test the behavior of consecutive delimiters for the field converter
      BUG/MINOR: sample: Make the `word` converter compatible with `-m found`
      DOC: Clarify the differences between field() and word()

William Lallemand (1):
      MINOR: acme.sh: don't use '*' in the filename for wildcard domain

Willy Tarreau (7):
      BUILD: http_htx: silence uninitialized warning on some gcc versions
      DOC: config: update the reminder on the HTTP model and add some terminology
      DOC: config: add a few more differences between HTTP/1 and 2+
      DOC: config: clarify session vs stream
      DOC: config: fix typo abandonned -> abandoned
      DOC: management: fix two latest typos (optionally, exception)
      DOC: management: update stream vs session

---