Hi,

HAProxy 2.9.0 was released on 2023/12/05. It added 25 new commits
after version 2.9-dev12.

Since the end of last week, a few extra fixes were merged, half of which
are also for pre-2.9 versions. The recently enabled zero-copy forwarding
was finally disabled by default for QUIC since Tristan noticed that it
triggered crashes for him. This was a good opportunity for making it
possible to configure it on a per-protocol basis. We'll fix it during
3.0-dev and evaluate the relevance of re-enabling it in a future 2.9
release once fixed. Finally, some doc updates were merged, as planned
(the various action keywords should now be easier to find).

Let's now have a high-level overview of the changes since 2.8. After
going over the list of changes and trying to classify them, I first
noticed a lot of small stuff and realized that what I had estimated
to be a small release definitely isn't one (we have roughly the same
number of commits as last year at the same period, about 1050). Second,
I found that a great part of the work in this version was aimed at
scalability and performance improvements in general (which is not
surprising for a non-LTS version), then ease of configuration, then a
better integration into existing setups and ecosystems. And there are
two major new features that do not fall into any such categories.

As with every version now, my coworkers at HAProxyTech have spent quite
some time reviewing the -dev announce messages and the commit logs to
provide a more in-depth review of the changes that will soon appear
there, if not already by the time you read this, so please check it
for the details:

   https://www.haproxy.com/blog/announcing-haproxy-2-9


New features
------------
- the first new feature is the reverse-http mechanism. It turns out
  that our design is pretty close to a draft that was published a few
  hours after our design meeting, and for this reason we decided that
  we'll work with its authors to share our observations, and that our
  feature will remain tagged experimental while the draft evolves:

     https://datatracker.ietf.org/doc/draft-bt-httpbis-reverse-http/

  I'm not going to rehash here everything that was said about this
  feature in the 2.9-dev4 announcement, but in short, this mechanism
  makes it easy for a developer to get incoming requests from the
  internet to their laptop after having established a secure
  connection to a public gateway.  It's convenient for mobile
  application development, and for support teams and sales engineers
  who need to exchange files with customers without being allowed to
  connect their PC to the network (just use WiFi/5G and access it over
  the net). Another use case we've thought about is to allow DMZ
  services to only take inbound traffic to limit the risk of bouncing
  in case of intrusion, and the last one was a way to ease
  registration of an application server against an edge load-balancer:
  it connects and it's instantly part of the load balancing farm,
  without having to rely on external registries. More details were
  available in the dev4 announcement:

     https://www.mail-archive.com/haproxy@formilux.org/msg43924.html

  The feature remains marked experimental, which means we may adjust
  it a bit over time and possibly even backport changes. For now, we
  prefer to consider that only two identical versions of haproxy are
  supposed to work together. There are still a few limitations such as
  not being compatible with the use of thread groups on the gateway
  (it shouldn't be an issue for anyone).
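
  As a rough illustration (the feature is experimental, so the exact
  keywords may still evolve, and addresses below are placeholders),
  the gateway side might look like this, with one frontend accepting
  the pre-established connections and attaching them to an addressless
  "rhttp@" server; please check the 2.9 configuration manual for the
  dialer-side syntax and the exact semantics:

  ```
  global
      # required to use the experimental reverse-HTTP keywords
      expose-experimental-directives

  defaults
      mode http
      timeout client  30s
      timeout server  30s
      timeout connect 5s

  # public side: regular traffic is routed over reversed connections
  frontend fe-public
      bind :80
      default_backend be-reverse

  backend be-reverse
      # addressless server fed by reversed (pre-established) connections
      server dev rhttp@

  # side accepting the pre-connections from the remote haproxy
  frontend fe-rhttp
      bind :8000 proto h2
      tcp-request session attach-srv be-reverse/dev
  ```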

- log servers: the long-criticized, limited load-balancing mechanism
  for outgoing logs, with its complexity and lack of health checks, is
  now a thing of the past. Using the "sample" keyword was a hack that
  allowed distributing logs to multiple servers, but it was never
  meant to be a true load balancing system. With 2.9, the first
  bullet point of the 4-year-old issue #401 is addressed: log server
  health checking. This is achieved by the introduction of a new
  "log" mode for backends, which allows declaring log servers that
  are used according to the configured load balancing algorithm and
  checked using the existing check methods. The backend is then
  referenced on the "log" line, and you can get rid of the multiple
  "log" statements with their "sample" keywords. There are some
  limitations which are inherent to the unidirectional and ephemeral
  aspect of logs. For example there is no "leastconn" algorithm, and
  algos in general had to be simplified to limit the overhead per
message (e.g. weights are ignored). A new "sticky" algorithm was
  implemented to keep delivering to the same server until it
  disappears, which helps collect logs in large chunks and avoids
  flapping between servers. We figured that this one might be
  interesting to port to other backend types (thinking about database
  users here); we'll see later. Issue #401 above is long and covers
  other aspects, such as naming fields for hashing/masking and for
  JSON encoding; these are planned as next steps.
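
  For illustration, a minimal sketch of what such a configuration
  could look like (server names and addresses are placeholders; see
  the "mode log" documentation for the supported algorithms and check
  options):

  ```
  backend syslog-farm
      mode log
      balance roundrobin         # "sticky" is also available
      # log servers can now be health-checked like regular servers
      # (check method/port may need adjusting for UDP servers)
      server s1 udp@192.0.2.10:514 check
      server s2 udp@192.0.2.11:514 check

  frontend fe-main
      bind :8080
      # a single "log" line pointing to the backend replaces multiple
      # "log ... sample" statements
      log backend@syslog-farm local0
      default_backend be-app
  ```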


Performance improvements
------------------------
These were spread over almost all areas, so I'll try to be concise:

- zero-copy forwarding: it was already possible to some extent by
  swapping buffer pointers, but on congested links it would still
  leave lots of data in full buffers, which used a lot of memory and
  caused CPU L3 cache eviction. Here the approach is different: it
  consists of reading directly from a socket into the other end's
  output buffer, for no more than it can hold and the protocol will
  accept to send. This guarantees that any byte read from the network
  stack will be sent almost immediately and will not linger in
  memory. Not only does this significantly reduce memory usage in
  some cases, it also improves performance at the same time thanks
  to better CPU cache efficiency. On various tests with many clients
  we've seen memory usage gains of up to 30% compared to 2.8, and
  network traffic gains of up to 40%. This essentially completes some
  work started between 2.4 and 2.5 that we're pretty proud of,
  because it required a huge amount of changes over the years to
  reach this point. We have identified an issue with QUIC, so it is
  temporarily disabled there (i.e. we won't save memory there). Some
  changes to also make the cache benefit from it are already in queue
  for 3.0.
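
  Should anyone need to compare behaviors or work around an issue,
  there is a global switch to turn the feature off (the keyword name
  below is my recollection; per-protocol variants were also added
  late in the cycle, so check the global section of the manual):

  ```
  global
      # assumed keyword: disable zero-copy forwarding globally
      tune.disable-zero-copy-forwarding
  ```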

- thread scalability: we've achieved a significant reduction of the
  locking contention on many-core machines thanks to a generalized
  use of exponential backoff (up to ~15% performance increase on an
  80-core machine just for this). The memory pools were next in the
  perf reports due to some counters, especially with QUIC, which
  stresses them more than other parts because it works at the
  datagram level; they were reworked to be arranged in shards, which
  eliminated this contention. ACL lookups were done under a lock,
  even for empty ones, and we discovered that plenty of users
  actually have various block lists that are empty and waiting to be
  filled at any moment, so not taking the lock when the list is empty
  brings nice improvements for them. Similarly, the log-forwarding
  code used to live under a big lock that made it scale quite poorly;
  its scope was significantly reduced. SSL on the backend side stores
  one session per thread, and until now they were not shared, so upon
  restart, the more threads you had, the more handshakes were
  necessary to access the servers. This was particularly visible with
  health checks, where SSL checks would consume a lot of CPU
  computing many handshakes. Now the sessions are shared so that a
  thread can reuse a session already set up by another thread, and on
  average only one handshake will be performed to a server after a
  restart. The stick-table lookups and peer updates apparently didn't
  get along well, and intensive updates of synchronized stick tables
  wouldn't scale well across threads. After a few weeks spent on
  this, the performance on an 80-core machine was multiplied by 15!
  Finally, the HTTP cache also significantly reduced its locking
  scope. More threads can read from and write into the cache in
  parallel thanks to careful refining, more atomic operations and a
  level of sharding of the lookup trees. More updates are still in
  queue for the cache.

- general scalability: runtime acl/map updates would still require
  finding and removing what are called the "references" (the text
  form of an entry). For memory saving reasons it was decided long
  ago that these would only be indexed using a linked list, long
  before it was possible to update them at runtime. Some users with
  very large maps and set-map() actions were really hitting a wall,
  with updates sometimes dropping to a few tens per second, so this
  was changed to a binary tree that now maintains almost constant
  performance and which, thanks to careful field rearrangements,
  finally does not consume more memory. Again, more updates are in
  sight on this part to further reduce the per-item memory footprint.
  QUIC now releases most of a connection's memory after it enters the
  closed state. What HAProxy does with QUIC connections really
  resembles what a kernel does with TCP TIME_WAIT ones. This will
  save a lot of memory for sites mainly running over QUIC (several
  gigabytes per million connections).


Flexibility
-----------
- a few capabilities are now supported on Linux to avoid leaving a
  process running as root for limited reasons. This was actually made
  necessary for QUIC, where we bind to a local port for each new
  incoming connection: if the port is 443, the bind() operation could
  fail and fall back to the default single-socket mode. The new
  cap_net_bind_service capability is currently supported for this, as
  well as cap_net_raw, which can be used for a few rare cases such as
  binding to interfaces or to a foreign address. These are set using
  the "setcap" keyword in the global section.
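
  For example (a minimal sketch; the capability list is, as far as I
  can tell, comma-delimited):

  ```
  global
      # keep the process non-root while still allowing QUIC
      # connection sockets to bind to port 443, and allowing
      # binding to interfaces / foreign addresses
      setcap cap_net_bind_service,cap_net_raw
  ```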

- most of the remaining log-format tags (e.g. %Tt) which didn't have
  an equivalent sample fetch function were converted to have one.
  Some users indeed want to combine them in expressions, and that
  used to be extremely difficult. Fetches for timers, status codes,
  condition flags etc. now exist. Others also gained multiple
  variants, such as the HTTP status, where it's now possible to
  differentiate the one received from the server from the one sent
  to the client. Some converters were also improved to support
  variables in addition to numeric arguments (e.g. "bytes()"), and a
  few more time manipulation functions were added to support
  milli/micro/nanoseconds. It's also possible to list the cookie
  names found in a request or a response, and a new "acl()" sample
  fetch function evaluates all of the designated ACLs at once
  (convenient to create combo ACLs).
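
  As a hedged illustration of the new "acl()" fetch (the ACL names
  below are made up, and the exact combination semantics should be
  checked in the manual):

  ```
  frontend fe-main
      bind :8080
      acl has_ua  req.hdr(user-agent) -m found
      acl has_ref req.hdr(referer)    -m found
      # evaluate both named ACLs in a single expression
      http-request set-header X-Browserish yes if { acl(has_ua,has_ref) }
      default_backend be-app
  ```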

- a few minor improvements also add flexibility, such as being able
  to set the QUIC socket binding mode per "bind" line and not just
  for the whole process, setting client handshake timeouts, support
  for specifying TLS signature algorithms on the server side as well,
  support for Origin in the Vary cache header, and the ability to
  preserve environment variables for external checks.


Integration with other components
---------------------------------
- QUIC: a limited compatibility layer allowing the use of OpenSSL
  despite its lack of QUIC support was implemented and backported to
  2.8.4. It does not support 0-RTT, and I think everyone agrees that
  we should not have to hack around this. But for a while, users had
  no choice but to use OpenSSL, so at least they can have some QUIC
  support now. The best solution, of course, is to get rid of
  OpenSSL, which is now the last SSL stack not supporting QUIC, and
  which has shown horrible performance since 3.x.

- Speaking of getting rid of OpenSSL, serious new contenders are now
  available, which deserve a real evaluation by anyone who is not
  afraid of making their own packages, is concerned about performance
  (so as not to pay for 10 vCPUs when only 2 should be needed), and
  does not depend on OpenSSL-centric features. Don't get me wrong, we
  don't have much feedback yet on these options, so there may still
  be some rough edges, but sometimes energy savings and cost cutting
  can justify living on the bleeding edge. The first option, wolfSSL,
  continued to make progress, and their version 5.6.4 integrates
  pretty well with HAProxy now. Please don't use any older version.
  The second option, AWS-LC, is AWS's libcrypto. It sits between
  BoringSSL and OpenSSL: it's more similar to OpenSSL than wolfSSL
  is, but lacks certain algorithms used by QUIC. It is particularly
  fast on ARM machines such as Graviton3 instances (AWS c7gn etc.),
  where it can even be 15% faster than wolfSSL on RSA! The support
  status of such alternatives is
  regularly updated on the HAProxy wiki page at:

    https://github.com/haproxy/wiki/wiki/SSL-Libraries-Support-Status

  I'd like us to find enough time to write an in-depth article
  comparing all these alternatives on the grounds of features and
  performance, as I feel it's really needed, even if it's a lot of
  work. Stay tuned.

- it is now possible to extract arbitrary fields from incoming PROXY
  protocol headers, and to set arbitrary ones in outgoing headers.
  This lets haproxy transparently pass certain application-specific
  info between two components, or even fill in some in-house security
  fields, for example.
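
  As a sketch (the TLV ID and addresses are placeholders, and the
  fetch/keyword names below are what I believe the 2.9 manual uses),
  one can read a custom TLV on the frontend and re-emit it towards
  the server:

  ```
  frontend fe-main
      # accept incoming PROXY protocol along with its TLVs
      bind :8080 accept-proxy
      default_backend be-app

  backend be-app
      # re-emit TLV 0xE0 exactly as received from the edge proxy
      server app 192.0.2.20:8080 send-proxy-v2 set-proxy-v2-tlv-fmt(0xE0) %[fc_pp_tlv(0xE0)]
  ```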


Possibly breaking changes
-------------------------
- fragments in URLs are now rejected by default, as they are invalid
  in HTTP and can cause trouble for some backend components. In the
  very unlikely case an application relies on them, this can be
  reverted using the usual "option accept-invalid-http-request".
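
  For the record, the revert looks like this (per-proxy, in a
  frontend or defaults section; fixing the client remains the better
  long-term option):

  ```
  frontend legacy-app
      bind :8081
      # re-accept URIs containing a '#' fragment for this frontend only
      option accept-invalid-http-request
      default_backend be-legacy
  ```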

- some blatantly invalid CPU bindings and/or thread counts are now
  detected and trigger a warning. This concerns cases where more
  threads than CPUs are configured, or where multiple threads are
  bound to a smaller set of CPUs. I was myself caught out by this
  after entering invalid cpu-map directives in a test config, so I
  wouldn't be surprised if others face it as well ;-)


Misc
----
- we crossed the symbolic barrier of 200 reg-tests, totaling more
  than 3500 expect rules! I think it has become visible over the last
  versions that we need to emit fewer fixes after a .0 release,
  showing some progress on the quality front.

- more debugging improvements, trying to expose more info in crash
  dumps and even suggest configuration hints when facing certain
  well-known problematic situations.

- plenty of other small goodies that there's not enough room to
  enumerate here; please check the blog article, as most if not all
  of them will be listed there.

As usual I've already created the 3.0-dev0 release to start the new
development cycle. And for once I forgot to change the status in the
INSTALL and version files to mention it's no longer in development, so
I'll do that soon. Sorry about this, I noticed it too late, still too
many steps to care about.

And I couldn't finish this announcement without addressing a huge
Thank You to all those who contributed by testing, reporting issues,
helping others, participating in discussions and bug chasing,
reviewing issues and code, and of course, contributing new features!

Let's not change a good tradition, I'm almost done updating the site, and
will watch for Tim's email telling me where I messed up :-)

Please find the usual URLs below:
   Site index       : https://www.haproxy.org/
   Documentation    : https://docs.haproxy.org/
   Wiki             : https://github.com/haproxy/wiki/wiki
   Discourse        : https://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : https://www.haproxy.org/download/2.9/src/
   Git repository   : https://git.haproxy.org/git/haproxy-2.9.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy-2.9.git
   Changelog        : https://www.haproxy.org/download/2.9/src/CHANGELOG
   Dataplane API    : https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs     : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs    : https://www.haproxy.org/l/reviewed-bugs
   Code reports     : https://www.haproxy.org/l/code-reports
   Latest builds    : https://www.haproxy.org/l/dev-packages

Willy
---
Complete changelog :
Aurelien DARRAGON (5):
      BUG/MINOR: cfgparse-listen: fix warning being reported as an alert
      DOC: config: add matrix entry for "max-session-srv-conns"
      DOC: config: fix monitor-fail typo
      DOC: config: add context hint for proxy keywords
      BUG/MINOR: server/event_hdl: properly handle AF_UNSPEC for INETADDR event

Christopher Faulet (8):
      DEBUG: stream: Report lra/fsb values for front end back SC in stream dump
      MINOR: global: Use a dedicated bitfield to customize zero-copy fast-forwarding
      MINOR: mux-pt: Add global option to enable/disable zero-copy forwarding
      MINOR: mux-h1: Add global option to enable/disable zero-copy forwarding
      MINOR: mux-h2: Add global option to enable/disable zero-copy forwarding
      MINOR: mux-quic: Add global option to enable/disable zero-copy forwarding
      MINOR: mux-quic: Disable zero-copy forwarding for send by default
      BUG/MEDIUM: peers: fix partial message decoding

Tim Duesterhus (4):
      DOC: config: add missing colon to "bytes_out" sample fetch keyword (2)
      REGTESTS: sample: Test the behavior of consecutive delimiters for the field converter
      BUG/MINOR: sample: Make the `word` converter compatible with `-m found`
      DOC: Clarify the differences between field() and word()

William Lallemand (1):
      MINOR: acme.sh: don't use '*' in the filename for wildcard domain

Willy Tarreau (7):
      BUILD: http_htx: silence uninitialized warning on some gcc versions
      DOC: config: update the reminder on the HTTP model and add some terminology
      DOC: config: add a few more differences between HTTP/1 and 2+
      DOC: config: clarify session vs stream
      DOC: config: fix typo abandonned -> abandoned
      DOC: management: fix two latest typos (optionally, exception)
      DOC: management: update stream vs session

---
