Hi,

HAProxy 2.2-dev10 was released on 2020/06/19. It added 62 new commits
after version 2.2-dev9.

Yes, I know what you think: "oh no, yet another release candidate". It
turns out that the past week was more productive in spotting bugs than
it was in getting them fixed. Worse, we've spent a lot of time with
William Dauchy trying to understand where the high CPU usage he observes
on 2.2 compared to 2.1 comes from, and while experimenting on many
different scenarios, this evening I spotted a severe forwarding performance
regression that I absolutely want to understand before the release. I'm
seeing a drop from 244 Gbps to 88 Gbps on the loopback on my laptop using
2 cores! And it was even 67 Gbps a few hours ago before I merged a fix.

For sure, many will say "but who really needs that level of performance?".
My point isn't the level of performance but understanding what's happening
under the hood. We could release with that if we found that it's a natural
outcome of a bad design choice on something we overlooked and that we'll
fix later. I'm just not willing to release with this thing unexplained.
And I suspect it could be totally related to William's regression (at least
I hope, it will mean we've got a single bug to fix).

Aside from this pain point, things are getting better and could have
warranted a release, so next week will be mainly focused on this regression.

While working on this bug, a few issues were discovered and fixed:
  - splicing didn't work anymore by default because the global "maxpipes"
    value now defaulted to zero instead of 1/4 of maxconn.

  - some I/O events could be reported in storms when dealing with large
    run queues, further adding to the difficulty of flushing the traffic.

  - some forwarded data could be sent using a few incomplete TCP segments
    due to the loss of MSG_MORE down the send() chain. The only effect was
    the performance drop above (from 88 to 67 Gbps).

  - the scheduler was suboptimal when waking up tasklets that themselves
    wake other tasks: it required two rounds of the polling loop in this
    case, causing higher than needed latencies and doubling the number of
    calls to epoll_wait().

  - I tried to address some of the extra work imposed on file descriptors
    by experimenting with edge-triggered epoll_wait(). I figured that we
    apparently had everything needed since the new connection architecture
    stabilized in 2.1, and that it was easy to get it to work for
    connections. Given that I had to pass the patch around for testing and
    I observed a performance gain of up to 14% on a test, I thought it
    could interest some other users to test it as well. So I merged it
    with an experimental status; it can be enabled using a global option.
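
For those who have never compared the two polling modes, here is a minimal
sketch of the difference on Linux. It is written in Python for brevity and
has nothing to do with haproxy's own code; the pipe and the helper name
count_reports() are only assumptions made for the illustration.

```python
import os
import select

def count_reports(edge_triggered, rounds=3):
    """Poll a readable pipe `rounds` times without draining it and
    return how many times epoll reported it as readable."""
    rfd, wfd = os.pipe()
    ep = select.epoll()
    flags = select.EPOLLIN | (select.EPOLLET if edge_triggered else 0)
    ep.register(rfd, flags)
    os.write(wfd, b"x")          # a single readiness transition
    reports = 0
    for _ in range(rounds):
        # timeout 0: never block, just collect what is pending
        reports += len(ep.poll(0))
    ep.close()
    os.close(rfd)
    os.close(wfd)
    return reports

if __name__ == "__main__":
    print("level-triggered:", count_reports(False))
    print("edge-triggered: ", count_reports(True))
```

The level-triggered poller reports the same unread byte on every round,
while the edge-triggered one reports it only once, which is the kind of
extra work per file descriptor that the experiment tries to avoid.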

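As a reminder of what MSG_MORE is for, the sketch below sends a response in
several pieces over a loopback TCP connection and flags all pieces but the
last one, so that the kernel is allowed to coalesce them into full segments
(Linux only). The helper names send_chunks() and demo() and the payload are
made up for the example; this is not how haproxy's send path is written.

```python
import socket

def send_chunks(sock, chunks):
    """Send chunks in order, flagging all but the last with MSG_MORE."""
    for i, chunk in enumerate(chunks):
        last = (i == len(chunks) - 1)
        # Losing the flag here would still deliver the same bytes, only
        # possibly spread over more, smaller TCP segments.
        sock.sendall(chunk, 0 if last else socket.MSG_MORE)

def demo():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))   # let the kernel pick a free port
    srv.listen(1)
    cli = socket.create_connection(srv.getsockname())
    conn, _ = srv.accept()
    send_chunks(cli, [b"HTTP/1.1 200 OK\r\n",
                      b"content-length: 2\r\n\r\n",
                      b"ok"])
    cli.close()
    data = b""
    while True:
        buf = conn.recv(4096)
        if not buf:
            break
        data += buf
    conn.close()
    srv.close()
    return data
```

The receiver sees exactly the same byte stream with or without the flag,
which is why such a loss only shows up as a performance drop.
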
A few other changes were brought based on previously pending bug reports:
  - configuration errors will now stop after reporting ~20 fatal errors.
    This was upsetting oss-fuzz which used to get one error per line of
    junk, and honestly we've all seen this one when accidentally feeding
    haproxy with the executable instead of a config file: the output can
    be huge.

  - the line parser used by the configuration parser was partially
    rewritten to avoid using memmove(), which can cause large amounts of
    CPU to be used if tens of thousands of variables, character escapes or
    quotes are present on a line. Don't get nervous if you have a very
    large config: the new parser remains faster even in the easier case. I
    was extremely careful and tested it on hundreds of config files and
    found no regressions. This change has one small side effect: invalid
    character sequences used to be reported verbatim on the output. This
    wasn't easy anymore (the input line is no longer cut) and instead we
    now have a dump of the input line with a caret ('^') under the issue.
    In the end I find this easier to understand in context.

  - there was an old issue about the watchdog triggering when reloading
    huge maps. The patch proposed back then has had enough time to be
    tested and was merged.

  - the CLI finally supports escaping spaces using a backslash, thanks to
    Yves Lafon who sent a well-thought-out patch a few days before the
    release, and to William who could adjust the master CLI code to match
    it. What's great is that Yves noticed that contrary to what I
    believed, backslashes were already unescaped and lost on the CLI, so
    they already constituted a usable character to escape spaces without
    breaking existing setups. All those who want to adjust their
    user-agent maps will be happy. Note that the dump format didn't
    change, it's still the same format as the input file.

  - there was yet another ssl-min-ver/ssl-max-ver issue discussed recently
    but I didn't follow the thread. I'm seeing that it's fixed and that's
    what matters.

  - for systemd users, we've just merged the patch from Ryan O'Hara which
    changes the network dependency in the unit file: haproxy will now wait
    for the network to be online before starting. This is so that those
    who use DNS names where addresses are expected don't have startup
    failures at boot.
    If this change is causing you any trouble, please report it here so
    that it can be discussed and addressed.
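
To illustrate the idea behind the line parser change, here is a rough
sketch of a single-pass unescape that writes into a separate output buffer
instead of shifting the input in place, and that reports an invalid
sequence with a caret under the input line. It is only an illustration and
not haproxy's parse_line(); the function names and the small set of
escapes are assumptions chosen for the example.

```python
def unescape(line):
    """Single-pass unescape of backslash sequences into a new buffer.
    Returns (output, error_position); error_position is None on success."""
    # Assumed subset of escapes, for the example only.
    escapes = {"\\": "\\", " ": " ", "#": "#", "n": "\n", "t": "\t", '"': '"'}
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            nxt = line[i + 1] if i + 1 < len(line) else ""
            if nxt not in escapes:
                return None, i      # point at the offending backslash
            out.append(escapes[nxt])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out), None

def caret_report(line, pos):
    """Render the untouched input line with a caret under column `pos`."""
    return line + "\n" + " " * pos + "^"

if __name__ == "__main__":
    bad = "foo \\q bar"
    _, pos = unescape(bad)
    print(caret_report(bad, pos))
```

Since each input character is visited once and appended to the output,
the cost stays linear in the line length regardless of how many escapes it
contains, and the original line remains available for the error report.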

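The space escaping on the CLI can be pictured with the small sketch below,
which splits a command on spaces while letting a backslash protect the
next character. The split_args() helper, the map name and the values are
hypothetical and only serve the illustration; the real CLI parser is
written in C and handles more cases.

```python
def split_args(cmd):
    """Split on spaces, letting a backslash protect the next character."""
    args, cur, i = [], [], 0
    in_arg = False
    while i < len(cmd):
        ch = cmd[i]
        if ch == "\\" and i + 1 < len(cmd):
            cur.append(cmd[i + 1])   # keep the escaped character as-is
            in_arg = True
            i += 2
            continue
        if ch == " ":
            if in_arg:               # an unescaped space ends the word
                args.append("".join(cur))
                cur, in_arg = [], False
        else:
            cur.append(ch)
            in_arg = True
        i += 1
    if in_arg:
        args.append("".join(cur))
    return args

if __name__ == "__main__":
    print(split_args("add map ua.map Mozilla/5.0\\ (X11) desktop"))
```

With this rule a key containing a space stays a single argument, while
commands that contain no backslash are split exactly as before.
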
Some late but reasonable improvements:
  - addition of a new "localpeer" global option to name the local peer.
    This does the same as -L on the command line except that it doesn't
    require adjusting the command line and is compatible with a reload
    of the worker process without touching the master.

  - Tim provided some cleaner exit paths to free expressions used on
    set-var actions so that valgrind is happy even with "haproxy -c".

  - Since 2.1 and the removal of the legacy mode in favor of HTX, there's
    no more reason for closing the client connection after a server error.
    So the default responses for the internal status codes were adjusted
    to remove their "connection: close" whenever that made sense (i.e.
    always except 400 and 408 IIRC).

And the usual dose of build cleanups, doc cleanups, fixes and VTC.

I really hope that we can release next week. Not because it's urgent, but
because if we can't it will mean a few of us will experience yet another
very painful debugging week. And given that aside from that performance
regression everything is much cleaner and better, I guess the release
could happen anytime once the problem is explained.

Please find the usual URLs below :
   Site index       : http://www.haproxy.org/
   Discourse        : http://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : http://www.haproxy.org/download/2.2/src/
   Git repository   : http://git.haproxy.org/git/haproxy.git/
   Git Web browsing : http://git.haproxy.org/?p=haproxy.git
   Changelog        : http://www.haproxy.org/download/2.2/src/CHANGELOG
   Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/

Willy
---
Complete changelog :
Dragan Dosen (2):
      MINOR: peers: do not use localpeer as an array anymore
      MEDIUM: peers: add the "localpeer" global option

Ilya Shipitsin (1):
      CI: travis-ci: use "-O1" for clang builds

Olivier Houchard (5):
      BUILD: Fix build by including haproxy/global.h
      MINOR: fd: Fix a typo in a coment.
      BUG/MEDIUM: fd: Don't fd_stop_recv() a fd we don't own.
      BUG/MEDIUM: fd: Call fd_stop_recv() when we just got a fd.
      MINOR: mux_h1: Set H1_F_CO_MSG_MORE if we know we have more to send.

Peter Gervai (2):
      DOC: configuration: Unindent non-code sentences in the protobuf example
      DOC: configuration: http-check send was missing from matrix

Ryan O'Hara (1):
      BUG/MINOR: systemd: Wait for network to be online

Tim Duesterhus (8):
      BUILD: Remove nowarn for warnings that do not trigger
      BUILD: Re-enable -Wimplicit-fallthrough
      BUG/MEDIUM: checks: Fix off-by-one in allocation of SMTP greeting cmd
      MINOR: haproxy: Add void deinit_and_exit(int)
      MINOR: haproxy: Make use of deinit_and_exit() for clean exits
      BUG/MINOR: haproxy: Free rule->arg.vars.expr during deinit_act_rules
      BUG/MAJOR: vars: Fix bogus free() during deinit() for http-request rules
      BUG/MINOR: cfgparse: Add missing fatal++ in PARSE_ERR_HEX case

William Lallemand (5):
      BUG/MINOR: ssl: fix ssl-{min,max}-ver with openssl < 1.1.0
      BUG/MINOR: mworker/cli: fix the escaping in the master CLI
      BUG/MINOR: mworker/cli: fix semicolon escaping in master CLI
      REGTEST: http-rules: test spaces in ACLs
      REGTEST: http-rules: test spaces in ACLs with master CLI

Willy Tarreau (37):
      BUILD: include: add sys/types before netinet/tcp.h
      BUG/MEDIUM: log: don't hold the log lock during writev() on a file descriptor
      BUG/MEDIUM: pattern: fix thread safety of pattern matching
      BUILD: thread: add parenthesis around values of locking macros
      BUILD: proto_uxst: shut up yet another gcc's absurd warning
      BUILD: compression: make gcc 10 happy with free_zlib()
      BUILD: atomic: add string.h for memcpy() on ARM64
      BUG/MINOR: http: make smp_fetch_body() report that the contents may change
      BUG/MINOR: tcp-rules: tcp-response must check the buffer's fullness
      BUILD: haproxy: mark deinit_and_exit() as noreturn
      BUG/MEDIUM: ebtree: use a byte-per-byte memcmp() to compare memory blocks
      MINOR: tools: add a new configurable line parse, parse_line()
      BUG/MEDIUM: cfgparse: use parse_line() to expand/unquote/unescape config lines
      BUG/MEDIUM: cfgparse: stop after a reasonable amount of fatal error
      MINOR: http: do not close connections anymore after internal responses
      BUG/MINOR: spoe: add missing key length check before checking key names
      MINOR: version: put the compiler version output into version.c not haproxy.c
      MINOR: compiler: always define __has_feature()
      MINOR: version: report the presence of the compiler's address sanitizer
      BUG/MAJOR: connection: always disable ready events once reported
      CLEANUP: activity: remove unused counter fd_lock
      DOC: fd: make it clear that some fields ordering must absolutely be respected
      MINOR: activity: report the number of times poll() reports I/O
      MINOR: activity: rename confusing poll_* fields in the output
      MINOR: activity: group the per-loop counters at the top
      MINOR: activity: rename the "stream" field to "stream_calls"
      MEDIUM: fd: refine the fd_takeover() migration lock
      MINOR: fd: slightly optimize the fd_takeover double-CAS loop
      MINOR: fd: factorize the fd_takeover() exit path to make it safer
      MEDIUM: fd: add experimental support for edge-triggered polling
      CONTRIB: debug: add the missing flags CO_FL_SAFE_LIST and CO_FL_IDLE_LIST
      MINOR: haproxy: process signals before runnable tasks
      MEDIUM: tasks: clean up the front side of the wait queue in wake_expired_tasks()
      MEDIUM: tasks: also process late wakeups in process_runnable_tasks()
      BUG/MAJOR: init: properly compute the default global.maxpipes value
      MEDIUM: map: make the "clear map" operation yield
      BUG/MEDIUM: stream-int: fix loss of CO_SFL_MSG_MORE flag in forwarding

Yves Lafon (1):
      BUG/MINOR: cli: allow space escaping on the CLI
