[PR] esi: replace _findOpeningTag with memchr [trafficserver]

via GitHub Mon, 18 May 2026 15:08:27 -0700


pixitha opened a new pull request, #13173:
URL: https://github.com/apache/trafficserver/pull/13173


   ## Problem
   
   `EsiParser::_findOpeningTag` uses a hand-rolled two-state-machine byte loop
   to scan response bodies for `<esi:` and `<!--esi` opening tags. On a CDN
   edge server profiled under production load, this function appeared as a
   top-5 CPU leaf, ahead of OpenSSL EC math and zlib.
   
   The root cause is per-byte cost rather than asymptotic complexity: the loop
   inspects every byte individually at ~2 ns/byte (about 0.5 GB/s), even when
   the response body contains no ESI tags at all, which is the common case on
   a CDN edge.
   
   There is also a correctness issue in the original: the KMP-failure comment
   in the source notes that the state machine mishandles sequences like
   `<e<esi:` — the first `<e` advances a partial-match counter but then
   fails, and the recovery doesn't rewind to the second `<`. The new
   implementation avoids this entirely.
   
   ## Solution
   
   Replace the loop with `memchr` + `memcmp`:
   
   1. `memchr` locates the next `<`. The call delegates to the platform's
      optimized libc implementation (e.g. `__memchr_avx2` on glibc x86-64),
      which processes 128 bytes per iteration in an AVX2 unrolled loop.
   2. `memcmp` verifies the candidate anchor (`<esi:` or `<!--esi`).
   3. On no-match, advance past the `<` and repeat.
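   
   The three steps above can be sketched roughly as follows. This is a
   minimal illustration, not the code in this PR: `TagHit` and
   `find_opening_tag` are hypothetical names, and the sketch only reports
   complete matches (it ignores chunk-boundary handling and any trailing
   whitespace requirement after `<!--esi`).
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstring>

   // Hypothetical result type for this sketch; not part of EsiParser.
   struct TagHit {
     bool found;       // true if a complete opening tag was seen
     std::size_t pos;  // offset of the '<' that starts the tag
     bool is_comment;  // true for "<!--esi", false for "<esi:"
   };

   // Return the first complete "<esi:" or "<!--esi" in data[0, len).
   inline TagHit find_opening_tag(const char *data, std::size_t len)
   {
     assert(data != nullptr || len == 0);
     static const char ESI[] = "<esi:";
     static const char CMT[] = "<!--esi";
     const std::size_t esi_len = sizeof(ESI) - 1;
     const std::size_t cmt_len = sizeof(CMT) - 1;

     const char *p   = data;
     const char *end = data + len;
     while (p < end) {
       // Step 1: let libc jump to the next '<'.
       p = static_cast<const char *>(std::memchr(p, '<', static_cast<std::size_t>(end - p)));
       if (p == nullptr) {
         break; // no '<' left in the buffer
       }
       const std::size_t remaining = static_cast<std::size_t>(end - p);
       const std::size_t pos       = static_cast<std::size_t>(p - data);
       // Step 2: verify the candidate against both anchors.
       if (remaining >= esi_len && std::memcmp(p, ESI, esi_len) == 0) {
         return TagHit{true, pos, false};
       }
       if (remaining >= cmt_len && std::memcmp(p, CMT, cmt_len) == 0) {
         return TagHit{true, pos, true};
       }
       // Step 3: false anchor, so step past this '<' and search again.
       ++p;
     }
     return TagHit{false, 0, false};
   }
   ```
   
   Because every search restarts from a fresh `<`, an input such as
   `<e<esi:` is handled without any partial-match counter to rewind: the
   first `<` fails the `memcmp`, and the next `memchr` call lands on the
   second `<`.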
   
   Partial-match (chunk boundary) paths are preserved: when the buffer ends
   mid-prefix, the function returns `PARTIAL_MATCH` so the caller can
   accumulate more data before deciding.
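   
   One way to picture the chunk-boundary case is a helper that classifies a
   single `<` candidate against one anchor. This is a sketch under
   assumptions: `SketchMatch` and `classify_candidate` are hypothetical
   names, and only `PARTIAL_MATCH` is taken from the description above.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstring>

   // Stand-in for the parser's match result; only PARTIAL_MATCH is named in
   // the PR text, the other two enumerators are assumptions for this sketch.
   enum class SketchMatch { NO_MATCH, PARTIAL_MATCH, COMPLETE_MATCH };

   // Classify the bytes at a '<' candidate. `remaining` is the number of
   // bytes from the candidate to the end of the current chunk.
   inline SketchMatch classify_candidate(const char *p, std::size_t remaining,
                                         const char *pat, std::size_t pat_len)
   {
     assert(remaining > 0 && pat_len > 0);
     if (remaining >= pat_len) {
       // Enough bytes to compare the whole anchor.
       return std::memcmp(p, pat, pat_len) == 0 ? SketchMatch::COMPLETE_MATCH
                                                : SketchMatch::NO_MATCH;
     }
     // The chunk ends mid-anchor: if everything seen so far agrees, the
     // caller needs more data before it can decide.
     return std::memcmp(p, pat, remaining) == 0 ? SketchMatch::PARTIAL_MATCH
                                                : SketchMatch::NO_MATCH;
   }
   ```
   
   With this helper, a chunk ending in `<es` yields `PARTIAL_MATCH` for the
   `<esi:` anchor, while a chunk ending in `<ex` yields `NO_MATCH` and the
   scan can move on.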
   
   There is no interface change: the only addition outside the function body
   is `#include <cstring>`, and all call sites, return types, and semantics
   are identical.
   
   ## Performance
   
   > **Note: synthetic microbenchmark.** The benchmark runs each scenario
   > over a 256 KiB body for 1.0 s. Real response bodies are shorter and vary
   > in tag density, so treat these as directional numbers.
   
   Measured on **E5-2683 v4 Broadwell, 2.10 GHz, RHEL 8, glibc 2.28, gcc -O2**:
   
   | Scenario | Baseline GB/s | memchr GB/s | Speedup |
   |---|---:|---:|---:|
   | text-only (no `<`) | 0.503 | 36.871 | **73×** |
   | typical HTML (~1.5% `<`, no ESI) | 0.475 | 19.241 | **40×** |
   | html-sparse | 0.470 | 18.896 | 40× |
   | html-dense | 0.437 | 14.853 | 34× |
   | `<!--esi` comment nodes | 0.431 | 18.362 | 43× |
   | pathological (5% bare `<`) | 0.290 | 1.512 | **5×** |
   
   The pathological case is the worst case for this approach — every false `<`
   triggers a `memcmp`. Real CDN response bodies fall in the typical HTML range.
   
   ## Testing
   
   - **Unit tests**: 870 assertions pass in `parser_test`. Four new `SECTION`
     blocks cover boundary conditions specific to this implementation:
     exact-length prefix at chunk end for both `<esi:` and `<!--esi`,
     `<!--esi` without required trailing whitespace, and multiple false `<`
     anchors before a valid tag.
   - **Autests**: `esi`, `esi_304`, and `esi_nested_include` all pass (built
     inside `ci.trafficserver.apache.org/ats/fedora:42`, CMake preset
     `ci-fedora-autest`).
   - **Differential correctness check**: 10 million random inputs were run
     through both the old and new implementations — 0 mismatches across all
     `MATCH_TYPE` returns, positions, and `is_html_comment_node` flags.
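   
   The shape of a differential check like the one in the last bullet can be
   sketched as below. This is not the harness used for this PR: `Hit`,
   `scan_reference`, `scan_memchr`, and `count_mismatches` are hypothetical
   names, the reference scanner is a simple every-offset comparison standing
   in for the old state machine, and the small alphabet is an assumption
   chosen so that random inputs contain near-miss anchors often.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstring>
   #include <random>
   #include <string>

   // Hypothetical result: offset of the first tag (-1 if none) and its kind.
   struct Hit {
     long pos;
     bool is_comment;
     bool operator==(const Hit &o) const { return pos == o.pos && is_comment == o.is_comment; }
   };

   // True if pat (length n) occurs in s starting at offset i (i < s.size()).
   static bool at(const std::string &s, std::size_t i, const char *pat, std::size_t n)
   {
     return s.size() - i >= n && std::memcmp(s.data() + i, pat, n) == 0;
   }

   // Reference scanner: test every offset in order.
   static Hit scan_reference(const std::string &s)
   {
     for (std::size_t i = 0; i < s.size(); ++i) {
       if (at(s, i, "<esi:", 5)) return Hit{static_cast<long>(i), false};
       if (at(s, i, "<!--esi", 7)) return Hit{static_cast<long>(i), true};
     }
     return Hit{-1, false};
   }

   // Candidate scanner: only test offsets that memchr reports as '<'.
   static Hit scan_memchr(const std::string &s)
   {
     const char *base = s.data();
     const char *end  = base + s.size();
     const char *p    = base;
     while (p < end) {
       p = static_cast<const char *>(std::memchr(p, '<', static_cast<std::size_t>(end - p)));
       if (p == nullptr) break;
       const std::size_t i = static_cast<std::size_t>(p - base);
       if (at(s, i, "<esi:", 5)) return Hit{static_cast<long>(i), false};
       if (at(s, i, "<!--esi", 7)) return Hit{static_cast<long>(i), true};
       ++p; // false anchor, keep searching after it
     }
     return Hit{-1, false};
   }

   // Run both scanners over `rounds` random inputs and count disagreements.
   static std::size_t count_mismatches(std::size_t rounds, unsigned seed)
   {
     assert(rounds > 0);
     static const char alphabet[] = "<esi:!-ab "; // 10 symbols, anchor-heavy
     std::mt19937 rng(seed);
     std::uniform_int_distribution<std::size_t> pick(0, sizeof(alphabet) - 2);
     std::uniform_int_distribution<std::size_t> length(0, 64);
     std::size_t mismatches = 0;
     for (std::size_t r = 0; r < rounds; ++r) {
       std::string s(length(rng), ' ');
       for (char &c : s) c = alphabet[pick(rng)];
       if (!(scan_reference(s) == scan_memchr(s))) ++mismatches;
     }
     return mismatches;
   }
   ```
   
   A fixed seed keeps any mismatch reproducible, and biasing the alphabet
   toward the anchor characters exercises the false-`<` and near-miss paths
   far more often than uniformly random bytes would.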


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
