pixitha opened a new pull request, #13173:
URL: https://github.com/apache/trafficserver/pull/13173
## Problem
`EsiParser::_findOpeningTag` uses a hand-rolled two-state-machine byte loop
to scan response bodies for `<esi:` and `<!--esi` opening tags. On a CDN
edge server profiled under production load, this function appeared as a
top-5 CPU leaf, ahead of OpenSSL EC math and zlib.
The root cause is the per-byte cost rather than the complexity class: the
loop inspects every byte individually at ~2 ns/byte (about 0.5 GB/s), even
when the response body contains no ESI tags at all, which is the common
case on a CDN edge.
There is also a correctness issue in the original: the KMP-failure comment
in the source notes that the state machine mishandles sequences like
`<e<esi:`. The first `<e` advances a partial-match counter, the match then
fails on the second `<`, and the recovery doesn't rewind to treat that `<`
as a new candidate. The new implementation avoids this because every `<`
is checked independently.
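
To make the failure class concrete, here is a minimal sketch of a single-counter matcher that resets on mismatch without re-testing the current byte. This is a hypothetical illustration, not the `EsiParser` code: `naive_find`, `rewinding_find`, and `kNotFound` are names invented for this example, and patterns are assumed to be non-empty.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical sentinel for "pattern not found".
constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Single-counter matcher: `matched` is the number of pattern bytes seen so far.
// On a mismatch it resets to zero and moves on, so a byte that both ends a
// failed attempt and starts a new one is lost.
std::size_t naive_find(std::string_view text, std::string_view pat)
{
  std::size_t matched = 0;
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == pat[matched]) {
      if (++matched == pat.size()) {
        return i + 1 - pat.size();
      }
    } else {
      matched = 0; // bug: text[i] is never re-tested against pat[0]
    }
  }
  return kNotFound;
}

// Same loop, but the mismatching byte is re-tested as a possible new start.
// This is only sufficient when pat[0] does not recur later in the pattern,
// which holds for "<esi:" and "<!--esi".
std::size_t rewinding_find(std::string_view text, std::string_view pat)
{
  std::size_t matched = 0;
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == pat[matched]) {
      if (++matched == pat.size()) {
        return i + 1 - pat.size();
      }
    } else {
      matched = (text[i] == pat[0]) ? 1 : 0;
    }
  }
  return kNotFound;
}
```

On the input `<e<esi:`, `naive_find` reports no match, while `rewinding_find` locates the tag at offset 2.
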
## Solution
Replace the loop with `memchr` + `memcmp`:
1. `memchr` locates the next `<` — delegates to the platform's optimized
libc implementation (e.g. `__memchr_avx2` on glibc x86-64), which
processes 128 bytes per iteration in an AVX2 unrolled loop.
2. `memcmp` verifies the candidate anchor (`<esi:` or `<!--esi`).
3. On no-match, advance past the `<` and repeat.
Partial-match (chunk boundary) paths are preserved: when the buffer ends
mid-prefix, the function returns `PARTIAL_MATCH` so the caller can
accumulate more data before deciding.
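
The steps and the chunk-boundary behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in the PR: `Match`, `ScanResult`, `match_at`, and `find_opening_tag` are hypothetical names, and the whitespace requirement after `<!--esi` is left out for brevity.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical result types; the real parser uses its own MATCH_TYPE values.
enum class Match { None, Partial, Full };

struct ScanResult {
  Match       type;
  std::size_t pos;        // offset of the candidate '<'; equals len when type == None
  bool        is_comment; // meaningful only when type == Full
};

// Compares the bytes at data[pos..] with `tag`. Returns Full if the whole tag
// is present, Partial if the buffer ends while the bytes still match, and
// None on the first differing byte.
static Match match_at(const char *data, std::size_t len, std::size_t pos,
                      const char *tag, std::size_t tag_len)
{
  const std::size_t avail = len - pos;
  const std::size_t n     = avail < tag_len ? avail : tag_len;
  if (std::memcmp(data + pos, tag, n) != 0) {
    return Match::None;
  }
  return n == tag_len ? Match::Full : Match::Partial;
}

ScanResult find_opening_tag(const char *data, std::size_t len, std::size_t start = 0)
{
  static constexpr char ESI[]     = "<esi:";
  static constexpr char COMMENT[] = "<!--esi";

  std::size_t pos = start;
  while (pos < len) {
    // Step 1: let libc find the next '<'.
    const void *hit = std::memchr(data + pos, '<', len - pos);
    if (hit == nullptr) {
      break;
    }
    pos = static_cast<std::size_t>(static_cast<const char *>(hit) - data);

    // Step 2: verify the candidate against both anchors.
    Match m = match_at(data, len, pos, ESI, sizeof(ESI) - 1);
    if (m != Match::None) {
      return {m, pos, false};
    }
    m = match_at(data, len, pos, COMMENT, sizeof(COMMENT) - 1);
    if (m != Match::None) {
      return {m, pos, true};
    }

    // Step 3: false anchor, so skip this '<' and keep scanning.
    ++pos;
  }
  return {Match::None, len, false};
}
```

Because each `<` is verified from scratch, an input such as `<e<esi:` yields a full match at offset 2, and a buffer ending in `<es` yields a partial match at the offset of that `<`.
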
The only change outside the function body is the added `#include <cstring>`.
Call sites, return types, and semantics are unchanged.
## Performance
> **Note: synthetic microbenchmark.** The benchmark runs each scenario over a
> 256 KiB body for 1.0 s. Real response bodies are shorter and vary in
> tag density, so treat these numbers as directional.
Measured on **E5-2683 v4 Broadwell, 2.10 GHz, RHEL 8, glibc 2.28, gcc -O2**:
| Scenario | Baseline GB/s | memchr GB/s | Speedup |
|---|---:|---:|---:|
| text-only (no `<`) | 0.503 | 36.871 | **73×** |
| typical HTML (~1.5% `<`, no ESI) | 0.475 | 19.241 | **40×** |
| html-sparse | 0.470 | 18.896 | 40× |
| html-dense | 0.437 | 14.853 | 34× |
| `<!--esi` comment nodes | 0.431 | 18.362 | 43× |
| pathological (5% bare `<`) | 0.290 | 1.512 | **5×** |
The pathological case is the worst case for this approach — every false `<`
triggers a `memcmp`. Real CDN response bodies fall in the typical HTML range.
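
For readers who want to reproduce numbers of this kind, a throughput figure can be derived by scanning a fixed body repeatedly for a fixed wall-clock duration. The sketch below is one possible harness, not the benchmark used for the table: `throughput_gb_per_s` and the stand-in scanner `count_lt` are hypothetical, and GB is assumed to mean 10^9 bytes.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in workload: counts '<' bytes with memchr. A real run would call the
// scanner under test instead.
std::size_t count_lt(const std::string &body)
{
  std::size_t n   = 0;
  const char *p   = body.data();
  const char *end = p + body.size();
  while (p < end) {
    p = static_cast<const char *>(std::memchr(p, '<', static_cast<std::size_t>(end - p)));
    if (p == nullptr) {
      break;
    }
    ++n;
    ++p;
  }
  return n;
}

// Scans `body` repeatedly for roughly `seconds` (assumed > 0) and returns
// bytes scanned per second, divided by 1e9.
template <typename Scan>
double throughput_gb_per_s(Scan scan, const std::string &body, double seconds)
{
  using clock = std::chrono::steady_clock;

  volatile std::size_t sink = 0; // keeps the compiler from discarding the scans
  std::size_t passes        = 0;
  const auto start          = clock::now();
  std::chrono::duration<double> elapsed{0};

  do {
    sink = sink + scan(body);
    ++passes;
    elapsed = clock::now() - start;
  } while (elapsed.count() < seconds);

  return static_cast<double>(passes) * static_cast<double>(body.size()) / elapsed.count() / 1e9;
}
```

A call such as `throughput_gb_per_s(count_lt, std::string(256 * 1024, 'a'), 1.0)` mirrors the 256 KiB, 1.0 s setup from the note. Reading the clock on every pass adds a small fixed overhead, which is negligible for bodies of this size but would skew results for very short inputs.
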
## Testing
- **Unit tests**: 870 assertions pass in `parser_test`. Four new `SECTION`
blocks cover boundary conditions specific to this implementation:
exact-length prefix at chunk end for both `<esi:` and `<!--esi`,
`<!--esi` without required trailing whitespace, and multiple false `<`
anchors before a valid tag.
- **Autests**: `esi`, `esi_304`, and `esi_nested_include` all pass (built
inside `ci.trafficserver.apache.org/ats/fedora:42`, CMake preset
`ci-fedora-autest`).
- **Differential correctness check**: 10 million random inputs were run
through both the old and new implementations — 0 mismatches across all
`MATCH_TYPE` returns, positions, and `is_html_comment_node` flags.
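
A differential check of this kind can be sketched as below. This is an illustration under assumptions rather than the harness used for the PR: it compares only the match position for `<esi:`, uses `std::string_view::find` as the reference instead of the old implementation, and draws bytes from a small alphabet (my choice) so that near-miss sequences such as `<e<esi:` occur often. All function names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <random>
#include <string>
#include <string_view>

// Reference implementation: the standard library's substring search.
std::size_t reference_find(std::string_view body)
{
  return body.find("<esi:");
}

// Implementation under test: memchr to find '<', memcmp to verify the tag.
std::size_t memchr_find(std::string_view body)
{
  static constexpr std::string_view tag = "<esi:";
  std::size_t pos = 0;
  while (pos < body.size()) {
    const void *hit = std::memchr(body.data() + pos, '<', body.size() - pos);
    if (hit == nullptr) {
      break;
    }
    pos = static_cast<std::size_t>(static_cast<const char *>(hit) - body.data());
    if (body.size() - pos >= tag.size() &&
        std::memcmp(body.data() + pos, tag.data(), tag.size()) == 0) {
      return pos;
    }
    ++pos;
  }
  return std::string_view::npos;
}

// Runs both implementations on `trials` random bodies and counts disagreements.
std::size_t count_mismatches(std::size_t trials, unsigned seed)
{
  // Small alphabet biased toward tag bytes, so partial prefixes are common.
  static constexpr std::string_view alphabet = "<esi:!-x ";

  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, alphabet.size() - 1);
  std::uniform_int_distribution<std::size_t> length(0, 32);

  std::size_t mismatches = 0;
  for (std::size_t t = 0; t < trials; ++t) {
    std::string body(length(rng), '\0');
    for (char &c : body) {
      c = alphabet[pick(rng)];
    }
    if (reference_find(body) != memchr_find(body)) {
      ++mismatches;
    }
  }
  return mismatches;
}
```

A fixed seed keeps any failure reproducible. A fuller version would also compare the partial-match result and the comment-node flag, as the check described above does.
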