phongn opened a new pull request, #13166: URL: https://github.com/apache/trafficserver/pull/13166
## Summary

Wire simdutf in as an opt-in SIMD backend for `ats_base64_encode` and `ats_base64_decode` (also exposed via the `TSBase64Encode` / `TSBase64Decode` plugin API). On AVX2 hardware the hybrid path is roughly 3x to 13x faster than the scalar path for inputs of about 200 bytes and larger (see the Performance table); behavior is preserved for every in-tree caller.

## How it's wired

- `auto_option(SIMDUTF FEATURE_VAR TS_USE_SIMDUTF PACKAGE_DEPENDS simdutf)`: default `AUTO`, same shape as `HWLOC` / `UNWIND`. Builds without simdutf installed are unaffected and fall back to the scalar path.
- `src/tscore/ink_base64.cc` becomes a thin hybrid wrapper: scalar helpers in an anonymous namespace (always compiled), simdutf used only when `inBufferSize` exceeds an empirically chosen per-direction threshold. Tiny-input cases (e.g. the 8-byte `SnowflakeID` encode) stay on the scalar path to avoid simdutf's per-call dispatch overhead.
- `include/tscore/ink_config.h.cmake.in` gains `#cmakedefine01 TS_USE_SIMDUTF`.

## Performance (Xeon E5-2683 v4, AVX2)

| Op | Size | Scalar only (ns) | simdutf only (ns) | **Hybrid (this PR)** (ns) |
|---|---:|---:|---:|---:|
| encode | 8 B | 15.7 | 25.5 | **16.8** |
| encode | 32 B | 45.8 | 29.5 | **30.7** |
| encode | 200 B | 256 | 47.9 | **50.2** |
| encode | 4096 B | 5128 | 525 | **534** |
| decode | 12 B b64 | 21.8 | 66.5 | **22.5** |
| decode | 44 B b64 | 70.8 | 84.3 | **68.4** |
| decode | 268 B b64 | 385 | 94.1 | **113** |
| decode | 5464 B b64 | 7295 | 583 | **572** |

## Behavior

Both paths preserve the existing public contract:

- **Encode**: standard `+/=` alphabet, no line breaks, trailing NUL written at `outBuffer[length]`.
- **Decode**: accepts both `+/` and `-_` in the same input, tolerates missing padding, truncates silently on invalid characters, trailing NUL written.
- In-place decode (used by `plugins/experimental/magick`) is preserved.

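To make the decode contract above concrete, here is a minimal C++ sketch of a decoder with the same leniency: both alphabets accepted in one input, padding optional, and decoding stopped silently at the first invalid byte. It is an illustration only, not the code in `src/tscore/ink_base64.cc`. The names `sextet` and `decode_lenient` are hypothetical, and the sketch returns a `std::string` instead of writing into a caller-supplied buffer with a trailing NUL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: map one base64 character to its 6-bit value.
// Accepts both the standard (+/) and URL-safe (-_) alphabets.
// Returns -1 for everything else, including '=' and whitespace.
static int sextet(unsigned char c)
{
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+' || c == '-') return 62;
  if (c == '/' || c == '_') return 63;
  return -1;
}

// Illustrative lenient decoder (an assumption-laden sketch, not the ATS code).
// Stops at the first byte outside either alphabet, so '=' padding simply
// ends the decode and missing padding is tolerated.
static std::string decode_lenient(std::string_view in)
{
  std::string out;
  std::uint32_t acc = 0; // bit accumulator; high bits fall off on shift
  int bits = 0;          // number of not-yet-emitted bits held in acc
  for (unsigned char c : in) {
    int v = sextet(c);
    if (v < 0) {
      break; // silent truncation on the first invalid character
    }
    acc = (acc << 6) | static_cast<std::uint32_t>(v);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push_back(static_cast<char>((acc >> bits) & 0xFF));
    }
  }
  return out;
}
```

Under these assumptions, `decode_lenient("aGVsbG8=")` and `decode_lenient("aGVsbG8")` both yield `hello`, and `decode_lenient("aGVs bG8=")` yields `hel` because decoding stops at the space.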

**One behavioral delta when the simdutf path is taken**: simdutf silently skips ASCII whitespace (space, tab, CR, LF, FF) inside the input, whereas the scalar path stops at the first whitespace byte. None of the in-tree callers feed whitespace to these functions; flagged in the file's header comment.

## Test plan

- [x] Catch2 microbench `tools/benchmark/benchmark_ink_base64` covers both correctness and performance. Locks the byte-exact fixture from `InkAPITest.cc::SDK_API_ENCODING` as a regression test.
- [x] 46 correctness assertions pass with `ENABLE_SIMDUTF=AUTO` (hybrid) and `ENABLE_SIMDUTF=OFF` (scalar-only).
- [x] `cmake --build build -t format` clean.
- [ ] Jenkins CI green.
- [ ] Manual smoke of `traffic_server` against a workload exercising OCSP stapling and the S3 `origin_server_auth` plugin (encode hot paths).

## Notes for reviewers

- Thresholds (`BASE64_ENCODE_SIMD_THRESHOLD=24`, `BASE64_DECODE_SIMD_THRESHOLD=48`) were chosen from the benchmark data and documented in the file. The crossover shifts on different cores but the thresholds are robust within an order of magnitude.
- The scalar decoder contains a latent out-of-bounds read when `inBufferSize` is 1 or 2 (the existing `inBuffer[-2]` access in the trailing-bytes adjustment). I preserved this rather than smuggle in a behavior change. Worth a follow-up issue but out of scope here.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
