phongn opened a new pull request, #13166:
URL: https://github.com/apache/trafficserver/pull/13166

   ## Summary
   
   Wire simdutf in as an opt-in SIMD backend for `ats_base64_encode` and 
`ats_base64_decode` (also exposed via the `TSBase64Encode` / `TSBase64Decode` 
plugin API). On the AVX2 machine benchmarked below, the hybrid path is roughly 
3-5x faster than scalar at a few hundred bytes and about 10x faster at 4 KiB 
payloads; behavior is preserved for every in-tree caller.
   
   ## How it's wired
   
   - `auto_option(SIMDUTF FEATURE_VAR TS_USE_SIMDUTF PACKAGE_DEPENDS simdutf)` 
— default `AUTO`, same shape as `HWLOC` / `UNWIND`. Builds without simdutf 
installed are unaffected and fall back to the scalar path.
   - `src/tscore/ink_base64.cc` becomes a thin hybrid wrapper: scalar helpers 
in an anonymous namespace (always compiled), simdutf used only when 
`inBufferSize` exceeds an empirically chosen per-direction threshold. 
Tiny-input cases (e.g. the 8-byte `SnowflakeID` encode) stay on the scalar path 
to avoid simdutf's per-call dispatch overhead.
   - `include/tscore/ink_config.h.cmake.in` gains `#cmakedefine01 
TS_USE_SIMDUTF`.
   
   ## Performance (Xeon E5-2683 v4, AVX2)
   
   | Op | Input size | Scalar only (ns) | simdutf only (ns) | **Hybrid, this PR (ns)** |
   |---|---:|---:|---:|---:|
   | encode | 8 B | 15.7 | 25.5 | **16.8** |
   | encode | 32 B | 45.8 | 29.5 | **30.7** |
   | encode | 200 B | 256 | 47.9 | **50.2** |
   | encode | 4096 B | 5128 | 525 | **534** |
   | decode | 12 B b64 | 21.8 | 66.5 | **22.5** |
   | decode | 44 B b64 | 70.8 | 84.3 | **68.4** |
   | decode | 268 B b64 | 385 | 94.1 | **113** |
   | decode | 5464 B b64 | 7295 | 583 | **572** |
   
   ## Behavior
   
   Both paths preserve the existing public contract:
   
   - **Encode**: standard `+/=` alphabet, no line breaks, trailing NUL written 
at `outBuffer[length]`.
   - **Decode**: accepts both `+/` and `-_` in the same input, tolerates 
missing padding, truncates silently on invalid characters, trailing NUL written.
   - In-place decode (used by `plugins/experimental/magick`) is preserved.
   
   **One behavioral delta when the simdutf path is taken**: simdutf silently 
skips ASCII whitespace (space, tab, CR, LF, FF) inside the input, whereas the 
scalar path stops at the first whitespace byte. None of the in-tree callers 
feed whitespace to these functions; flagged in the file's header comment.
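
To make the scalar side of that contract concrete, here is a minimal model of a lenient decoder with the properties listed above: both alphabets accepted, missing padding tolerated, and decoding stopping at the first byte outside the alphabet (which includes whitespace). It is an illustrative sketch only; `sketch_scalar_decode` and `sextet` are hypothetical names, and the handling of leftover bits is a simplification that may differ from the real scalar decoder.

```cpp
#include <string>
#include <string_view>

namespace
{
// Map one input byte to its 6-bit value, accepting both '+/' and '-_'.
// Returns -1 for anything else ('=', whitespace, control bytes, ...).
int
sextet(unsigned char c)
{
  if (c >= 'A' && c <= 'Z') {
    return c - 'A';
  }
  if (c >= 'a' && c <= 'z') {
    return c - 'a' + 26;
  }
  if (c >= '0' && c <= '9') {
    return c - '0' + 52;
  }
  if (c == '+' || c == '-') {
    return 62;
  }
  if (c == '/' || c == '_') {
    return 63;
  }
  return -1;
}
} // namespace

// Hypothetical model of the scalar contract, not the ATS implementation.
std::string
sketch_scalar_decode(std::string_view in)
{
  std::string out;
  unsigned acc = 0; // bit accumulator
  int bits     = 0; // number of valid low bits in acc
  for (unsigned char c : in) {
    const int v = sextet(c);
    if (v < 0) {
      break; // truncate silently at the first non-alphabet byte
    }
    acc   = (acc << 6) | static_cast<unsigned>(v);
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push_back(static_cast<char>((acc >> bits) & 0xff));
      acc &= (1u << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return out; // leftover bits (fewer than 8) are dropped, so padding is optional
}
```

Under this model `sketch_scalar_decode("Zm9v YmFy")` yields `foo`, because decoding stops at the space. Per the delta described above, the simdutf path would skip the space and decode the full payload instead.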
   
   ## Test plan
   
   - [x] Catch2 microbench `tools/benchmark/benchmark_ink_base64` covers both 
correctness and performance, and locks in the byte-exact fixture from 
`InkAPITest.cc::SDK_API_ENCODING` as a regression test.
   - [x] 46 correctness assertions pass with `ENABLE_SIMDUTF=AUTO` (hybrid) and 
`ENABLE_SIMDUTF=OFF` (scalar-only).
   - [x] `cmake --build build -t format` clean.
   - [ ] Jenkins CI green.
   - [ ] Manual smoke of `traffic_server` against a workload exercising OCSP 
stapling and the S3 `origin_server_auth` plugin (encode hot paths).
   
   ## Notes for reviewers
   
   - Thresholds (`BASE64_ENCODE_SIMD_THRESHOLD=24`, 
`BASE64_DECODE_SIMD_THRESHOLD=48`) were chosen from the benchmark data and 
documented in the file. The crossover shifts on different cores but the 
thresholds are robust within an order of magnitude.
   - The scalar decoder contains a latent out-of-bounds read when 
`inBufferSize` is 1 or 2 (the existing `inBuffer[-2]` access in the 
trailing-bytes adjustment). I preserved this rather than smuggle in a behavior 
change. Worth a follow-up issue but out of scope here.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
