Dandandan opened a new pull request, #21348:
URL: https://github.com/apache/datafusion/pull/21348
## Which issue does this PR close?

N/A (performance optimization).

## Rationale for this change

Profiling `SELECT COUNT(DISTINCT "SearchPhrase") FROM hits` (ClickBench) showed `ArrowBytesViewMap::insert_if_new_inner` as a hot spot, with `_platform_memcmp` at 5% and `append_value` at 7% of CPU time.

## What changes are included in this PR?

Three optimizations for the BytesView hash map hot path:

1. **Direct value bytes access**: Replace `values.value(i).as_ref()`, which goes through the `GenericByteViewArray::value()` accessor (bounds check, view decode, buffer lookup), with direct pointer arithmetic on `input_views` and `input_buffers`. This avoids the accessor overhead on every hash table probe for strings longer than 12 bytes.
2. **Skip append for inline strings**: For strings of at most 12 bytes, the input view is self-contained (length and data are both encoded in the `u128`). Instead of decoding to `&[u8]` and re-encoding via `append_value` → `make_view`, push the input view directly. This avoids a decode/encode round trip for the most common case (empty or short strings).
3. **Simplify `make_payload_fn`**: Change its signature from `FnMut(Option<&[u8]>) -> V` to `FnMut() -> V`, since no caller uses the value bytes parameter. This eliminates unnecessary value decoding on the insert path.

## Are these changes tested?

Existing tests pass. A test was updated to match the simplified `make_payload_fn` signature.

## Are there any user-facing changes?

The `make_payload_fn` parameter of `ArrowBytesViewMap::insert_if_new` has a new signature. This is a breaking change for downstream users of this internal API.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
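The view layout that the first two optimizations rely on can be sketched without the `arrow` crate. This is a simplified illustration, assuming the standard Arrow view layout: the length sits in the low 32 bits of the `u128`; values of at most 12 bytes are stored inline in the remaining 12 bytes; longer values store a 4-byte prefix, a buffer index, and an offset. The helpers `encode_view`, `long_value`, and `append_from_view` are hypothetical and are not arrow-rs or DataFusion APIs. The PR uses pointer arithmetic, whereas this sketch uses safe shifts and slice indexing, so it still bounds-checks. It only shows which fields are read for long values and why an inline view can be copied as-is.

```rust
// Hypothetical sketch of the Arrow view layout; these helpers are
// illustrations only, not arrow-rs or DataFusion APIs.

/// Values up to this many bytes are stored inline in the 16-byte view.
const MAX_INLINE_LEN: usize = 12;

/// Encode `data` as a view (little-endian u128):
/// bits 0..32 = length; if length <= 12 the data follows inline, zero padded;
/// otherwise bits 32..64 = 4-byte prefix, 64..96 = buffer index,
/// 96..128 = offset into that buffer.
fn encode_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    if data.len() <= MAX_INLINE_LEN {
        bytes[4..4 + data.len()].copy_from_slice(data);
    } else {
        bytes[4..8].copy_from_slice(&data[0..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Read a non-inline (> 12 byte) value straight from the view's
/// buffer index and offset fields, without a per-element array accessor.
fn long_value<'a>(view: u128, buffers: &'a [Vec<u8>]) -> &'a [u8] {
    let len = view as u32 as usize;
    debug_assert!(len > MAX_INLINE_LEN);
    let buffer_index = (view >> 64) as u32 as usize;
    let offset = (view >> 96) as u32 as usize;
    &buffers[buffer_index][offset..offset + len]
}

/// Append one input value to an output views/buffer pair.
fn append_from_view(
    out_views: &mut Vec<u128>,
    out_buffer: &mut Vec<u8>,
    view: u128,
    in_buffers: &[Vec<u8>],
) {
    if (view as u32 as usize) <= MAX_INLINE_LEN {
        // Inline view is self-contained: push it unchanged,
        // skipping the decode to &[u8] and re-encode round trip.
        out_views.push(view);
    } else {
        // Long value: copy its bytes into the single output buffer
        // (index 0) and build a new view pointing at them.
        let bytes = long_value(view, in_buffers);
        let offset = out_buffer.len() as u32;
        out_buffer.extend_from_slice(bytes);
        out_views.push(encode_view(bytes, 0, offset));
    }
}
```

Only long values touch a data buffer; short values never leave the 16-byte view, which is what makes the direct push safe.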
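The `make_payload_fn` signature change can be illustrated with a hypothetical, simplified map keyed by byte strings. `ToyBytesMap` is made up for this sketch: it omits nulls, takes plain slices instead of an Arrow array, and its `observe_payload_fn` callback is an assumption of the sketch rather than a statement about the real `ArrowBytesViewMap::insert_if_new` parameters. The point is that the payload closure has the new `FnMut() -> V` shape: it only hands out the next payload (here, a group id) and never inspects the value bytes.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified dedup map keyed by byte strings;
/// not the DataFusion `ArrowBytesViewMap` API.
struct ToyBytesMap<V: Copy> {
    map: HashMap<Vec<u8>, V>,
}

impl<V: Copy> ToyBytesMap<V> {
    fn new() -> Self {
        Self { map: HashMap::new() }
    }

    /// For each value, call `make_payload_fn` only if the value is new,
    /// then pass the new or existing payload to `observe_payload_fn`.
    fn insert_if_new<MP, OP>(
        &mut self,
        values: &[&[u8]],
        mut make_payload_fn: MP,
        mut observe_payload_fn: OP,
    ) where
        MP: FnMut() -> V, // no value bytes passed in
        OP: FnMut(V),
    {
        for value in values {
            let payload = match self.map.get(*value).copied() {
                Some(existing) => existing,
                None => {
                    let created = make_payload_fn();
                    self.map.insert(value.to_vec(), created);
                    created
                }
            };
            observe_payload_fn(payload);
        }
    }
}
```

Because the closure no longer receives `Option<&[u8]>`, the map has no reason to materialize the value bytes just to call it.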
