Dandandan opened a new pull request, #21042:
URL: https://github.com/apache/datafusion/pull/21042

   ## Summary
   - Use `with_hashes` to hash the whole batch up front into a thread-local 
buffer, separating hashing from hash table operations for better 
vectorization/pipelining
   - Process 4 rows at a time via `chunks_exact(4)`, deduplicating equal keys 
within each chunk to reduce redundant hash table operations
   - Split hash table operations into `find` + `insert_unique` phases (lighter 
than `entry`, which prepares an insertion slot even on a hit)
   - Extract `find_group`, `insert_new_group`, `get_or_create_null_group` 
helpers to consolidate unsafe hash table logic with SAFETY comments
   - Separate null/no-null fast paths to eliminate validity checks when no 
nulls are present
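
To illustrate the overall shape of the changes listed above, here is a minimal standalone sketch, not the code in this PR. It assumes `i64` keys with no nulls, swaps in a hypothetical toy linear-probing `GroupTable` for the real hash table, and uses the standard library's `RandomState::hash_one` in place of `with_hashes`. The names `GroupTable` and `group_indices` are invented for this example; only the `find` / `insert_unique` split and the `chunks_exact(4)` loop with local dedup mirror the description.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;

/// Hypothetical toy table: linear probing over (hash, key, group index) slots.
struct GroupTable {
    slots: Vec<Option<(u64, i64, usize)>>,
    len: usize,
}

impl GroupTable {
    fn new() -> Self {
        Self { slots: vec![None; 16], len: 0 }
    }

    /// Lookup only: never reserves or writes a slot.
    fn find(&self, hash: u64, key: i64) -> Option<usize> {
        let mask = self.slots.len() - 1;
        let mut i = hash as usize & mask;
        while let Some((h, k, g)) = self.slots[i] {
            if h == hash && k == key {
                return Some(g);
            }
            i = (i + 1) & mask;
        }
        None
    }

    /// Insert a key that the caller has already checked is absent.
    fn insert_unique(&mut self, hash: u64, key: i64, group: usize) {
        // Keep the load factor at or below 1/2 so probing always terminates.
        if (self.len + 1) * 2 > self.slots.len() {
            let new_len = self.slots.len() * 2;
            let old = std::mem::replace(&mut self.slots, vec![None; new_len]);
            for (h, k, g) in old.into_iter().flatten() {
                Self::place(&mut self.slots, h, k, g);
            }
        }
        Self::place(&mut self.slots, hash, key, group);
        self.len += 1;
    }

    fn place(slots: &mut [Option<(u64, i64, usize)>], hash: u64, key: i64, group: usize) {
        let mask = slots.len() - 1;
        let mut i = hash as usize & mask;
        while slots[i].is_some() {
            i = (i + 1) & mask;
        }
        slots[i] = Some((hash, key, group));
    }
}

/// Returns the group index of every row plus the distinct keys in first-seen order.
fn group_indices(values: &[i64]) -> (Vec<usize>, Vec<i64>) {
    // Phase 1: hash the whole batch before touching the table.
    let state = RandomState::new();
    let hashes: Vec<u64> = values.iter().map(|v| state.hash_one(v)).collect();

    let mut table = GroupTable::new();
    let mut group_values = Vec::new();
    let mut out = Vec::with_capacity(values.len());

    // Phase 2 helper: `find` first, `insert_unique` only on a miss.
    let mut lookup = |hash: u64, key: i64| -> usize {
        if let Some(g) = table.find(hash, key) {
            return g;
        }
        let g = group_values.len();
        group_values.push(key);
        table.insert_unique(hash, key, g);
        g
    };

    let mut value_chunks = values.chunks_exact(4);
    let mut hash_chunks = hashes.chunks_exact(4);
    for (vs, hs) in value_chunks.by_ref().zip(hash_chunks.by_ref()) {
        let mut gs = [0usize; 4];
        for i in 0..4 {
            // Local dedup: reuse the result of an equal key earlier in this chunk.
            gs[i] = match (0..i).find(|&j| vs[j] == vs[i]) {
                Some(j) => gs[j],
                None => lookup(hs[i], vs[i]),
            };
        }
        out.extend_from_slice(&gs);
    }
    // Rows left over after the last full chunk of 4.
    for (&v, &h) in value_chunks.remainder().iter().zip(hash_chunks.remainder()) {
        out.push(lookup(h, v));
    }
    (out, group_values)
}
```

Calling `group_indices(&[7, 7, 3, 7, 9, 3])` returns the row-to-group mapping `[0, 0, 1, 0, 2, 1]` and the distinct keys `[7, 3, 9]`. In the first chunk only two of the four rows reach the table, which is the saving the local dedup step is after.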
   
   ## Test plan
   - [x] `cargo test -p datafusion-physical-plan aggregat` (82 tests pass)
   - [x] `cargo clippy -p datafusion-physical-plan --all-features -- -D 
warnings` (clean)
   - [x] `cargo fmt --all` (clean)
   - [ ] Benchmark with group-by queries on primitive columns (low and high 
cardinality)
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

