Dandandan opened a new pull request, #22009: URL: https://github.com/apache/datafusion/pull/22009
## Which issue does this PR close?

- Closes #.

## Rationale for this change

ClickBench q17 performs aggregation with a LIMIT over grouping keys. Without applying the limit during partial aggregation, memory grows with all discovered groups even though only the smallest keys can survive the final global limit.

## What changes are included in this PR?

- Extends the limited distinct aggregation optimizer rule to unordered grouped aggregates with aggregate expressions.
- Adds local group-key top-k pruning in `GroupedHashAggregateStream` for aggregate limits.
- Routes unordered aggregate limits through the hash aggregate path and keeps existing ordered top-k handling separate.
- Updates optimizer and ClickBench sqllogictest coverage.
- Makes the dfbench and imdb benchmark binary allocator cfgs tolerate all-features by preferring mimalloc when both allocator features are enabled.

## Are these changes tested?

Tested with:

- `cargo fmt --all`
- `cargo check -p datafusion-physical-plan -p datafusion-physical-optimizer`
- `cargo test -p datafusion --test core_integration limited_distinct_aggregation -- --nocapture`
- `cargo run -p datafusion-benchmarks --bin dfbench -- clickbench -q 17 -i 1 -n 4 --path datafusion/core/tests/data/clickbench_hits_10.parquet --debug`
- `cargo test -p datafusion-sqllogictest --test sqllogictests clickbench.slt`

Known pre-existing check issue left untouched per request:

- `cargo clippy --all-targets --all-features -- -D warnings` fails in `benchmarks/benches/sql.rs` because that bench still enables both snmalloc and mimalloc global allocators under all-features.

## Are there any user-facing changes?

No API changes. Query plans for unordered grouped aggregate LIMITs can now push the limit into partial aggregation, reducing memory use for queries such as ClickBench q17.
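
For readers unfamiliar with the idea, the following is a minimal, standalone sketch of local group-key top-k pruning during partial aggregation. It is not the code in this PR and does not use DataFusion's `GroupedHashAggregateStream`; the names `LimitedGroupCounts` and `merge_partials` are hypothetical, the aggregate is assumed to be a simple `COUNT(*)`, and keys are assumed to be strings compared in their natural order. Each partition keeps at most `limit` of the smallest keys it has seen, so its memory is bounded by the limit rather than by the number of distinct groups.

```rust
use std::collections::BTreeMap;

/// Hypothetical partial aggregator: COUNT(*) per group, retaining only the
/// `limit` smallest group keys seen so far.
struct LimitedGroupCounts {
    limit: usize,
    groups: BTreeMap<String, u64>,
}

impl LimitedGroupCounts {
    fn new(limit: usize) -> Self {
        Self { limit, groups: BTreeMap::new() }
    }

    /// Fold one input row (represented by its group key) into the state.
    fn update(&mut self, key: &str) {
        // Existing group: just update its accumulator.
        if let Some(count) = self.groups.get_mut(key) {
            *count += 1;
            return;
        }
        if self.limit == 0 {
            return;
        }
        if self.groups.len() == self.limit {
            // State is full: a new key is admitted only if it is smaller
            // than the largest retained key, which is then evicted.
            let largest = self.groups.keys().next_back().unwrap();
            if key >= largest.as_str() {
                return;
            }
            self.groups.pop_last();
        }
        self.groups.insert(key.to_string(), 1);
    }
}

/// Hypothetical final step: merge per-partition states and apply the
/// global limit over the smallest keys.
fn merge_partials(partials: Vec<LimitedGroupCounts>, limit: usize) -> Vec<(String, u64)> {
    let mut merged: BTreeMap<String, u64> = BTreeMap::new();
    for partial in partials {
        for (key, count) in partial.groups {
            *merged.entry(key).or_insert(0) += count;
        }
    }
    merged.into_iter().take(limit).collect()
}

fn main() {
    let mut p1 = LimitedGroupCounts::new(2);
    for key in ["d", "a", "c", "a", "b"] {
        p1.update(key);
    }
    let mut p2 = LimitedGroupCounts::new(2);
    for key in ["b", "e", "b", "a"] {
        p2.update(key);
    }
    // Prints [("a", 3), ("b", 3)]
    println!("{:?}", merge_partials(vec![p1, p2], 2));
}
```

The pruning is safe in this sketch because a key among the globally smallest `limit` keys always has fewer than `limit` smaller keys in any single partition, so it is never evicted or rejected locally and its partial count stays complete. An evicted key can never be re-admitted, since the largest retained key only decreases over time.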
