Dandandan opened a new pull request, #22603: URL: https://github.com/apache/datafusion/pull/22603
## Which issue does this PR close?

N/A. Follow-up performance cleanup found while profiling ClickBench TopK queries.

## Rationale for this change

`TopK::insert_batch` evaluated sort expressions before checking its dynamic filter. When the dynamic filter rejected the whole batch, this still allocated sort-key arrays and row-conversion scratch that would never be used. Scalar dynamic filters were also expanded into full boolean arrays before the all-true/all-false check.

## What changes are included in this PR?

- Evaluate the TopK dynamic filter before building sort keys.
- Fast-path scalar boolean dynamic filters so scalar true does not allocate a boolean array, and scalar false/null returns before sort-key evaluation.
- Keep existing array-filter behavior for partial matches, including null-mask preparation and filter predicate reuse.
- Add a unit test for the scalar-false dynamic-filter path.

## Are these changes tested?

- `cargo fmt --all`
- `cargo test -p datafusion-physical-plan topk -- --nocapture`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo run --profile release-nonlto --bin dfbench -- clickbench -q 21 -i 1 -n 8`
- `cargo run --profile release-nonlto --bin dfbench -- clickbench -q 22 -i 1 -n 8`
- `cargo run --profile release-nonlto --bin dfbench -- clickbench -q 23 -i 1 -n 8`
- `cargo run --profile release-nonlto --bin dfbench -- clickbench -q 33 -i 1 -n 8`
- `cargo run --profile release-nonlto --bin dfbench -- clickbench -q 34 -i 1 -n 8`
- `cargo run --profile release-nonlto --bin mem_profile -- --bench-profile release-nonlto clickbench --query 23 -i 1 -n 8`

Local non-LTO smoke results:

- q21: 1179.87 ms
- q22: 2206.03 ms
- q23: 5718.53 ms on repeat
- q33: 2413.82 ms on repeat
- q34: 2259.08 ms

q23 `mem_profile` reported 5753.62 ms, peak RSS 2.2 GB, peak commit 2.3 GB.

## Are there any user-facing changes?

No. This is an execution-time allocation/performance cleanup.
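For readers who want the ordering change in concrete terms, here is a minimal sketch of the filter-first control flow described above. It is not the DataFusion implementation: `FilterValue`, `Selection`, `select_rows`, the simplified `insert_batch` signature, and the `sort_key_evals` counter are all hypothetical stand-ins that use only the Rust standard library, and a `Vec<i64>` plays the role of a record batch whose sort key is the value itself.

```rust
/// Hypothetical stand-in for the result of evaluating a boolean filter
/// expression on a batch (not an Arrow or DataFusion type).
#[derive(Debug, Clone, PartialEq)]
pub enum FilterValue {
    /// One value that applies to every row; `None` models SQL NULL.
    Scalar(Option<bool>),
    /// One value per row.
    Array(Vec<Option<bool>>),
}

/// Which rows of the batch survive the filter.
#[derive(Debug, Clone, PartialEq)]
pub enum Selection {
    Skip,
    All,
    Rows(Vec<usize>),
}

/// Decide the selection without expanding a scalar into a per-row array.
pub fn select_rows(filter: &FilterValue, num_rows: usize) -> Selection {
    match filter {
        // Scalar true keeps every row and allocates nothing.
        FilterValue::Scalar(Some(true)) => Selection::All,
        // Scalar false or NULL rejects the whole batch.
        FilterValue::Scalar(_) => Selection::Skip,
        FilterValue::Array(values) => {
            debug_assert_eq!(values.len(), num_rows);
            // NULL entries are treated as "not selected".
            let rows: Vec<usize> = values
                .iter()
                .enumerate()
                .filter(|(_, v)| **v == Some(true))
                .map(|(i, _)| i)
                .collect();
            if rows.is_empty() {
                Selection::Skip
            } else if rows.len() == num_rows {
                Selection::All
            } else {
                Selection::Rows(rows)
            }
        }
    }
}

/// Simplified, hypothetical `insert_batch`: the filter is checked first and
/// sort keys are built only if at least one row survives.
/// `sort_key_evals` counts how often the (stand-in) sort-key step runs.
pub fn insert_batch(
    batch: &[i64],
    filter: &FilterValue,
    sort_key_evals: &mut usize,
) -> Vec<i64> {
    let selection = select_rows(filter, batch.len());
    if selection == Selection::Skip {
        // Early return: no sort-key arrays or scratch buffers are allocated.
        return Vec::new();
    }

    // Stand-in for evaluating sort expressions on the batch.
    *sort_key_evals += 1;
    let keys: Vec<i64> = batch.to_vec();

    match selection {
        Selection::All => keys,
        Selection::Rows(rows) => rows.into_iter().map(|i| keys[i]).collect(),
        Selection::Skip => unreachable!("handled by the early return above"),
    }
}
```

In this sketch, calling `insert_batch` with `FilterValue::Scalar(Some(false))` or `FilterValue::Scalar(None)` returns an empty result and leaves `sort_key_evals` at zero, which mirrors the scalar-false path the new unit test is said to cover. A partial `FilterValue::Array` still goes through the sort-key step once and keeps only the selected rows.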
