adriangb opened a new pull request, #22438: URL: https://github.com/apache/datafusion/pull/22438
## Summary

Flip `datafusion.optimizer.enable_join_dynamic_filter_pushdown` from `true` to `false` by default.

When a hash join's build-side dynamic filter contains a `hash_lookup` term, evaluating it on every probe-side row inside the scan duplicates the work the join's own probe is about to do. On TPC-H Q17 this doubles end-to-end query time, and similar ~20–100% regressions show up across TPC-H Q3/Q5/Q8/Q9/Q14/Q18 and many TPC-DS queries that join a small dim table to a large fact table.

The config is still available per-query when the build-side filter is selective enough to make scan-level pruning worthwhile (e.g. a small dim table that prunes most of a fact table's row groups / pages).

## Benchmark numbers

From local SF1 / ClickBench-partitioned runs (12 vCPU), comparing `main` defaults vs `main` with this knob flipped to `false`:

| Suite | default (on) | with this PR (off) |
|---|---|---|
| TPC-H total | 841 ms | 817 ms |
| TPC-H Q17 | ~80 ms | ~74 ms |
| TPC-DS total | 11.0 s | 11.3 s |
| ClickBench total | 21.7 s | 22.0 s |

The total deltas are small in aggregate because the dynamic-filter pushdown both helps some queries (it does enable scan-level pruning) and hurts others (the doubled probe work). The default flip is about removing the regression tail; users who know their build side prunes well can re-enable per-query.

## Test plan

- [x] `cargo test --test sqllogictests`: all 472 files pass after slt snapshot updates.
- [ ] `run benchmarks`
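
To put the figures in the Benchmark numbers table in relative terms, here is a minimal Python sketch that computes the percent change per suite when going from the default (on) to this PR (off). The timings are copied from the table; the names `TIMINGS`, `percent_change`, and `summarize` are illustrative and not part of DataFusion or its benchmark tooling. The Q17 figures are approximate in the table, so its delta is approximate too.

```python
# Timings copied from the benchmark table: (default on, with this PR off).
# Units differ per suite (ms vs s); that is fine because only ratios are used.
TIMINGS = {
    "TPC-H total": (841.0, 817.0),
    "TPC-H Q17": (80.0, 74.0),        # approximate values in the table
    "TPC-DS total": (11.0, 11.3),
    "ClickBench total": (21.7, 22.0),
}


def percent_change(on: float, off: float) -> float:
    """Relative runtime change from 'on' to 'off', in percent (negative = faster)."""
    return (off - on) / on * 100.0


def summarize(timings: dict) -> dict:
    """Map each suite name to its percent change, rounded to one decimal place."""
    return {
        suite: round(percent_change(on, off), 1)
        for suite, (on, off) in timings.items()
    }


if __name__ == "__main__":
    for suite, delta in summarize(TIMINGS).items():
        print(f"{suite}: {delta:+.1f}%")
```

With these inputs the totals move by only a few percent in either direction (TPC-H slightly faster, TPC-DS and ClickBench slightly slower), which matches the statement that the aggregate deltas are small.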
