adriangb opened a new pull request, #22438:
URL: https://github.com/apache/datafusion/pull/22438

   ## Summary
   
   Change the default of 
`datafusion.optimizer.enable_join_dynamic_filter_pushdown` from `true` to 
`false`.
   
   When a hash join's build-side dynamic filter contains a `hash_lookup` term, 
evaluating it on every probe-side row inside the scan duplicates the work the 
join's own probe is about to do. On TPC-H Q17 this doubles end-to-end query 
time, and similar ~20–100% regressions show up across TPC-H Q3/Q5/Q8/Q9/Q14/Q18 
and many TPC-DS queries that join a small dim table to a large fact table.
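
To make the duplicated work concrete, here is a toy model that counts hash-table lookups for a single-key inner join, with and without a scan-level membership filter. It is only a sketch: `count_lookups` and `scan_filter` are hypothetical names, a plain `HashSet` stands in for the join hash table, and none of this is DataFusion code. It also ignores the row-group / page pruning that the pushdown can enable.

```rust
use std::collections::HashSet;

/// Toy model: counts hash lookups for an inner join on one integer key.
/// Returns (total lookups, matching probe rows).
/// `scan_filter` stands in for a `hash_lookup`-style filter pushed into the scan.
fn count_lookups(
    build_keys: &HashSet<u64>,
    probe_keys: &[u64],
    scan_filter: bool,
) -> (usize, usize) {
    let mut lookups = 0;
    let mut matches = 0;
    for key in probe_keys {
        if scan_filter {
            // Scan-level filter: one lookup for every probe-side row.
            lookups += 1;
            if !build_keys.contains(key) {
                continue; // row is dropped before it reaches the join
            }
        }
        // The join's own probe: one more lookup for every row that reaches it.
        lookups += 1;
        if build_keys.contains(key) {
            matches += 1;
        }
    }
    (lookups, matches)
}

fn main() {
    // Small "dim" build side; every probe row has a matching key,
    // so the scan-level filter removes nothing.
    let build_keys: HashSet<u64> = (0..10).collect();
    let probe_keys: Vec<u64> = (0..1000).map(|i| i % 10).collect();

    let (off, matches_off) = count_lookups(&build_keys, &probe_keys, false);
    let (on, matches_on) = count_lookups(&build_keys, &probe_keys, true);
    println!("filter off: {} lookups, {} matches", off, matches_off);
    println!("filter on:  {} lookups, {} matches", on, matches_on);
}
```

In this model, a filter that rejects no rows doubles the number of lookups while producing the same join output. A filter that rejects most rows adds far fewer second lookups.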
   
   The option can still be enabled per query when the build-side filter is 
selective enough to make scan-level pruning worthwhile (e.g. a small dim table 
that prunes most of a fact table's row groups / pages).
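
As a sketch of what re-enabling the option for one query might look like, assuming it can be toggled with a SQL `SET` statement like other `datafusion.*` options (the setting applies to the session, so it is switched back afterwards):

```sql
-- Assumption: the option is settable via SQL `SET` in the current session.
SET datafusion.optimizer.enable_join_dynamic_filter_pushdown = true;

-- ... run the query whose build side is known to prune well ...

SET datafusion.optimizer.enable_join_dynamic_filter_pushdown = false;
```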
   
   ## Benchmark numbers
   
   From local SF1 / ClickBench-partitioned runs (12 vCPU), comparing `main` 
defaults vs `main` with this knob flipped to `false`:
   
   | Suite | default (on) | with this PR (off) |
   |---|---|---|
   | TPC-H total | 841 ms | 817 ms |
   | TPC-H Q17 | ~80 ms | ~74 ms |
   | TPC-DS total | 11.0 s | 11.3 s |
   | ClickBench total | 21.7 s | 22.0 s |
   
   The aggregate deltas are small because dynamic-filter pushdown helps some 
queries (it does enable scan-level pruning) and hurts others (the doubled 
probe work). The default flip is about removing the regression tail; users 
who know their build side prunes well can re-enable the option per query.
   
   ## Test plan
   
   - [x] `cargo test --test sqllogictests` — all 472 files pass after slt 
snapshot updates.
   - [ ] `run benchmarks`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

