alamb opened a new issue, #22883:
URL: https://github.com/apache/datafusion/issues/22883

   ## Summary
   
   Decide *where* each filter conjunct runs (pushed into the Parquet decoder as 
a row filter, applied post-scan in memory, or dropped) by measuring its 
selectivity and throughput at runtime, instead of fixing the placement 
statically at plan time.
   
   ## Context
   
   Today, with `pushdown_filters = true`, DataFusion pushes all predicates into 
the Parquet reader as row filters and orders them with the static 
`reorder_filters` heuristic. That heuristic can't react to data skew, and the 
scan pays an extra IO round-trip per filter column even when a filter prunes 
almost nothing. Because row filters change the IO pattern (read filter columns 
→ mask → read projection), the right placement depends on each filter's 
*measured* selectivity, which is only known at runtime. This issue tracks an 
adaptive cost model that records per-filter effectiveness across 
files/row-groups and continuously promotes, demotes, or drops conjuncts.
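
To make the promote/demote/drop idea concrete, here is a minimal sketch of a per-conjunct decision rule. It is not the `SelectivityTracker` API from #22236: `Placement`, `FilterStats`, `choose_placement`, and both thresholds are hypothetical names and values chosen for illustration. It also assumes that only filters marked optional may be dropped, that a filter with no measurements starts post-scan, and it uses selectivity alone (the proposal also measures throughput).

```rust
/// Where a filter conjunct is evaluated (hypothetical names, illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Placement {
    /// Pushed into the Parquet decoder as a row filter.
    RowFilter,
    /// Applied in memory after the scan.
    PostScan,
    /// Skipped entirely; only allowed for filters marked optional.
    Dropped,
}

/// Running row counts for one conjunct, accumulated across files/row-groups.
#[derive(Debug, Default, Clone)]
pub struct FilterStats {
    rows_in: u64,
    rows_passed: u64,
}

impl FilterStats {
    /// Record one batch: rows evaluated and rows that passed the filter.
    pub fn record(&mut self, rows_in: u64, rows_passed: u64) {
        self.rows_in += rows_in;
        self.rows_passed += rows_passed;
    }

    /// Fraction of rows that pass the filter; `None` until rows are observed.
    pub fn pass_rate(&self) -> Option<f64> {
        if self.rows_in == 0 {
            None
        } else {
            Some(self.rows_passed as f64 / self.rows_in as f64)
        }
    }
}

/// Pick a placement from the measured pass rate.
/// Both thresholds are assumptions, not values from the proposal.
pub fn choose_placement(stats: &FilterStats, optional: bool) -> Placement {
    const PUSH_BELOW: f64 = 0.2; // selective enough to justify the extra IO
    const DROP_ABOVE: f64 = 0.99; // prunes almost nothing

    match stats.pass_rate() {
        // No measurements yet: assume the cheaper IO pattern.
        None => Placement::PostScan,
        Some(p) if p <= PUSH_BELOW => Placement::RowFilter,
        Some(p) if p >= DROP_ABOVE && optional => Placement::Dropped,
        Some(_) => Placement::PostScan,
    }
}
```

In this sketch a scan would call `record` after each batch and re-run `choose_placement` before opening the next file or row-group, so a conjunct can move between placements as the data changes.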
   
   ## Related PRs
   
   - #22144 — [Experiment] Adaptive filter pushdown (umbrella experiment)
   - #22234 — `OptionalFilterPhysicalExpr` wrapper + proto (mark a filter as 
droppable)
   - #22235 — Per-conjunct pruning statistics for `PruningPredicate`
   - #22236 — `SelectivityTracker` adaptive filter cost model
   - #22237 — Adaptive filter pushdown for the parquet scan (integration)
   - #21752 — Adaptive filter scheduling for Parquet scans (prior full PR)
   - #20363 — (Test) Advanced adaptive filter selectivity evaluation
   - #19639 — feat: adaptive filter selectivity tracking for Parquet row 
filters (closed prototype)
   - apache/arrow-rs#9659 — complementary compute-only filter evaluation 
optimizations
   
   ## Related issues
   
   - #15512 — [Epic] Dynamic filtering related items
   - #21207 — [DISCUSSION] Future of Dynamic Filters Sync
   - #3463 — Enable parquet `filter_pushdown` by default
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

