alamb opened a new issue, #22883: URL: https://github.com/apache/datafusion/issues/22883
## Summary

Decide *where* each filter conjunct runs (pushed into the Parquet decoder as a row filter, applied post-scan in memory, or dropped) by measuring its selectivity and throughput at runtime, instead of fixing the placement statically at plan time.

## Context

Today, with `pushdown_filters = true`, DataFusion pushes every predicate into the Parquet reader as a row filter and orders the filters with the static `reorder_filters` heuristic. That heuristic can't react to data skew, and each pushed-down filter costs an extra IO round-trip per filter column even when it prunes almost nothing.

Because row filters change the IO pattern (read filter columns → mask → read projection), the right placement depends on each filter's *measured* selectivity, which is only known at runtime. This issue tracks an adaptive cost model that measures per-filter effectiveness across files and row groups and continuously promotes, demotes, or drops conjuncts based on those measurements.

## Related PRs

- #22144 — [Experiment] Adaptive filter pushdown (umbrella experiment)
- #22234 — `OptionalFilterPhysicalExpr` wrapper + proto (mark a filter as droppable)
- #22235 — Per-conjunct pruning statistics for `PruningPredicate`
- #22236 — `SelectivityTracker` adaptive filter cost model
- #22237 — Adaptive filter pushdown for the parquet scan (integration)
- #21752 — Adaptive filter scheduling for Parquet scans (prior full PR)
- #20363 — (Test) Advanced adaptive filter selectivity evaluation
- #19639 — feat: adaptive filter selectivity tracking for Parquet row filters (closed prototype)
- apache/arrow-rs#9659 — complementary compute-only filter evaluation optimizations

## Related issues

- #15512 — [Epic] Dynamic filtering related items
- #21207 — [DISCUSSION] Future of Dynamic Filters Sync
- #3463 — Enable parquet `filter_pushdown` by default
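A minimal Rust sketch of the per-conjunct bookkeeping the Summary and Context describe may help make the idea concrete. Every name here (`Placement`, `ConjunctStats`, `Thresholds`, `decide`, `order_row_filters`) and every threshold value is hypothetical. None of it is the `SelectivityTracker` API from #22236 or any other DataFusion code. The sketch assumes the scan reports rows in, rows out, and evaluation time for each conjunct per batch. It uses the measured pruned fraction to choose a placement and uses rows pruned per nanosecond to order row filters. The `optional` flag stands in for a conjunct that is allowed to be dropped, similar in spirit to the droppable filters in #22234.

```rust
/// Where a filter conjunct is evaluated (illustrative names, not DataFusion's API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Placement {
    /// Pushed into the Parquet decoder as a row filter.
    RowFilter,
    /// Applied in memory after the scan has decoded the projection.
    PostScan,
    /// Skipped entirely; only allowed for filters marked optional.
    Dropped,
}

/// Running counters for one conjunct, accumulated across files and row groups.
#[derive(Debug, Default, Clone, Copy)]
struct ConjunctStats {
    rows_in: u64,
    rows_out: u64,
    eval_nanos: u64,
}

impl ConjunctStats {
    /// Record one evaluation of the conjunct over a batch of rows.
    fn record(&mut self, rows_in: u64, rows_out: u64, eval_nanos: u64) {
        self.rows_in += rows_in;
        self.rows_out += rows_out;
        self.eval_nanos += eval_nanos;
    }

    /// Measured fraction of input rows removed (0.0 means it prunes nothing).
    fn pruned_fraction(&self) -> f64 {
        if self.rows_in == 0 {
            return 0.0;
        }
        1.0 - self.rows_out as f64 / self.rows_in as f64
    }

    /// Measured throughput: rows removed per nanosecond of evaluation time.
    fn pruned_per_nano(&self) -> f64 {
        let pruned = self.rows_in.saturating_sub(self.rows_out) as f64;
        pruned / self.eval_nanos.max(1) as f64
    }
}

/// Hypothetical tuning knobs; real values would have to come from benchmarks.
struct Thresholds {
    /// Rows to observe before trusting the statistics.
    min_rows: u64,
    /// Minimum pruned fraction that justifies the extra IO of a row filter.
    promote_fraction: f64,
    /// Optional filters pruning at most this fraction are dropped.
    drop_fraction: f64,
}

/// Re-evaluated as statistics accumulate, so a conjunct can be promoted to a
/// row filter, demoted back to post-scan, or dropped as the data changes.
fn decide(stats: &ConjunctStats, optional: bool, t: &Thresholds) -> Placement {
    if stats.rows_in < t.min_rows {
        // Too little evidence yet: keep the post-scan placement.
        return Placement::PostScan;
    }
    let pruned = stats.pruned_fraction();
    if pruned >= t.promote_fraction {
        Placement::RowFilter
    } else if optional && pruned <= t.drop_fraction {
        Placement::Dropped
    } else {
        Placement::PostScan
    }
}

/// Order row filters by measured throughput (most rows pruned per ns first)
/// instead of by a static heuristic.
fn order_row_filters<'a>(stats: &[(&'a str, ConjunctStats)]) -> Vec<&'a str> {
    let mut ranked: Vec<(&'a str, f64)> = stats
        .iter()
        .map(|(name, s)| (*name, s.pruned_per_nano()))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending
    ranked.into_iter().map(|(name, _)| name).collect()
}

fn main() {
    let t = Thresholds { min_rows: 1_000, promote_fraction: 0.5, drop_fraction: 0.01 };

    let mut selective = ConjunctStats::default();
    selective.record(10_000, 500, 2_000_000); // removes 95% of rows
    let mut weak = ConjunctStats::default();
    weak.record(10_000, 9_990, 1_000_000); // removes 0.1% of rows

    println!("selective: {:?}", decide(&selective, false, &t));
    println!("weak (required): {:?}", decide(&weak, false, &t));
    println!("weak (optional): {:?}", decide(&weak, true, &t));
    println!("row filter order: {:?}", order_row_filters(&[("weak", weak), ("selective", selective)]));
}
```

Because `decide` is recomputed as the counters grow, a conjunct that looked selective in early row groups can be demoted later if skewed data makes it less effective. A real cost model would also need to weigh the extra IO round-trip directly, for example by comparing projection bytes saved against filter-column bytes read. This sketch does not model that.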
