adriangb commented on issue #3463: URL: https://github.com/apache/datafusion/issues/3463#issuecomment-3708272050
> The new parquet pushdown sort of does this IIUC, but at the physical execution level - i.e. after the IO strategy is somewhat baked in

AFAIK the only thing along these lines we have today is the re-ordering of filters based on the sum of the column sizes for the columns each filter references. Filter pushdown is all or nothing; there is no per-filter selectivity analysis. And I don't see how we could do it at plan time: statistics are generally not enough to estimate filter selectivity (unless the statistics system gets an overhaul, which [might just happen](https://github.com/apache/datafusion/issues/19487)).

What I am proposing is to do this dynamically, at runtime. After the filters are evaluated for each `RecordBatch`, we re-order them and possibly toss the ones with poor selectivity back into the scan phase. This would require some API changes on the arrow-rs side, e.g. to track selectivity metrics, but I don't think arrow-rs has to coordinate the cross-scan metrics, etc.
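To make the runtime re-ordering idea concrete, here is a minimal sketch in Rust. All names (`FilterStats`, `reorder_filters`) are hypothetical and not part of DataFusion or arrow-rs; the point is only the mechanism: track per-filter pass rates as batches flow through, then sort so the most selective filter is evaluated first on subsequent batches.

```rust
// Hypothetical sketch: per-filter selectivity tracking and re-ordering.
// None of these types exist in DataFusion/arrow-rs; they illustrate the idea.

#[derive(Debug)]
struct FilterStats {
    id: usize,     // identifies the filter expression
    rows_in: u64,  // rows fed into this filter so far
    rows_out: u64, // rows that passed this filter so far
}

impl FilterStats {
    /// Observed fraction of rows that pass. Lower means more selective,
    /// so running it earlier prunes more rows for the filters that follow.
    fn selectivity(&self) -> f64 {
        if self.rows_in == 0 {
            1.0 // no data yet: assume worst case
        } else {
            self.rows_out as f64 / self.rows_in as f64
        }
    }
}

/// After each batch (or every N batches), sort filters so the most
/// selective one runs first. A poorly selective filter could instead be
/// handed back to the scan phase, which this sketch omits.
fn reorder_filters(stats: &mut [FilterStats]) {
    stats.sort_by(|a, b| {
        a.selectivity()
            .partial_cmp(&b.selectivity())
            .expect("selectivity is never NaN")
    });
}

fn main() {
    let mut stats = vec![
        FilterStats { id: 0, rows_in: 1000, rows_out: 900 }, // weak: keeps 90%
        FilterStats { id: 1, rows_in: 1000, rows_out: 50 },  // strong: keeps 5%
    ];
    reorder_filters(&mut stats);
    // The strong filter (id 1) should now be evaluated first.
    assert_eq!(stats[0].id, 1);
    println!("new filter order: {:?}", stats.iter().map(|s| s.id).collect::<Vec<_>>());
}
```

The counters would be updated where filters are actually evaluated, which is why the comment above notes that arrow-rs would need some API surface for reporting these metrics, while the cross-scan aggregation and re-ordering policy could live entirely in DataFusion.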
