Re: [I] [EPIC] Fix performance regressions when enabling parquet filter pushdown (late materialization) [datafusion]

via GitHub Mon, 09 Mar 2026 11:42:40 -0700


Dandandan commented on issue #20324:
URL: https://github.com/apache/datafusion/issues/20324#issuecomment-4025951528


   > > [@alamb](https://github.com/alamb) [@adriangb](https://github.com/adriangb)
   > > Curious to know your thoughts regarding
   > > [#20417](https://github.com/apache/datafusion/pull/20417)
   >
   > My feeling is it's a special case of the general idea in
   > [#20363](https://github.com/apache/datafusion/pull/20363) and would prefer a
   > system based on metrics and not heuristics (hopefully we can find a metric that
   > encodes this case without special casing it). My current feeling is to focus on
   > [#20481](https://github.com/apache/datafusion/pull/20481) which fixes
   > parallelism, data skew and gives us smaller units to adapt on for
   > [#20363](https://github.com/apache/datafusion/pull/20363) or similar. That said
   > I would understand the view of "let's just merge the fix/improvement" that
   > works and generalize later.
   
   IMO we should *also* have some rules for row filters, to avoid fixed
row filters that will **never** reduce IO (at least when there is no page
index; perhaps we should add this as an extra check on the PR, @darmie?).
   Also, when the remaining columns to read are really tiny, the IO / decode
savings are likely to be smaller than the extra overhead.
   
   Examples:
   `select col from t where col = 1` <- not useful; the filter can just as well
be applied later
   `select col, col2 from t where col = 1 AND col2 = 2` <- only one of them is
useful; the last one cannot reduce IO
   `select col, col2 from t where (col = 1 OR col2 = 2)` <- not useful, as it
requires reading both columns
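
The rule behind these examples can be sketched roughly as follows. This is only an illustration, not DataFusion code: `Conjunct` and `useful_row_filters` are hypothetical names, each predicate is reduced to the set of columns it references, and the second and third examples are read as filtering on both `col` and `col2`. The sketch assumes there is no page index and ignores the size of the remaining columns; it only asks whether any needed column is still unread once a filter has been evaluated.

```rust
use std::collections::BTreeSet;

/// One top-level AND-ed predicate, reduced to the columns it references.
/// Hypothetical type for illustration; not a DataFusion API.
pub struct Conjunct<'a> {
    pub label: &'a str,
    pub columns: &'a [&'a str],
}

/// Returns the labels of the conjuncts that could still save IO/decoding
/// when evaluated as row filters in the given order.
///
/// A filter only helps if, after decoding the columns it needs (plus those
/// decoded for earlier filters), some needed column is still unread, so that
/// the rows it rejects can be skipped in that column.
pub fn useful_row_filters<'a>(
    projection: &[&'a str],
    conjuncts: &[Conjunct<'a>],
) -> Vec<&'a str> {
    // Every column the scan has to touch: projected columns plus filter columns.
    let mut needed: BTreeSet<&str> = projection.iter().copied().collect();
    for c in conjuncts {
        needed.extend(c.columns.iter().copied());
    }

    let mut decoded: BTreeSet<&str> = BTreeSet::new();
    let mut useful = Vec::new();
    for c in conjuncts {
        // Evaluating this filter forces its columns to be decoded in full.
        decoded.extend(c.columns.iter().copied());
        // Useful only if at least one needed column remains to be read.
        if needed.iter().any(|col| !decoded.contains(col)) {
            useful.push(c.label);
        }
    }
    useful
}
```

Under these assumptions, the first example yields no useful row filter, the `AND` example keeps only `col = 1` (the last filter has nothing left to skip), and the `OR` example yields none because it is a single predicate that already needs both columns.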
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

