alamb opened a new issue, #19028: URL: https://github.com/apache/datafusion/issues/19028
### Is your feature request related to a problem or challenge?

- Related to / follow-on to https://github.com/apache/datafusion/issues/18860

In the context of https://github.com/apache/datafusion/pull/18868, @xudong963 is adding the ability to tell when a predicate is always `true` for a particular Parquet row group (that is, it filters no rows).

@crepererum noted there is another interesting potential optimization: when we know the predicate is `true` for the entire row group, we can skip evaluating the predicate for that row group entirely.

This can improve performance because the filter itself can often be quite expensive, and there is no trade-off: whenever the optimization can be applied, it is a win.

@adriangb reported that they have implemented this optimization at their company and seen substantial improvements.

### Describe the solution you'd like

If a predicate is determined to not filter any rows for a particular row group, don't apply it to that row group.

### Describe alternatives you've considered

This is likely only relevant when Parquet pushdown is enabled.

I think the API to evaluate Parquet predicates would need some updating as well, since there is currently no way to evaluate predicates on only some row groups but not others.

### Additional context

This came up in a discussion with @xudong963 and @adriangb here: https://github.com/apache/datafusion/pull/18868#issuecomment-3597988936
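
As a rough illustration of the idea (not DataFusion's or the `parquet` crate's actual API), the per-row-group decision could be sketched as below. Every name here (`RowGroupStats`, `StatsVerdict`, `RowGroupPlan`, `classify_gt`, `plan_row_groups`) is hypothetical, and the sketch assumes a single `i64` column with min/max/null-count statistics and a predicate of the form `col > threshold`.

```rust
/// Hypothetical min/max/null-count statistics for one i64 column in one row group.
#[derive(Debug, Clone, Copy)]
pub struct RowGroupStats {
    pub min: i64,
    pub max: i64,
    pub null_count: u64,
}

/// What statistics alone can prove about the predicate for a row group.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatsVerdict {
    /// No row can pass: the row group can be pruned.
    AlwaysFalse,
    /// Every row passes: the predicate filters nothing.
    AlwaysTrue,
    /// Statistics are inconclusive.
    Unknown,
}

/// How the scan should treat a row group.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowGroupPlan {
    /// Do not read the row group at all.
    Skip,
    /// Read the row group but do not evaluate the predicate (the proposed optimization).
    ScanWithoutFilter,
    /// Read the row group and evaluate the predicate on each row.
    ScanWithFilter,
}

/// Classify the assumed predicate `col > threshold` using only statistics.
pub fn classify_gt(stats: &RowGroupStats, threshold: i64) -> StatsVerdict {
    if stats.max <= threshold {
        StatsVerdict::AlwaysFalse
    } else if stats.min > threshold && stats.null_count == 0 {
        // NULL rows would evaluate to NULL and be filtered out, so "always
        // true" additionally requires that the column has no NULLs.
        StatsVerdict::AlwaysTrue
    } else {
        StatsVerdict::Unknown
    }
}

/// Decide, for each row group, whether the predicate needs to run at all.
pub fn plan_row_groups(stats: &[RowGroupStats], threshold: i64) -> Vec<RowGroupPlan> {
    stats
        .iter()
        .map(|s| match classify_gt(s, threshold) {
            StatsVerdict::AlwaysFalse => RowGroupPlan::Skip,
            StatsVerdict::AlwaysTrue => RowGroupPlan::ScanWithoutFilter,
            StatsVerdict::Unknown => RowGroupPlan::ScanWithFilter,
        })
        .collect()
}
```

The point of the three-way plan is that the filter must be attachable per row group rather than per file, which is the API gap noted under alternatives: a reader that only accepts one filter for the whole scan cannot express `ScanWithoutFilter` for some row groups and `ScanWithFilter` for others.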
