theirix commented on code in PR #13475:
URL: https://github.com/apache/datafusion/pull/13475#discussion_r1849012118
##########
datafusion/optimizer/src/push_down_filter.rs:
##########
@@ -998,7 +998,19 @@ impl OptimizerRule for PushDownFilter {
filter_predicates.len());
}
- let zip = filter_predicates.into_iter().zip(results);
+ let zip =
+ filter_predicates
+ .into_iter()
+ .zip(results)
+ .map(|(expr, res)| {
+ let filter_pushdown_type = if expr.is_volatile() {
Review Comment:
As I understand, we can ask a table source to tell if the filter is
supported even for volatile expressions.
I thought the main difference is which call is more expensive.
`supports_filters_pushdown` for different sources often provides blanket
all-supported or all-unsupported for all predicates. So we can save
`is_volatile` calls by filtering out `Unsupported` values before. In contrast,
if we check for volatility first and only then ask the source if the filters
are supported, we can waste `is_volatilty` calls. If the
`supports_filters_pushdown` is more costly, let's reverse it, of course.
I pushed a change to check the volatility after filtering out unsupported
predicates. If we prefer the inverse order, I could refactor this optimiser
quite a bit to:
- Get a subset of supported predicates
- Apply the `is_volatile` filter to it
- Build `new_scan_filters` and `new_predicate`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]