2010YOUY01 commented on PR #22343: URL: https://github.com/apache/datafusion/pull/22343#issuecomment-4483940881
One challenge is that "cheap" means different things depending on where the predicate is evaluated:

- In `FilterExec`, for in-memory evaluation, I think this is a great default heuristic.
- In Parquet decoding with late materialization, things can get trickier. For example, given `(c1 LIKE '%foo%bar%') AND (c2 > 0) AND (c3 > 0)`, if the regex is very selective while the other predicates are not selective and `c2` / `c3` are heavily compressed, we might want to decode and evaluate the regex conjunct first.

Perhaps this kind of reordering could be implemented as a runtime optimization inside `FilterExec`: for the first batch, track each conjunct's evaluation time and selectivity, then decide the order dynamically. One nice benefit of this approach is that we don't have to hardcode whether an expression is "expensive" or "cheap".
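To make the runtime idea concrete, here is a rough, standalone sketch of "measure on the first batch, then pick an order". It is not DataFusion code: `Conjunct`, `ConjunctStats`, `measure`, `rank_conjuncts`, and `filter_rows` are hypothetical names, predicates are plain closures over rows rather than `PhysicalExpr`s over Arrow batches, and the ranking rule (ascending cost per row divided by the fraction of rows rejected) is one common heuristic that assumes the conjuncts are roughly independent.

```rust
use std::time::Instant;

/// A conjunct modeled as a boxed row predicate (hypothetical stand-in
/// for a physical expression evaluated over a batch).
pub type Conjunct<R> = Box<dyn Fn(&R) -> bool>;

/// Statistics observed for one conjunct on a sample batch.
#[derive(Debug, Clone, PartialEq)]
pub struct ConjunctStats {
    /// Position of the conjunct in the original predicate.
    pub index: usize,
    /// Observed evaluation cost per row (any consistent unit).
    pub cost_per_row: f64,
    /// Fraction of rows that passed the conjunct, in [0, 1].
    pub selectivity: f64,
}

/// Evaluate every conjunct on every row of the sample batch and record
/// its wall-clock cost (seconds per row) and selectivity.
pub fn measure<R>(rows: &[R], conjuncts: &[Conjunct<R>]) -> Vec<ConjunctStats> {
    let n = rows.len().max(1) as f64; // avoid dividing by zero on an empty batch
    conjuncts
        .iter()
        .enumerate()
        .map(|(index, pred)| {
            let start = Instant::now();
            let mut passed = 0usize;
            for row in rows {
                if pred(row) {
                    passed += 1;
                }
            }
            ConjunctStats {
                index,
                cost_per_row: start.elapsed().as_secs_f64() / n,
                selectivity: passed as f64 / n,
            }
        })
        .collect()
}

/// Cost paid per rejected row; lower is better. A conjunct that rejects
/// nothing gets an infinite score and is evaluated last.
fn score(s: &ConjunctStats) -> f64 {
    let rejected = 1.0 - s.selectivity;
    if rejected <= 0.0 {
        f64::INFINITY
    } else {
        s.cost_per_row / rejected
    }
}

/// Return conjunct indices in the order they should be evaluated.
/// The sort is stable, so ties keep their original relative order.
pub fn rank_conjuncts(stats: &[ConjunctStats]) -> Vec<usize> {
    let mut sorted: Vec<&ConjunctStats> = stats.iter().collect();
    sorted.sort_by(|a, b| score(a).total_cmp(&score(b)));
    sorted.iter().map(|s| s.index).collect()
}

/// Apply the conjuncts in the chosen order, short-circuiting on the
/// first one that rejects a row.
pub fn filter_rows<'a, R>(
    rows: &'a [R],
    conjuncts: &[Conjunct<R>],
    order: &[usize],
) -> Vec<&'a R> {
    rows.iter()
        .filter(|row| order.iter().all(|&i| (conjuncts[i])(*row)))
        .collect()
}
```

With stats resembling the example above (an expensive conjunct that passes 1% of rows next to cheap ones that pass 95% and 50%), the expensive conjunct can still rank ahead of the cheap but unselective one, without anything being labeled "expensive" or "cheap" up front. A real implementation would also need to account for per-column decode cost in the Parquet case, which this sketch does not model.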
