2010YOUY01 commented on PR #22343:
URL: https://github.com/apache/datafusion/pull/22343#issuecomment-4483940881

   One challenge is that "cheap" means different things depending on where the 
predicate is evaluated:
   
   - In `FilterExec`, for in-memory evaluation, I think this is a great default 
heuristic.
   - In Parquet decoding with late materialization, things can get trickier. 
For example, given `(c1 LIKE '%foo%bar%') AND (c2 > 0) AND (c3 > 0)`, if the 
`LIKE` predicate is very selective while the other predicates are not, and 
`c2` / `c3` are heavily compressed, we might want to decode and evaluate the 
`LIKE` conjunct first, so that `c2` and `c3` only need to be decoded for the 
few rows that survive it.
   
   Perhaps this kind of reordering could be implemented as a runtime 
optimization inside `FilterExec`: for the first batch, track each conjunct's 
evaluation time and selectivity, then decide the order dynamically. One nice 
benefit of this approach is that we don't have to hardcode whether an 
expression is "expensive" or "cheap".
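
As a rough illustration of the idea above (not DataFusion code), here is a minimal Python sketch. It profiles each conjunct on the first batch and then orders the conjuncts by measured cost per rejected row. All names (`ConjunctStats`, `profile_conjuncts`, `order_by_rank`) are hypothetical, rows are plain dicts evaluated one at a time rather than Arrow batches, and the ranking rule `cost / (1 - selectivity)` is an assumption borrowed from classic predicate-ordering heuristics for independent predicates, not something this PR specifies.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class ConjunctStats:
    """Measurements for one conjunct, taken on the first batch (hypothetical type)."""
    name: str
    seconds_per_row: float  # measured evaluation cost
    selectivity: float      # fraction of rows that pass, in [0, 1]


def profile_conjuncts(
    conjuncts: Sequence[Tuple[str, Predicate]], batch: Sequence[Row]
) -> List[ConjunctStats]:
    """Evaluate every conjunct on the whole batch, recording time and pass rate."""
    n = max(len(batch), 1)  # avoid division by zero on an empty batch
    stats = []
    for name, pred in conjuncts:
        start = time.perf_counter()
        passed = sum(1 for row in batch if pred(row))
        elapsed = time.perf_counter() - start
        stats.append(ConjunctStats(name, elapsed / n, passed / n))
    return stats


def order_by_rank(stats: Sequence[ConjunctStats]) -> List[str]:
    """Order conjuncts by cost per rejected row, cheapest first.

    A conjunct that rejects nothing gets an infinite rank and goes last.
    """
    def rank(s: ConjunctStats) -> float:
        rejected = 1.0 - s.selectivity
        if rejected <= 0.0:
            return float("inf")
        return s.seconds_per_row / rejected

    return [s.name for s in sorted(stats, key=rank)]


# Hand-written numbers mirroring the example in this comment: the LIKE
# conjunct is the slowest per row but rejects almost everything.
example = [
    ConjunctStats("c2 > 0", 1e-6, 0.95),
    ConjunctStats("c3 > 0", 1e-6, 1.0),
    ConjunctStats("c1 LIKE '%foo%bar%'", 5e-6, 0.01),
]
print(order_by_rank(example))
```

With these made-up measurements the `LIKE` conjunct is ranked first despite being the slowest per row, because it rejects far more rows per unit of time. Nothing in the sketch classifies an expression as "expensive" or "cheap" up front; the order comes only from what was measured.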


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

