adriangb commented on issue #11262: URL: https://github.com/apache/datafusion/issues/11262#issuecomment-4481598087
@neilconway I wonder what you were thinking of tackling?

IMO the version of this that #22144 is trying to tackle necessarily needs to live outside the expression itself. It is much more tightly coupled to the parquet scan, because there we aren't just deciding the order in which filters are evaluated but also where the IO happens (eager vs. late materialization).

It could still make sense to have `BinaryExpr` do reordering itself. For example, we are not going to split / track `id = 1 OR message ilike '%foo%'` as separate filters during scans. And there are still `FilterExec`s, join conditions, and other places where binary expressions are used.

I was also wondering if a "flattened" binary expression that can do a better job of re-using buffers, etc. would make sense. Something like that seems necessary to do any sort of reordering sanely (otherwise you have to rebuild the expression tree, which would be hard once the query is executing).

Another issue that comes up is the "what is an expression anyway" issue. Various places remap children or rewrite the expression in various ways. It's not always clear when the result is still the same expression (and should share selectivity tracking) or not.
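
To make the "flattened" idea a bit more concrete, here is a rough sketch of a flat `AND` over scalar predicates that reuses one mask buffer across batches and can be reordered in place by observed selectivity. All names here (`TrackedPredicate`, `FlatAnd`, `reorder`) are hypothetical and exist only for illustration; none of this is DataFusion's API, and a real version would operate on Arrow arrays rather than `&[i64]`.

```rust
/// One conjunct plus the counters needed to estimate its selectivity.
/// Hypothetical type for illustration, not part of DataFusion.
pub struct TrackedPredicate {
    pub name: &'static str,
    pred: Box<dyn Fn(i64) -> bool>,
    rows_seen: u64,
    rows_passed: u64,
}

impl TrackedPredicate {
    pub fn new(name: &'static str, pred: impl Fn(i64) -> bool + 'static) -> Self {
        Self { name, pred: Box::new(pred), rows_seen: 0, rows_passed: 0 }
    }

    /// Fraction of evaluated rows that passed; 1.0 until any row has been seen.
    pub fn selectivity(&self) -> f64 {
        if self.rows_seen == 0 {
            1.0
        } else {
            self.rows_passed as f64 / self.rows_seen as f64
        }
    }
}

/// `a AND b AND c` stored as a flat list instead of a nested binary tree,
/// so the evaluation order can change without rebuilding any tree.
pub struct FlatAnd {
    preds: Vec<TrackedPredicate>,
    mask: Vec<bool>, // reused across batches to avoid reallocating
}

impl FlatAnd {
    pub fn new(preds: Vec<TrackedPredicate>) -> Self {
        Self { preds, mask: Vec::new() }
    }

    /// Evaluate all conjuncts in their current order. Rows already rejected
    /// are skipped, so each counter measures selectivity conditional on the
    /// predicates that ran before it.
    pub fn evaluate(&mut self, batch: &[i64]) -> &[bool] {
        self.mask.clear();
        self.mask.resize(batch.len(), true);
        for p in &mut self.preds {
            for (keep, &v) in self.mask.iter_mut().zip(batch) {
                if *keep {
                    p.rows_seen += 1;
                    if (p.pred)(v) {
                        p.rows_passed += 1;
                    } else {
                        *keep = false;
                    }
                }
            }
        }
        &self.mask
    }

    /// Move the most selective predicates (lowest pass rate) to the front.
    /// This is a plain in-place sort because the expression is flat.
    pub fn reorder(&mut self) {
        self.preds
            .sort_by(|a, b| a.selectivity().total_cmp(&b.selectivity()));
    }

    /// Current evaluation order, by predicate name.
    pub fn order(&self) -> Vec<&'static str> {
        self.preds.iter().map(|p| p.name).collect()
    }
}
```

The point of the sketch is only that reordering becomes a `sort_by` on a `Vec` between batches, and the mask buffer is allocated once. It deliberately ignores predicate cost, and because the counters are conditional on the current order, a real implementation would need some way to re-sample.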
