adriangb commented on PR #22343: URL: https://github.com/apache/datafusion/pull/22343#issuecomment-4484148436
> Perhaps this kind of reordering could be implemented as a runtime optimization inside `FilterExec`: for the first batch, track each conjunct's evaluation time and selectivity, then decide the order dynamically. One nice benefit of this approach is that we don't have to hardcode whether an expression is "expensive" or "cheap".

That is exactly what https://github.com/apache/datafusion/pull/22144 does 😃. I think we could re-use pretty much the exact same machinery. It took a lot of iterations to arrive at the right metrics: you want to take into account time spent on compute, not just selectivity, etc.

Someone please correct me if I'm wrong, but IIRC, because of the tree structure, we currently compute one side of a binary expression, apply the slice to the array, then compute the next side, and so on. I wonder if an approach like https://github.com/apache/arrow-rs/pull/9659 might be helpful to mitigate overheads from non-selective masks?
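To make the "compute time, not just selectivity" point concrete, here is a minimal sketch of one way to rank conjuncts from first-batch measurements. It is not taken from either PR: `ConjunctStats`, `measure`, and `order_conjuncts` are hypothetical names, the predicates are plain closures over `i64` rather than DataFusion physical expressions, and the ranking formula (fraction of rows dropped per unit of cost) is the textbook heuristic for independent predicates, assumed here only for illustration.

```rust
use std::time::Instant;

/// Hypothetical per-conjunct statistics gathered on a sample batch.
#[derive(Debug, Clone, Copy)]
struct ConjunctStats {
    /// Position of the conjunct in the original predicate.
    index: usize,
    /// Measured evaluation cost, in seconds per input row.
    cost_per_row: f64,
    /// Fraction of sampled rows that pass the conjunct (0.0..=1.0).
    selectivity: f64,
}

impl ConjunctStats {
    /// Rows dropped per unit of cost; higher means "run this earlier".
    /// The small floor avoids dividing by zero when the timer reads 0.
    fn rank(&self) -> f64 {
        (1.0 - self.selectivity) / self.cost_per_row.max(1e-12)
    }
}

/// Evaluate one conjunct over a sample and record its cost and selectivity.
fn measure<F: Fn(i64) -> bool>(index: usize, pred: F, sample: &[i64]) -> ConjunctStats {
    let rows = sample.len().max(1) as f64;
    let start = Instant::now();
    let passed = sample.iter().filter(|&&v| pred(v)).count();
    let elapsed = start.elapsed().as_secs_f64();
    ConjunctStats {
        index,
        cost_per_row: elapsed / rows,
        selectivity: passed as f64 / rows,
    }
}

/// Return conjunct indices in the order they should be evaluated:
/// highest rank (most rows dropped per unit of cost) first.
fn order_conjuncts(mut stats: Vec<ConjunctStats>) -> Vec<usize> {
    stats.sort_by(|a, b| b.rank().total_cmp(&a.rank()));
    stats.into_iter().map(|s| s.index).collect()
}
```

With this ranking, a cheap conjunct that drops only half the rows can still be scheduled ahead of a very selective but expensive one, which is the case a selectivity-only ordering gets wrong.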
