adriangb commented on PR #22343:
URL: https://github.com/apache/datafusion/pull/22343#issuecomment-4484148436

   > Perhaps this kind of reordering could be implemented as a runtime 
optimization inside `FilterExec`: for the first batch, track each conjunct's 
evaluation time and selectivity, then decide the order dynamically. One nice 
benefit of this approach is that we don't have to hardcode whether an 
expression is "expensive" or "cheap".
   
   That is exactly what https://github.com/apache/datafusion/pull/22144 does 😃. 
I think we could re-use pretty much the exact same machinery. It took a lot of 
iterations to arrive at the right metrics: you want to take into account time 
spent on compute, not just selectivity, etc.
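
   To illustrate why compute time matters alongside selectivity, here is a 
minimal sketch. It is not the code from either PR; `ConjunctStats`, 
`evaluation_order`, and the `rank` metric are hypothetical names I made up for 
this example. It assumes the conjuncts are roughly independent and uses the 
classic "cost per rejected row" heuristic: evaluate first whichever conjunct 
spends the least time per row it eliminates.

```rust
use std::time::Duration;

/// Hypothetical per-conjunct statistics gathered on a sample batch.
#[derive(Debug, Clone)]
pub struct ConjunctStats {
    /// Position of the conjunct in the original predicate.
    pub index: usize,
    /// Wall-clock time spent evaluating the conjunct on the sample.
    pub elapsed: Duration,
    /// Number of rows the conjunct was evaluated on.
    pub rows_in: usize,
    /// Number of rows for which it evaluated to true.
    pub rows_passed: usize,
}

impl ConjunctStats {
    /// Fraction of input rows that pass (1.0 means nothing is filtered out).
    pub fn selectivity(&self) -> f64 {
        if self.rows_in == 0 {
            1.0
        } else {
            self.rows_passed as f64 / self.rows_in as f64
        }
    }

    /// Average evaluation cost in nanoseconds per input row.
    pub fn cost_per_row(&self) -> f64 {
        if self.rows_in == 0 {
            0.0
        } else {
            self.elapsed.as_nanos() as f64 / self.rows_in as f64
        }
    }

    /// Cost per rejected row; lower is better.
    /// A conjunct that rejects nothing gets +inf so it sorts last.
    pub fn rank(&self) -> f64 {
        let rejected = 1.0 - self.selectivity();
        if rejected <= 0.0 {
            f64::INFINITY
        } else {
            self.cost_per_row() / rejected
        }
    }
}

/// Returns the conjunct indices in the order they should be evaluated.
/// The sort is stable, so ties keep their original relative order.
pub fn evaluation_order(stats: &[ConjunctStats]) -> Vec<usize> {
    let mut sorted: Vec<&ConjunctStats> = stats.iter().collect();
    sorted.sort_by(|a, b| a.rank().total_cmp(&b.rank()));
    sorted.iter().map(|s| s.index).collect()
}
```

   With this metric, a conjunct that keeps only 10% of rows but costs 800 ns 
per row is ordered after one that keeps 50% of rows at 10 ns per row, which is 
the kind of case that sorting by selectivity alone gets wrong.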
   
   Someone please correct me if I'm wrong, but IIRC, because of the tree 
structure, we currently compute each side of a binary expression and apply the 
slice to the array, then compute the next side, etc. I wonder if an approach like 
https://github.com/apache/arrow-rs/pull/9659 might be helpful to mitigate 
overheads from non-selective masks?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

