[GitHub] [arrow-datafusion] isidentical opened a new issue, #4518: Enrich filter statistics predictions with estimated column boundaries

GitBox Mon, 05 Dec 2022 12:47:08 -0800


isidentical opened a new issue, #4518:
URL: https://github.com/apache/arrow-datafusion/issues/4518


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   The current implementation of filter statistics does not expose any column 
statistics, which makes it hard for downstream cost estimators to work (e.g. 
if the parent node is a hash join, it needs the child's column boundaries to 
estimate its own result; otherwise it just gives up). 
   
   **Describe the solution you'd like**
   There are certain cases where we can determine a filter's effect on the 
column boundaries of its output (e.g. `a > 25` on an input with 
`a=[0, 100]; b=[50, 60]` would yield `a=[25, 100]` (changed) and 
`b=[50, 60]` (unchanged)). For simple (and relatively common) expressions like 
this one, we should be able to derive the new boundaries for the columns used 
in the predicate and pass them along the statistics estimation chain.
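
   The narrowing described above can be sketched roughly as follows. This is 
only an illustration, not DataFusion's actual API: `Bounds`, `Op`, and 
`apply_filter` are hypothetical names, boundaries are assumed to be closed 
integer intervals, and only predicates of the form `column <op> literal` are 
handled. Strict comparisons keep the literal itself as the new bound, which 
is a conservative estimate that matches the `a=[25, 100]` example.

```rust
use std::collections::BTreeMap;

/// Closed interval [min, max] estimated for one column (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bounds {
    min: i64,
    max: i64,
}

/// Comparison operators this sketch understands (hypothetical type).
#[derive(Debug, Clone, Copy)]
enum Op {
    Gt,
    GtEq,
    Lt,
    LtEq,
}

/// Estimate the output boundaries of a filter `column <op> literal`.
///
/// Only the referenced column is narrowed; every other column keeps its
/// input boundaries. Returns `None` when the column has no known boundaries,
/// so the caller can fall back to "no statistics".
/// An empty result (min > max) is not special-cased in this sketch.
fn apply_filter(
    input: &BTreeMap<String, Bounds>,
    column: &str,
    op: Op,
    literal: i64,
) -> Option<BTreeMap<String, Bounds>> {
    let old = *input.get(column)?;
    let new = match op {
        // Lower bound can only move up; the literal is kept as a
        // conservative bound even for the strict `>` case.
        Op::Gt | Op::GtEq => Bounds { min: old.min.max(literal), max: old.max },
        // Upper bound can only move down; same conservative treatment.
        Op::Lt | Op::LtEq => Bounds { min: old.min, max: old.max.min(literal) },
    };
    let mut output = input.clone();
    output.insert(column.to_string(), new);
    Some(output)
}
```

   With `a=[0, 100]; b=[50, 60]` as input, `apply_filter(&input, "a", Op::Gt, 25)` 
narrows `a` and leaves `b` untouched, so a parent node such as a hash join 
still has boundaries to work with.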
   
   **Describe alternatives you've considered**
   None
   
   **Additional context**
   Related to #3929 
   


