alamb opened a new issue, #5535:
URL: https://github.com/apache/arrow-datafusion/issues/5535

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   At the time of writing , DataFusion has **three** partially complete 
range/interval analysis implementations:
   
   ## Range analysis used in cardinality estimation 
   
https://docs.rs/datafusion-physical-expr/18.0.0/datafusion_physical_expr/trait.PhysicalExpr.html#method.analyze
   
https://docs.rs/datafusion-physical-expr/18.0.0/datafusion_physical_expr/struct.AnalysisContext.html
   (cc @isidentical)
   
   ## Pruning Predicate
   
https://docs.rs/datafusion/18.0.0/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html
 
   (creates an expr and then evaluates it against statistics)
   
   # Intervals
   
https://github.com/apache/arrow-datafusion/blob/6a8c4a60114c7132de6d2fcd6d44111cf40b3df9/datafusion/physical-expr/src/intervals/mod.rs#L24
   Introduced by @metesynnada  and @ozankabak  in 
https://github.com/apache/arrow-datafusion/pull/5322
   
   
   The modules are all structured a differently / have different 
implementations but what they are computing (interval / range analysis) is 
basically the same and they all operate on `PhysicalExpr`s. 
   
   Since there are three implementations, as we add new PhysicalExprs either 
the work must be triplicated or else (more likely) the implementations will 
keep diverging in functionality. 
   
   In addition it is hard to increase the scope of coverage when the logic is 
in different places
   
   **Describe the solution you'd like**
   I would like a single implementation of range analysis used for all three 
uses in DataFusion. The one introduced by @metesynnada  and @ozankabak in 
https://github.com/apache/arrow-datafusion/pull/5322 seems to have the best 
documentation and theoretical foundation, but the other two are more 
functionally complete.
   
   I don't have a preference for which approach is taken, other than that we 
end up with a single implementation that is easier to maintain
   
   
   **Describe alternatives you've considered**
   <!--
   A clear and concise description of any alternative solutions or features 
you've considered.
   -->
   
   **Additional context**
   
   I believe @ozankabak  said they may have plans in this area 
(https://github.com/apache/arrow-datafusion/pull/5322#issuecomment-1441990911) 
but I am not sure 
   
   cc @mingmwang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to