alamb opened a new issue, #5535: URL: https://github.com/apache/arrow-datafusion/issues/5535
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.** At the time of writing , DataFusion has **three** partially complete range/interval analysis implementations: ## Range analysis used in cardinality estimation https://docs.rs/datafusion-physical-expr/18.0.0/datafusion_physical_expr/trait.PhysicalExpr.html#method.analyze https://docs.rs/datafusion-physical-expr/18.0.0/datafusion_physical_expr/struct.AnalysisContext.html (cc @isidentical) ## Pruning Predicate https://docs.rs/datafusion/18.0.0/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html (creates an expr and then evaluates it against statistics) # Intervals https://github.com/apache/arrow-datafusion/blob/6a8c4a60114c7132de6d2fcd6d44111cf40b3df9/datafusion/physical-expr/src/intervals/mod.rs#L24 Introduced by @metesynnada and @ozankabak in https://github.com/apache/arrow-datafusion/pull/5322 The modules are all structured a differently / have different implementations but what they are computing (interval / range analysis) is basically the same and they all operate on `PhysicalExpr`s. Since there are three implementations, as we add new PhysicalExprs either the work must be triplicated or else (more likely) the implementations will keep diverging in functionality. In addition it is hard to increase the scope of coverage when the logic is in different places **Describe the solution you'd like** I would like a single implementation of range analysis used for all three uses in DataFusion. The one introduced by @metesynnada and @ozankabak in https://github.com/apache/arrow-datafusion/pull/5322 seems to have the best documentation and theoretical foundation, but the other two are more functionally complete. I don't have a preference for which approach is taken, other than that we end up with a single implementation that is easier to maintain **Describe alternatives you've considered** <!-- A clear and concise description of any alternative solutions or features you've considered. --> **Additional context** I believe @ozankabak said they may have plans in this area (https://github.com/apache/arrow-datafusion/pull/5322#issuecomment-1441990911) but I am not sure cc @mingmwang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
