alamb opened a new issue, #7887: URL: https://github.com/apache/arrow-datafusion/issues/7887
### Is your feature request related to a problem or challenge?

We now have two ways to do range / interval analysis in DataFusion.

# `Interval` based analysis

The [`Interval`](https://docs.rs/datafusion/32.0.0/datafusion/physical_expr/intervals/struct.Interval.html) library is used for cardinality estimation and has

# Pruning Predicate

The [`Pruning Predicate`](https://docs.rs/datafusion/32.0.0/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html) is used to prune row groups based on min/max values.

### Describe the solution you'd like

I would like one interval analysis library (probably based on `Interval`).

Having two representations is challenging because:

1. Interval analysis has a natural story for handling arbitrary predicates, while `PruningPredicate` does not, due to how it is implemented as a rewrite.
2. The types of expressions handled are different (to add support for `LIKE`, we would have to change both `PruningPredicate` and `Interval`).
3. There is no way to combine the BloomFilter support added in https://github.com/apache/arrow-datafusion/pull/7821 with the row group min/max pruning (so it can't handle predicates like `col_a < 5 or col_b = <id>` if we had stats for `col_a` but a bloom filter for `col_b`).
4. The pruning predicate evaluation is vectorized, so it would work well for 1000s of row groups.

### Describe alternatives you've considered

I propose unifying the range analysis on `Interval` and implementing `PruningPredicate` in terms of `Interval`. Here is an example of doing so for bloom filters, and I think we could extend the pattern to `PruningPredicate`: https://github.com/alamb/arrow-datafusion/pull/14

Doing so would likely require extending the interval analysis arithmetic to support more operators (like `IN` lists).

### Additional context

_No response_
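To make point 3 and the proposal concrete, here is a minimal sketch of evaluating a predicate against one row group using a single three-valued interval analysis that consults both min/max statistics and a bloom filter. Everything in it is hypothetical and is not DataFusion's API: `Tri`, `Interval` (a toy closed `i64` range, not `datafusion::physical_expr::intervals::Interval`), `Expr`, `Container`, `eval_eq`, `eval` and `can_prune` are illustrative names, a `HashSet` stands in for a bloom filter (relying only on "a miss means definitely absent"), and `42` is an arbitrary stand-in for `<id>`. An `IN` list is handled as an OR of equalities.

```rust
use std::collections::{HashMap, HashSet};

/// Three-valued result of a predicate over a whole container (e.g. a row group).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tri { True, False, Unknown }

impl Tri {
    /// Kleene OR: true if either side is true, false only if both are false.
    fn or(self, other: Tri) -> Tri {
        match (self, other) {
            (Tri::True, _) | (_, Tri::True) => Tri::True,
            (Tri::False, Tri::False) => Tri::False,
            _ => Tri::Unknown,
        }
    }
    /// Kleene AND: false if either side is false, true only if both are true.
    fn and(self, other: Tri) -> Tri {
        match (self, other) {
            (Tri::False, _) | (_, Tri::False) => Tri::False,
            (Tri::True, Tri::True) => Tri::True,
            _ => Tri::Unknown,
        }
    }
}

/// Hypothetical closed range [lo, hi] taken from a column's min/max statistics.
#[derive(Debug, Clone, Copy)]
struct Interval { lo: i64, hi: i64 }

impl Interval {
    /// Result of `col < v` for every value in the range.
    fn lt_const(&self, v: i64) -> Tri {
        if self.hi < v { Tri::True } else if self.lo >= v { Tri::False } else { Tri::Unknown }
    }
    /// Result of `col = v` for every value in the range.
    fn eq_const(&self, v: i64) -> Tri {
        if v < self.lo || v > self.hi { Tri::False }
        else if self.lo == v && self.hi == v { Tri::True }
        else { Tri::Unknown }
    }
}

/// Hypothetical predicate tree (not DataFusion's PhysicalExpr).
enum Expr {
    Lt(String, i64),
    Eq(String, i64),
    In(String, Vec<i64>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Per-container metadata. The HashSet stands in for a bloom filter: a miss
/// means "definitely absent", the only definite answer a bloom filter gives.
struct Container {
    ranges: HashMap<String, Interval>,
    blooms: HashMap<String, HashSet<i64>>,
}

/// `col = v`, consulting both min/max statistics and the bloom filter.
fn eval_eq(col: &str, v: i64, c: &Container) -> Tri {
    let from_range = c.ranges.get(col).map_or(Tri::Unknown, |r| r.eq_const(v));
    let from_bloom = match c.blooms.get(col) {
        Some(set) if !set.contains(&v) => Tri::False,
        _ => Tri::Unknown,
    };
    // Both answers are sound, so a definite answer from either source wins.
    if from_range == Tri::Unknown { from_bloom } else { from_range }
}

fn eval(expr: &Expr, c: &Container) -> Tri {
    match expr {
        Expr::Lt(col, v) => c.ranges.get(col).map_or(Tri::Unknown, |r| r.lt_const(*v)),
        Expr::Eq(col, v) => eval_eq(col, *v, c),
        // `col IN (a, b, ..)` is treated as `col = a OR col = b OR ..`.
        Expr::In(col, list) => list.iter().fold(Tri::False, |acc, v| acc.or(eval_eq(col, *v, c))),
        Expr::And(l, r) => eval(l, c).and(eval(r, c)),
        Expr::Or(l, r) => eval(l, c).or(eval(r, c)),
    }
}

/// A container can be skipped only if the predicate is definitely false for it.
fn can_prune(expr: &Expr, c: &Container) -> bool {
    eval(expr, c) == Tri::False
}

fn main() {
    // col_a < 5 OR col_b = 42, with min/max stats for col_a and a bloom filter for col_b.
    let pred = Expr::Or(Box::new(Expr::Lt("col_a".to_string(), 5)),
                        Box::new(Expr::Eq("col_b".to_string(), 42)));
    let row_group = Container {
        ranges: HashMap::from([("col_a".to_string(), Interval { lo: 10, hi: 20 })]),
        blooms: HashMap::from([("col_b".to_string(), HashSet::from([1, 2, 3]))]),
    };
    println!("prune: {}", can_prune(&pred, &row_group)); // prints "prune: true"
}
```

Here `col_a < 5` is ruled out by the range `[10, 20]` and `col_b = 42` is ruled out by the bloom filter, so the `OR` is definitely false and the row group can be skipped. Unlike the vectorized `PruningPredicate` described above, this sketch evaluates one container at a time; a unified implementation would presumably need to evaluate many row groups at once to keep that property.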
