alamb opened a new issue, #7887:
URL: https://github.com/apache/arrow-datafusion/issues/7887

   ### Is your feature request related to a problem or challenge?
   
   We now have two ways to do range / interval analysis in DataFusion. 
   
   # `Interval`-based analysis
   The [`Interval`](https://docs.rs/datafusion/32.0.0/datafusion/physical_expr/intervals/struct.Interval.html) library is used for cardinality estimation and has 
   
   # Pruning Predicate
   The [`PruningPredicate`](https://docs.rs/datafusion/32.0.0/datafusion/physical_optimizer/pruning/struct.PruningPredicate.html) is used to prune row groups based on min/max values.
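
As a rough illustration of min/max pruning, here is a minimal sketch in plain Rust. It does not use the DataFusion API: `ColumnStats`, `keep_for_less_than`, and `keep_for_equal` are hypothetical names, and the column is assumed to be a non-null `i64`. The idea is that a predicate on a column is rewritten into a predicate on that column's statistics, which is then evaluated once per row group.

```rust
/// Hypothetical per-row-group statistics for one i64 column
/// (an assumption for this sketch, not a DataFusion type).
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
}

/// `col < limit` rewritten as `col_min < limit`.
/// Returns one flag per row group: true means the row group may
/// contain matching rows and must be kept.
pub fn keep_for_less_than(stats: &[ColumnStats], limit: i64) -> Vec<bool> {
    stats.iter().map(|s| s.min < limit).collect()
}

/// `col = value` rewritten as `col_min <= value AND value <= col_max`.
pub fn keep_for_equal(stats: &[ColumnStats], value: i64) -> Vec<bool> {
    stats
        .iter()
        .map(|s| s.min <= value && value <= s.max)
        .collect()
}
```

Each function handles all row groups in one pass, which is the property that makes this style of evaluation easy to vectorize.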
   
   
   
   
   
   ### Describe the solution you'd like
   
   I would like one interval analysis library (probably based on `Interval`) 
   
   Having two representations is challenging because
   
   1. Interval analysis has a natural story for handling arbitrary predicates, while `PruningPredicate` does not, due to how it is implemented as a rewrite
   2. The types of expressions handled are different (to add support for `LIKE` we would have to change both `PruningPredicate` and `Interval`)
   3. There is no way to combine the BloomFilter support added in https://github.com/apache/arrow-datafusion/pull/7821 with the row group statistics (so it can't handle predicates like `col_a < 5 or col_b = <id>` if we had stats for `col_a` but a bloom filter for `col_b`)
   4. The pruning predicate evaluation is vectorized so it would work well for 
1000s of row groups
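
To make point 3 concrete, here is a minimal sketch of how a single three-valued representation could combine min/max statistics with a bloom filter for `col_a < 5 or col_b = <id>`. Everything here is an assumption for illustration: `BoolRange`, `less_than`, `bloom_equals`, and `can_prune` are hypothetical names, not DataFusion APIs, and `col_a` is assumed to be a non-null `i64`.

```rust
/// Possible truth values of a predicate over a whole row group
/// (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BoolRange {
    AlwaysFalse,
    Maybe,
    AlwaysTrue,
}

impl BoolRange {
    /// Logical OR over ranges of possible truth values.
    pub fn or(self, other: BoolRange) -> BoolRange {
        use BoolRange::*;
        match (self, other) {
            (AlwaysTrue, _) | (_, AlwaysTrue) => AlwaysTrue,
            (AlwaysFalse, AlwaysFalse) => AlwaysFalse,
            _ => Maybe,
        }
    }
}

/// `col < limit`, given that every value of `col` lies in `[min, max]`.
pub fn less_than(min: i64, max: i64, limit: i64) -> BoolRange {
    if max < limit {
        BoolRange::AlwaysTrue
    } else if min >= limit {
        BoolRange::AlwaysFalse
    } else {
        BoolRange::Maybe
    }
}

/// `col = value`, given the result of probing a bloom filter for `value`.
/// A bloom filter can prove absence but never presence.
pub fn bloom_equals(might_contain: bool) -> BoolRange {
    if might_contain {
        BoolRange::Maybe
    } else {
        BoolRange::AlwaysFalse
    }
}

/// A row group can be skipped only if `col_a < limit OR col_b = <id>`
/// is provably false for every row in it.
pub fn can_prune(a_min: i64, a_max: i64, limit: i64, b_might_contain: bool) -> bool {
    less_than(a_min, a_max, limit).or(bloom_equals(b_might_contain)) == BoolRange::AlwaysFalse
}
```

For example, `can_prune(10, 20, 5, false)` is true because both sides of the `or` are provably false, while `can_prune(10, 20, 5, true)` is false because the bloom filter cannot rule out a match on `col_b`.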
   
   ### Describe alternatives you've considered
   
   I propose unifying the range analysis on `Interval` and implementing `PruningPredicate` in terms of `Interval`. Here is an example of doing so for bloom filters, and I think we could extend the pattern to `PruningPredicate`: https://github.com/alamb/arrow-datafusion/pull/14
   
   Doing so would likely require extending the interval analysis arithmetic to 
support more operators (like `IN` lists)
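
One possible way to treat `IN` lists, sketched under the assumption of a non-null `i64` column with known bounds: `col IN (v1, v2, ...)` can match only if at least one listed value falls inside `[min, max]`. The function name `in_list_may_match` is hypothetical and is not part of the `Interval` library.

```rust
/// Returns true if `col IN (values...)` could match some row, given that
/// every value of `col` lies in `[min, max]`. A false result means the
/// container (for example a row group) can be skipped.
pub fn in_list_may_match(min: i64, max: i64, values: &[i64]) -> bool {
    values.iter().any(|v| min <= *v && *v <= max)
}
```

This treats the `IN` list as an `OR` of equality checks, so an empty list never matches.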
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
