alamb opened a new pull request, #8437:
URL: https://github.com/apache/arrow-datafusion/pull/8437

   ## Which issue does this PR close?
   
   Part of https://github.com/apache/arrow-datafusion/issues/8376 (see the POC 
PR  https://github.com/apache/arrow-datafusion/pull/8397 for how this all fits 
together)
   
   Broken out for easier PR review. I will update the bloom filter code to 
actually use this in a follow-on PR.
   
   ## Rationale for this change
   
   I am generalizing the pruning of a set of files/row_groups/containers based on 
information known before looking at the data (aka "pruning predicates") to 
support bloom filters and other similar structures which can tell if a given 
value is present/not present in a given container (see 
https://github.com/apache/arrow-datafusion/issues/8376).
   
   Part of this analysis requires identifying if a given predicate ensures that 
a given column must take a literal value (aka a constant) or one of a set of 
values. Given this is an important building block and a non-trivial analysis, I 
wanted to make it a first-class concept in DataFusion.
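   
   To illustrate the kind of analysis described above, here is a minimal, 
self-contained sketch. It is not the code in this PR: the `Expr` enum, the 
`Guarantees` alias, and the `col_eq` / `literal_guarantees` functions are all 
hypothetical names made up for illustration, and only integer literals with 
`=`, `AND`, and `OR` are handled.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy expression tree (hypothetical; NOT DataFusion's expression types).
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// For each column: the set of literals the column must be one of
/// whenever the predicate evaluates to true.
type Guarantees = BTreeMap<String, BTreeSet<i64>>;

/// Convenience constructor for `column = literal` (hypothetical helper).
fn col_eq(name: &str, value: i64) -> Expr {
    Expr::Eq(
        Box::new(Expr::Column(name.to_string())),
        Box::new(Expr::Literal(value)),
    )
}

/// Returns the literal guarantees implied by `expr`.
/// Anything not understood yields no guarantee, which is always safe.
fn literal_guarantees(expr: &Expr) -> Guarantees {
    match expr {
        // `col = lit` or `lit = col`: the column must equal that literal.
        Expr::Eq(l, r) => match (l.as_ref(), r.as_ref()) {
            (Expr::Column(c), Expr::Literal(v)) | (Expr::Literal(v), Expr::Column(c)) => {
                let mut g = Guarantees::new();
                g.insert(c.clone(), BTreeSet::from([*v]));
                g
            }
            _ => Guarantees::new(),
        },
        // Both sides hold: keep every constrained column. A column constrained
        // on both sides must satisfy both, so intersect its value sets
        // (an empty set means the predicate can never be true).
        Expr::And(l, r) => {
            let mut out = literal_guarantees(l);
            for (col, vals) in literal_guarantees(r) {
                if let Some(existing) = out.get_mut(&col) {
                    existing.retain(|v| vals.contains(v));
                } else {
                    out.insert(col, vals);
                }
            }
            out
        }
        // Either side may hold: only columns constrained on BOTH sides are
        // still guaranteed, and they may take a value from either set.
        Expr::Or(l, r) => {
            let mut right = literal_guarantees(r);
            let mut out = Guarantees::new();
            for (col, mut vals) in literal_guarantees(l) {
                if let Some(other) = right.remove(&col) {
                    vals.extend(other);
                    out.insert(col, vals);
                }
            }
            out
        }
        // A bare column or literal guarantees nothing.
        _ => Guarantees::new(),
    }
}
```

   Under these assumptions, `a = 1 AND b = 2` guarantees `a ∈ {1}` and 
`b ∈ {2}`, `a = 1 OR a = 2` guarantees `a ∈ {1, 2}`, and `a = 1 OR b = 2` 
guarantees nothing. A structure such as a bloom filter could then be asked 
whether any of the guaranteed values may be present in a container, and the 
container skipped if none are.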
   
   There are likely other places that could be updated to use this (such as XXX)
   
   
   ## What changes are included in this PR?
   
   1. Refactored out the [code in the 
`BloomFilterPruningPredicate`](https://github.com/apache/arrow-datafusion/blob/0d7cab055cb39d6df751e070af5a0bf5444e3849/datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs#L182-L197)
 originally written by @haohuaijin into its own struct
   2. Made it more general
   3. Added many tests
   
   ## Are these changes tested?
   Yes, this PR is mostly new tests
   
   ## Are there any user-facing changes?
   Not yet
   


