alamb opened a new pull request, #8437: URL: https://github.com/apache/arrow-datafusion/pull/8437
## Which issue does this PR close?

Part of https://github.com/apache/arrow-datafusion/issues/8376 (see the POC PR https://github.com/apache/arrow-datafusion/pull/8397 for how this all fits together).

Broken out for easier PR review. I will update the bloom filter code to actually use this in a follow-on PR.

## Rationale for this change

I am generalizing the pruning of a set of files/row_groups/containers based on information known before looking at the data (aka "pruning predicates") to support bloom filters and other similar structures which can tell if a given value is present / not present in a given container (see https://github.com/apache/arrow-datafusion/issues/8376).

Part of this analysis requires identifying whether a given predicate ensures that a given column must take a literal value (aka a constant) or one of a set of values.

Given that this is an important building block and a non-trivial analysis, I wanted to make it a first-class concept in DataFusion. There are likely other places that could be updated to use this (such as XXX).

## What changes are included in this PR?

1. Refactored out the [code in the `BloomFilterPruningPredicate`](https://github.com/apache/arrow-datafusion/blob/0d7cab055cb39d6df751e070af5a0bf5444e3849/datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs#L182-L197) originally written by @haohuaijin into its own struct
2. Made it more general
3. Added many tests

## Are these changes tested?

Yes, this code is mostly new tests

## Are there any user-facing changes?

Not yet
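
To make the analysis described in the rationale concrete, here is a minimal sketch of how literal guarantees could be derived from a predicate. It is not the code in this PR and does not use DataFusion's types: `Pred`, `Guarantees`, `guarantees`, and `example_predicate` are hypothetical names invented for illustration, and values are restricted to `i64` to keep it short. The assumed rule is that `AND` keeps the constraints from both sides (intersecting them when they name the same column), while `OR` keeps a constraint only when both sides constrain the same column (taking the union).

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy predicate language (hypothetical; not DataFusion's expression types).
#[derive(Debug, Clone)]
enum Pred {
    /// `column = literal`
    Eq(String, i64),
    /// `column IN (literal, ...)`
    InList(String, Vec<i64>),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
    /// Any expression this sketch does not analyze (e.g. `a < 5`)
    Other,
}

/// For each column: the set of values it must be in whenever the predicate is true.
type Guarantees = HashMap<String, BTreeSet<i64>>;

fn guarantees(pred: &Pred) -> Guarantees {
    match pred {
        Pred::Eq(col, v) => HashMap::from([(col.clone(), BTreeSet::from([*v]))]),
        Pred::InList(col, vs) => HashMap::from([(col.clone(), vs.iter().copied().collect())]),
        Pred::And(l, r) => {
            // Both sides hold, so every constraint from either side applies.
            // If both constrain the same column, only the common values remain
            // (an empty set means the predicate can never be true).
            let mut out = guarantees(l);
            for (col, vals) in guarantees(r) {
                if let Some(existing) = out.get_mut(&col) {
                    existing.retain(|v| vals.contains(v));
                } else {
                    out.insert(col, vals);
                }
            }
            out
        }
        Pred::Or(l, r) => {
            // Only one side needs to hold, so a column is constrained only if
            // both sides constrain it; the allowed values are the union.
            let left = guarantees(l);
            let mut right = guarantees(r);
            let mut out = Guarantees::new();
            for (col, mut vals) in left {
                if let Some(rvals) = right.remove(&col) {
                    vals.extend(rvals);
                    out.insert(col, vals);
                }
            }
            out
        }
        // Nothing can be concluded from an expression we do not understand.
        Pred::Other => Guarantees::new(),
    }
}

/// Builds `(a = 1 OR a = 2) AND b IN (5, 6)`.
fn example_predicate() -> Pred {
    let a_is_1_or_2 = Pred::Or(
        Box::new(Pred::Eq("a".to_string(), 1)),
        Box::new(Pred::Eq("a".to_string(), 2)),
    );
    Pred::And(
        Box::new(a_is_1_or_2),
        Box::new(Pred::InList("b".to_string(), vec![5, 6])),
    )
}
```

With guarantees in this form, a bloom filter (or a similar structure) for column `a` could be probed for each value in the set, and a container skipped when none of the values can be present. A predicate such as `a = 1 OR <something unanalyzed>` yields no guarantee, so nothing would be pruned on its basis.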
