Dandandan opened a new pull request, #22084:

URL: https://github.com/apache/datafusion/pull/22084
## Which issue does this PR close?

- Closes #.

## Rationale for this change

Parquet statistics pruning did not rewrite `IS DISTINCT FROM` or `IS NOT DISTINCT FROM`. As a result, row groups that min/max and null-count statistics could prove irrelevant were still kept.

## What changes are included in this PR?

- Adds null-aware pruning rewrites for `IS DISTINCT FROM` and `IS NOT DISTINCT FROM`.
- Treats the distinct-from operators as symmetric when normalizing scalar-left predicates.
- Refactors the shared min/max and null-count pruning expression builders.
- Adds unit tests for pruning predicate evaluation and Parquet row-group regression coverage.

## Are these changes tested?

- `cargo fmt --all`
- `cargo test -p datafusion-pruning prune_int32_col_is_distinct_from`
- `cargo test -p datafusion-pruning prune_int32_col_is_not_distinct_from`
- `cargo test -p datafusion --test parquet_integration prune_is_not_distinct_from_i32 -- --nocapture`
- `./dev/rust_lint.sh`

## Are there any user-facing changes?

No API changes. Queries using `IS DISTINCT FROM` and `IS NOT DISTINCT FROM` can now benefit from Parquet statistics pruning.
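To illustrate what "null-aware" means for these operators, here is a minimal sketch under a simplified model: one Int32 column with optional min, max, null-count and row-count statistics per row group. The names `ColumnStats`, `Op`, `Operand`, `normalize` and `may_match` are hypothetical and are not DataFusion's `PruningPredicate` API. The sketch also evaluates the keep/prune decision directly instead of rewriting the predicate into an expression over statistics columns, as the PR does. The rules it encodes follow from SQL semantics: `col IS NOT DISTINCT FROM NULL` behaves like `col IS NULL`, `col IS NOT DISTINCT FROM v` behaves like `col = v`, `col IS DISTINCT FROM NULL` behaves like `col IS NOT NULL`, and `col IS DISTINCT FROM v` behaves like `col IS NULL OR col <> v`. Any unknown statistic keeps the row group.

```rust
/// Hypothetical per-row-group statistics for a single Int32 column.
/// `None` means the statistic is unknown, so the row group must be kept.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: Option<i32>,
    max: Option<i32>,
    null_count: Option<u64>,
    row_count: Option<u64>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    IsDistinctFrom,
    IsNotDistinctFrom,
}

/// One side of a comparison: the column, or a literal (`None` = SQL NULL).
#[derive(Debug, Clone, Copy)]
enum Operand {
    Column,
    Literal(Option<i32>),
}

/// Both operators are symmetric, so `lit OP col` can be treated as `col OP lit`.
/// Returns the literal, or `None` for shapes this sketch does not handle.
fn normalize(left: Operand, right: Operand) -> Option<Option<i32>> {
    match (left, right) {
        (Operand::Column, Operand::Literal(v)) | (Operand::Literal(v), Operand::Column) => Some(v),
        _ => None, // column-vs-column or literal-vs-literal
    }
}

/// Returns `false` only when the statistics prove that no row in the group
/// can satisfy `col OP lit`; otherwise the row group is kept.
fn may_match(op: Op, lit: Option<i32>, s: &ColumnStats) -> bool {
    // Every row is NULL when both counts are known and equal.
    let all_null = matches!((s.null_count, s.row_count), (Some(n), Some(r)) if n == r);
    // Some(true) / Some(false) when the null count is known, None otherwise.
    let has_null = s.null_count.map(|n| n > 0);
    match (op, lit) {
        // col IS NOT DISTINCT FROM NULL  <=>  col IS NULL
        (Op::IsNotDistinctFrom, None) => has_null != Some(false),
        // col IS NOT DISTINCT FROM v  <=>  col = v (NULL rows never match)
        (Op::IsNotDistinctFrom, Some(v)) => {
            !all_null
                && match (s.min, s.max) {
                    (Some(min), Some(max)) => min <= v && v <= max,
                    _ => true,
                }
        }
        // col IS DISTINCT FROM NULL  <=>  col IS NOT NULL
        (Op::IsDistinctFrom, None) => !all_null,
        // col IS DISTINCT FROM v  <=>  col IS NULL OR col <> v
        (Op::IsDistinctFrom, Some(v)) => {
            // A NULL row (or an unknown null count) already matches.
            has_null != Some(false)
                || match (s.min, s.max) {
                    // Prunable only if every non-null value equals v.
                    (Some(min), Some(max)) => !(min == v && max == v),
                    _ => true,
                }
        }
    }
}

fn main() {
    // A row group containing only the value 5 and no NULLs.
    let only_fives = ColumnStats { min: Some(5), max: Some(5), null_count: Some(0), row_count: Some(100) };
    // `5 IS DISTINCT FROM col` normalizes to `col IS DISTINCT FROM 5`.
    let lit = normalize(Operand::Literal(Some(5)), Operand::Column).unwrap();
    assert!(!may_match(Op::IsDistinctFrom, lit, &only_fives)); // can be pruned
    assert!(may_match(Op::IsNotDistinctFrom, lit, &only_fives)); // must be kept
    // `col IS NOT DISTINCT FROM NULL` skips a group that has no NULLs.
    assert!(!may_match(Op::IsNotDistinctFrom, None, &only_fives));
    println!("ok");
}
```

The asymmetry between the two `Some(v)` arms is the point of being null-aware. A plain `col <> v` rewrite would prune a group whose non-null values all equal `v`, but `IS DISTINCT FROM v` must still keep that group if it contains any NULLs, because `NULL IS DISTINCT FROM v` is true.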
