Dandandan opened a new pull request, #22084:
URL: https://github.com/apache/datafusion/pull/22084

   
   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   Parquet statistics pruning did not rewrite `IS DISTINCT FROM` or `IS NOT DISTINCT FROM` predicates, so row groups that min/max and null-count statistics could prove irrelevant were still kept and scanned.
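
To make the pruning opportunity concrete, here is a minimal sketch of the kind of decision that statistics allow for these two operators. It is not the code in this PR: `ColumnStats`, `may_match_is_not_distinct_from`, and `may_match_is_distinct_from` are hypothetical names, the column is assumed to be `Int32`, and missing statistics are assumed to mean "keep the row group".

```rust
/// Hypothetical per-row-group statistics for a single Int32 column.
/// `None` means the statistic is unavailable, so we must stay conservative.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: Option<i32>,
    max: Option<i32>,
    null_count: Option<u64>,
    row_count: u64,
}

/// Returns `false` only when no row can satisfy `col IS NOT DISTINCT FROM lit`.
/// A `lit` of `None` models a NULL literal.
fn may_match_is_not_distinct_from(s: &ColumnStats, lit: Option<i32>) -> bool {
    match lit {
        // Equivalent to `col IS NULL`: needs at least one null.
        None => s.null_count.map_or(true, |n| n > 0),
        // Equivalent to `col = v` on non-null rows.
        Some(v) => {
            let all_null = s.null_count.map_or(false, |n| n == s.row_count);
            if all_null {
                return false;
            }
            let below_min = s.min.map_or(false, |min| v < min);
            let above_max = s.max.map_or(false, |max| v > max);
            !(below_min || above_max)
        }
    }
}

/// Returns `false` only when no row can satisfy `col IS DISTINCT FROM lit`.
fn may_match_is_distinct_from(s: &ColumnStats, lit: Option<i32>) -> bool {
    match lit {
        // Equivalent to `col IS NOT NULL`: needs at least one non-null value.
        None => s.null_count.map_or(true, |n| n < s.row_count),
        // Nulls always match; otherwise some value must differ from `v`.
        // Prune only if there are provably no nulls and every value equals `v`.
        Some(v) => {
            let no_nulls = s.null_count == Some(0);
            let constant_v = s.min == Some(v) && s.max == Some(v);
            !(no_nulls && constant_v)
        }
    }
}
```

Under these assumptions, a row group with `min = 10`, `max = 20`, and `null_count = 0` can be skipped for `col IS NOT DISTINCT FROM 5`, but must be kept for `col IS DISTINCT FROM 5`.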
   
   ## What changes are included in this PR?
   
   - Adds null-aware pruning rewrites for `IS DISTINCT FROM` and `IS NOT DISTINCT FROM`.
   - Treats distinct-from operators as symmetric when normalizing scalar-left 
predicates.
   - Refactors shared min/max and null-count pruning expression builders.
   - Adds unit tests for pruning predicate evaluation and Parquet row-group 
regression coverage.
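
The symmetry point in the list above can be illustrated with a small sketch. When a predicate has the scalar on the left (for example `5 < col`), normalizing it to column-left form requires flipping ordering operators, whereas the distinct-from operators, like `=` and `!=`, read the same in either direction. The `Op` enum and `swap_operands` function below are hypothetical stand-ins, not DataFusion's own types.

```rust
/// Hypothetical subset of binary operators relevant to pruning.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Eq,
    NotEq,
    Lt,
    LtEq,
    Gt,
    GtEq,
    IsDistinctFrom,
    IsNotDistinctFrom,
}

/// Returns the operator to use after swapping operands, so that
/// `scalar <op> col` becomes `col <swapped op> scalar`.
fn swap_operands(op: Op) -> Op {
    match op {
        // Ordering comparisons reverse direction.
        Op::Lt => Op::Gt,
        Op::LtEq => Op::GtEq,
        Op::Gt => Op::Lt,
        Op::GtEq => Op::LtEq,
        // Symmetric operators are unchanged: `a IS DISTINCT FROM b`
        // has the same truth value as `b IS DISTINCT FROM a`.
        Op::Eq | Op::NotEq | Op::IsDistinctFrom | Op::IsNotDistinctFrom => op,
    }
}
```

With this shape, `5 IS DISTINCT FROM col` normalizes to `col IS DISTINCT FROM 5` without changing the operator, so the same column-left pruning rewrite applies.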
   
   ## Are these changes tested?
   
   - `cargo fmt --all`
   - `cargo test -p datafusion-pruning prune_int32_col_is_distinct_from`
   - `cargo test -p datafusion-pruning prune_int32_col_is_not_distinct_from`
   - `cargo test -p datafusion --test parquet_integration prune_is_not_distinct_from_i32 -- --nocapture`
   - `./dev/rust_lint.sh`
   
   ## Are there any user-facing changes?
   
   No API changes. Queries using `IS DISTINCT FROM` and `IS NOT DISTINCT FROM` 
can now benefit from Parquet statistics pruning.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]