joyhaldar opened a new pull request, #14593: URL: https://github.com/apache/iceberg/pull/14593
## Summary

This PR adds file pruning optimization for `NOT IN` and `!=` predicates when a file contains a single distinct value (i.e., when `min == max`).

## Problem

Currently, [InclusiveMetricsEvaluator](https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java) cannot prune files for `NOT IN` and `!=` predicates, even when the file provably contains no matching rows.

## Solution

When `min == max` and the file has no nulls, we can safely prune if:

- For `NOT IN`: the single value is in the exclusion list
- For `!=`: the single value equals the literal

## Testing

- Added unit tests for both `notIn` and `notEq` optimizations
- Verified correct behavior with nulls (must scan) and without nulls (can prune)

Fixes #14592
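
For readers who want the rule from the Solution section in concrete form, here is a minimal, self-contained sketch. It is not the code in this PR and does not use Iceberg's API: `SingleValuePruningSketch`, `ColumnStats`, `canPruneNotEq`, and `canPruneNotIn` are hypothetical names invented for illustration. The sketch also assumes that a missing bound or a missing null count means the file must be scanned.

```java
import java.util.Set;

/** Illustrative sketch only; none of these types are Iceberg classes. */
final class SingleValuePruningSketch {

  /** Hypothetical per-column statistics for one data file. */
  static final class ColumnStats<T extends Comparable<T>> {
    final T min;          // lower bound, or null if unknown
    final T max;          // upper bound, or null if unknown
    final Long nullCount; // number of null values, or null if unknown

    ColumnStats(T min, T max, Long nullCount) {
      this.min = min;
      this.max = max;
      this.nullCount = nullCount;
    }
  }

  private SingleValuePruningSketch() {}

  /** True only when the stats prove every row holds the same non-null value. */
  static <T extends Comparable<T>> boolean hasSingleNonNullValue(ColumnStats<T> stats) {
    return stats.min != null
        && stats.max != null
        && stats.nullCount != null // assumption: unknown null count means "must scan"
        && stats.nullCount == 0L
        && stats.min.compareTo(stats.max) == 0;
  }

  /** `col != literal` matches no row if the only value in the file equals the literal. */
  static <T extends Comparable<T>> boolean canPruneNotEq(ColumnStats<T> stats, T literal) {
    return hasSingleNonNullValue(stats) && stats.min.compareTo(literal) == 0;
  }

  /** `col NOT IN (...)` matches no row if the only value in the file is in the list. */
  static <T extends Comparable<T>> boolean canPruneNotIn(ColumnStats<T> stats, Set<T> excluded) {
    // Set.contains relies on equals(), which agrees with compareTo for Integer and String.
    return hasSingleNonNullValue(stats) && excluded.contains(stats.min);
  }
}
```

In this sketch, a file with `min = max = 5` and zero nulls can be skipped for `col != 5` or `col NOT IN (1, 5, 9)`, while the same file with any nulls, or with an unknown null count, must still be scanned.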
