Re: [PR] Optimize NOT IN and != predicates for single-value files [iceberg]

via GitHub Wed, 19 Nov 2025 12:12:37 -0800


nandorKollar commented on code in PR #14593:
URL: https://github.com/apache/iceberg/pull/14593#discussion_r2543419699



##########
api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java:
##########
@@ -327,6 +327,21 @@ public <T> Boolean eq(Bound<T> term, Literal<T> lit) {
     public <T> Boolean notEq(Bound<T> term, Literal<T> lit) {
       // because the bounds are not necessarily a min or max value, this cannot be answered using
       // them. notEq(col, X) with (X, Y) doesn't guarantee that X is a value in col.
+      // However, when min == max and the file has no nulls, we can safely prune
+      // if that value equals the literal.
+      int id = term.ref().fieldId();
+      if (mayContainNull(id)) {

Review Comment:
   Thanks @joyhaldar, I'll review it soon. Meanwhile, I found that we probably don't need to worry much about NaNs in the lower and upper bounds. The spec states:
   `For float and double, the value -0.0 must precede +0.0, as in the IEEE 754 totalOrder predicate. NaNs are not permitted as lower or upper bounds.`
   That said, `TestInclusiveMetricsEvaluator` still tests against NaN in lower/upper bounds. Maybe the V1 spec still permitted this case?
   
   cc @pvary: what do you think about this improvement? Does it look promising to you?
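
   To make the condition under discussion concrete, here is a minimal sketch of the single-value check for `!=` on a double column. It is not the code from the PR: the class `SingleValueNotEqSketch`, the method `mightMatchNotEq`, and the `mayContainNaN` flag are hypothetical names introduced for illustration, and the NaN-bound guard is only a defensive assumption for files whose bounds predate the spec wording quoted above.

```java
public class SingleValueNotEqSketch {

  /**
   * Returns true when the file might contain rows matching `col != literal`,
   * false when the file can be skipped. Hypothetical helper, not Iceberg API.
   */
  static boolean mightMatchNotEq(
      Double lower, Double upper, boolean mayContainNull, boolean mayContainNaN, double literal) {
    // Nulls and NaN values are not reflected in the bounds, so a file that
    // may contain either is kept (conservative, mirrors the null check in the diff).
    if (mayContainNull || mayContainNaN) {
      return true;
    }

    // Without both bounds there is nothing to reason about.
    if (lower == null || upper == null) {
      return true;
    }

    // Assumption: treat a NaN bound as unusable instead of comparing against it.
    if (lower.isNaN() || upper.isNaN()) {
      return true;
    }

    // Double.compare orders -0.0 before +0.0, so a (-0.0, +0.0) pair
    // is not treated as a single value.
    boolean singleValue = Double.compare(lower, upper) == 0;

    // Skip the file only when every value equals the literal.
    return !(singleValue && Double.compare(lower, literal) == 0);
  }
}
```

   Under these assumptions, `mightMatchNotEq(5.0, 5.0, false, false, 5.0)` lets the file be skipped, while a wider range, a possible null, or a NaN bound keeps it.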



