joyhaldar commented on code in PR #14593:
URL: https://github.com/apache/iceberg/pull/14593#discussion_r2544114209
##########
api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java:
##########
@@ -327,6 +327,21 @@ public <T> Boolean eq(Bound<T> term, Literal<T> lit) {
public <T> Boolean notEq(Bound<T> term, Literal<T> lit) {
// because the bounds are not necessarily a min or max value, this
cannot be answered using
// them. notEq(col, X) with (X, Y) doesn't guarantee that X is a value
in col.
+ // However, when min == max and the file has no nulls, we can safely
prune
+ // if that value equals the literal.
+ int id = term.ref().fieldId();
+ if (mayContainNull(id)) {
Review Comment:
Thank you again Nandor. You're right that it says that NaNs are not
permitted as lower or upper bounds.
However, there's actually a note in the [class
javadoc](https://github.com/apache/iceberg/blob/6e873baa7db22429fa231d552a696248f64ea1f7/api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java#L40)
that explains why we still may need to check,
`
> Due to the comparison implementation of ORC stats, for float/double
columns in ORC files, if the first value in a file is NaN, metrics of this file
will report NaN for both upper and lower bound despite that the column could
contain non-NaN data. Thus in some scenarios explicitly checks for NaN is
necessary in order to not skip files that may contain matching data.
`
So the spec ideally prohibits NaN bounds, but ORC files in the wild can
still have them based on the javadoc. I can also see that existing methods like
`lt()`, `ltEq()`, `eq()` all include the `NaNUtil.isNaN(bounds)` check.
Let me know if you see it differently and we should consider removing this
check.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]