rdblue commented on a change in pull request #2069:
URL: https://github.com/apache/iceberg/pull/2069#discussion_r559044025
##########
File path:
api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java
##########
@@ -204,15 +210,20 @@ public Boolean or(Boolean leftResult, Boolean rightResult) {
public <T> Boolean ltEq(BoundReference<T> ref, Literal<T> lit) {
Integer id = ref.fieldId();
- if (containsNullsOnly(id)) {
+ if (containsNullsOnly(id) || containsNaNsOnly(id)) {
return ROWS_CANNOT_MATCH;
}
if (lowerBounds != null && lowerBounds.containsKey(id)) {
T lower = Conversions.fromByteBuffer(ref.type(), lowerBounds.get(id));
int cmp = lit.comparator().compare(lower, lit.value());
- if (cmp > 0) {
+
+ // Due to the comparison implementation of ORC stats, for float/double columns in ORC files,
Review comment:
I don't think there is a need for an extra method that contains just one
method call. I'd probably do it like this:
```java
T lower = Conversions.fromByteBuffer(ref.type(), lowerBounds.get(id));
if (NaNUtil.isNaN(lower)) {
  // NaN indicates unreliable bounds. See the InclusiveMetricsEvaluator docs for more.
  return ROWS_MIGHT_MATCH;
}
int cmp = lit.comparator().compare(lower, lit.value());
if (cmp > 0) {
  return ROWS_CANNOT_MATCH;
}
```
The docs would go in the javadoc for the whole class, and each NaN check
could simply refer back to it. I also moved the NaN check above the comparison
to keep the logic simple: if the value is NaN, the bound is invalid.
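
To illustrate why the check belongs before the comparison, here is a minimal standalone sketch. It is not the Iceberg code: `NaNBoundSketch` and its `ltEq(Double, double)` helper are hypothetical stand-ins, specialized to `Double` and using `Double.compare` in place of the literal's comparator. `Double.compare` orders NaN above every other value, so without the guard a NaN lower bound would look like "lower > literal" and the file would be skipped incorrectly.
```java
final class NaNBoundSketch {
  static final boolean ROWS_MIGHT_MATCH = true;
  static final boolean ROWS_CANNOT_MATCH = false;

  // Hypothetical simplification of the lower-bound part of ltEq:
  // can any row in the file satisfy "value <= literal"?
  static boolean ltEq(Double lower, double literal) {
    if (lower == null) {
      // No lower bound recorded, so nothing can be ruled out.
      return ROWS_MIGHT_MATCH;
    }

    if (lower.isNaN()) {
      // NaN indicates unreliable bounds; do not compare against it.
      return ROWS_MIGHT_MATCH;
    }

    // Double.compare treats NaN as greater than every other value,
    // which is why the NaN guard has to run before this comparison.
    int cmp = Double.compare(lower, literal);
    if (cmp > 0) {
      // The smallest value in the file is already above the literal.
      return ROWS_CANNOT_MATCH;
    }

    return ROWS_MIGHT_MATCH;
  }

  public static void main(String[] args) {
    System.out.println(ltEq(Double.NaN, 5.0)); // unreliable bound
    System.out.println(ltEq(10.0, 5.0));       // lower bound above the literal
    System.out.println(ltEq(1.0, 5.0));        // lower bound below the literal
  }
}
```
With the guard in place, a NaN bound falls through to "might match" and never reaches the comparison, which is the simplification suggested above.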