yyanyy commented on a change in pull request #2069:
URL: https://github.com/apache/iceberg/pull/2069#discussion_r555474356
##########
File path:
api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java
##########
@@ -204,15 +210,20 @@ public Boolean or(Boolean leftResult, Boolean rightResult) {
public <T> Boolean ltEq(BoundReference<T> ref, Literal<T> lit) {
Integer id = ref.fieldId();
- if (containsNullsOnly(id)) {
+ if (containsNullsOnly(id) || containsNaNsOnly(id)) {
return ROWS_CANNOT_MATCH;
}
if (lowerBounds != null && lowerBounds.containsKey(id)) {
T lower = Conversions.fromByteBuffer(ref.type(), lowerBounds.get(id));
int cmp = lit.comparator().compare(lower, lit.value());
- if (cmp > 0) {
+
+ // Due to the comparison implementation of ORC stats, for float/double columns in ORC files,
Review comment:
Thanks for the quick review! Yeah, I realize that repeating the same comment
over and over is a bit annoying, but I wasn't sure where the right balance is.
Since I want to check `isNaN` only after the comparison, to avoid unnecessary
checks in `lt`/`ltEq`, the only thing I can abstract out is
`NaNUtil.isNaN(lower)`, and then we would essentially be wrapping a wrapper. I
also suspect that wouldn't help readability much, since the actual explanation
would then live outside the logic flow here and the reader would have to jump
around to understand the full intention. Maybe we can shorten this comment
everywhere and keep the full version at the start of the class? Do you or
others have any suggestions?
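
To make the ordering concrete, here is a rough sketch of what I mean by
checking `isNaN` only after the comparison. This is not the code in the PR:
`NaNBoundSketch`, its local `isNaN` (a stand-in for `NaNUtil.isNaN`), and the
boolean constants are hypothetical names for illustration. It also assumes the
comparator sorts NaN above every other value, as Java's natural ordering for
`Double` does, and that a NaN lower bound tells us nothing about the real
values in the file.

```java
import java.util.Comparator;

// Hypothetical sketch for discussion; not the actual InclusiveMetricsEvaluator code.
final class NaNBoundSketch {
  static final boolean ROWS_MIGHT_MATCH = true;
  static final boolean ROWS_CANNOT_MATCH = false;

  // Stand-in for NaNUtil.isNaN: true only for a Float or Double NaN.
  static boolean isNaN(Object value) {
    if (value instanceof Double) {
      return Double.isNaN((Double) value);
    } else if (value instanceof Float) {
      return Float.isNaN((Float) value);
    }
    return false;
  }

  // ltEq-style check of a literal against a file's lower bound.
  static <T> boolean ltEq(T lower, T literal, Comparator<T> comparator) {
    int cmp = comparator.compare(lower, literal);
    if (cmp > 0) {
      // The NaN check runs only on this branch, so the common
      // "lower <= literal" case never pays for it.
      // Assumption: a NaN lower bound is uninformative, so keep the file.
      return isNaN(lower) ? ROWS_MIGHT_MATCH : ROWS_CANNOT_MATCH;
    }
    return ROWS_MIGHT_MATCH;
  }
}
```

With this shape the only reusable piece is the `isNaN(lower)` call itself,
which is why pulling it into another helper would mostly add a wrapper around
a wrapper.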