[GitHub] [iceberg] yyanyy commented on a change in pull request #2069: API: handle NaN as min/max stats in evaluators

GitBox Mon, 11 Jan 2021 18:41:48 -0800


yyanyy commented on a change in pull request #2069:
URL: https://github.com/apache/iceberg/pull/2069#discussion_r555474356




##########
File path: api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java
##########
@@ -204,15 +210,20 @@ public Boolean or(Boolean leftResult, Boolean rightResult) {
     public <T> Boolean ltEq(BoundReference<T> ref, Literal<T> lit) {
       Integer id = ref.fieldId();
 
-      if (containsNullsOnly(id)) {
+      if (containsNullsOnly(id) || containsNaNsOnly(id)) {
         return ROWS_CANNOT_MATCH;
       }
 
       if (lowerBounds != null && lowerBounds.containsKey(id)) {
         T lower = Conversions.fromByteBuffer(ref.type(), lowerBounds.get(id));
 
         int cmp = lit.comparator().compare(lower, lit.value());
-        if (cmp > 0) {
+
+        // Due to the comparison implementation of ORC stats, for float/double columns in ORC files,

Review comment:
       Thanks for the quick review! Yeah, I realize that repeating the same 
comment over and over is a bit annoying, but I wasn't sure where the right 
balance is. Since I'm hoping to check for `isNaN` only after comparing, to 
avoid unnecessary checks in `lt`/`ltEq`, the only thing I can abstract out is 
`NaNUtil.isNaN(lower)`, which would essentially be wrapping a wrapper. I also 
guess that might not help much with readability, since the actual explanation 
would then sit outside the logic flow here and the reader would have to jump 
around to understand the full intention. Maybe we can shorten this comment 
everywhere and keep the full version at the start of the class? Do you or 
other people have any suggestions?
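
For readers following the thread, here is a minimal sketch of the "check `isNaN` only after comparing" idea, not the code from the PR. It assumes the comparator orders NaN above every other value, as `Double.compare` does, so a NaN lower bound can only show up when `cmp > 0`. The class name `NaNBoundSketch`, the local `isNaN` helper (a stand-in for `NaNUtil.isNaN`), and the method `ltEqWithLowerBound` are hypothetical names used only for illustration.

```java
import java.util.Comparator;

public class NaNBoundSketch {
  static final boolean ROWS_MIGHT_MATCH = true;
  static final boolean ROWS_CANNOT_MATCH = false;

  // Hypothetical stand-in for NaNUtil.isNaN: true only for a float/double NaN.
  static boolean isNaN(Object value) {
    if (value instanceof Double) {
      return Double.isNaN((Double) value);
    }
    if (value instanceof Float) {
      return Float.isNaN((Float) value);
    }
    return false;
  }

  // Sketch of the lower-bound part of an inclusive `ltEq` check.
  // The NaN test runs only when cmp > 0, so the common "might match" path
  // (cmp <= 0) never pays for it.
  static <T> boolean ltEqWithLowerBound(T lower, T literal, Comparator<T> comparator) {
    int cmp = comparator.compare(lower, literal);
    if (cmp > 0 && !isNaN(lower)) {
      // Every value is >= lower > literal, so no row can satisfy col <= literal.
      return ROWS_CANNOT_MATCH;
    }
    // Either the bound allows a match, or the bound is NaN and cannot be trusted.
    return ROWS_MIGHT_MATCH;
  }

  public static void main(String[] args) {
    Comparator<Double> natural = Comparator.naturalOrder();
    System.out.println(ltEqWithLowerBound(5.0, 3.0, natural));        // false
    System.out.println(ltEqWithLowerBound(1.0, 3.0, natural));        // true
    System.out.println(ltEqWithLowerBound(Double.NaN, 3.0, natural)); // true
  }
}
```

The trade-off discussed above is visible here: the condition itself is short, but the reason a NaN bound must be treated as "might match" lives in a comment away from the call site.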




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
