rdblue commented on a change in pull request #1638:
URL: https://github.com/apache/iceberg/pull/1638#discussion_r509740053



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java
##########
@@ -423,4 +451,24 @@ public Boolean or(Boolean leftResult, Boolean rightResult) {
       return (T) conversions.get(id).apply(statistics.genericGetMax());
     }
   }
+
+  /**
+   * Checks against older versions of Parquet statistics which may have a null count but undefined min and max
+   * statistics. Returns true if nonNull values exist in the row group but no further statistics are available.
+   * <p>
+   * We can't use {@code  statistics.hasNonNullValue()} because it is inaccurate with older files and will return
+   * false if min and max are not set.
+   * <p>
+   * This is specifically for 1.5.0-CDH Parquet builds and later which contain the different unusual hasNonNull
+   * behavior. OSS Parquet builds are not effected because PARQUET-251 prohibits the reading of these statistics

Review comment:
       Why is this bug limited to binary and strings?



