RussellSpitzer commented on code in PR #6517:
URL: https://github.com/apache/iceberg/pull/6517#discussion_r1061738195
##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetMetricsRowGroupFilter.java:
##########
@@ -580,6 +608,10 @@ static boolean hasNonNullButNoMinMax(Statistics statistics, long valueCount) {
         && (statistics.getMaxBytes() == null || statistics.getMinBytes() == null);
   }
+  static boolean minMaxUndefined(Statistics statistics) {
+    return !statistics.isEmpty() && !statistics.hasNonNullValue();
Review Comment:
I simplified this a bit further, since I don't think we have to handle the
CDH special case in the same way. Now we basically have a two-step check:

1) Are all the values null? This is only true if nullCount is set and it is
   equal to the value count from the chunk stats.
2) Are min/max undefined for the stats?
   a. For normal Parquet stats, this means the stats are not empty &&
      hasNonNull is false.
   b. For the old CDH version, this means the stats are not empty but min and
      max are set to null. I think the "a" case may also cover this, based on
      my old notes, but since this check is short-circuited it doesn't hurt
      to leave it in.
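
To make the two steps concrete, here is a minimal, self-contained sketch. It is not the code in this PR: `ChunkStats` and `RowGroupStatsSketch` are hypothetical stand-ins invented for illustration, and `ChunkStats` only imitates the handful of flags discussed above rather than Parquet's real `Statistics` class. The definition of "empty" used here (no null count and no non-null value recorded) is also an assumption.

```java
// Hypothetical stand-in for a column chunk's statistics (NOT Parquet's Statistics class).
final class ChunkStats {
  final boolean numNullsSet;     // did the writer record a null count?
  final long numNulls;           // recorded null count; meaningful only if numNullsSet
  final boolean hasNonNullValue; // flag normal stats set once min/max are populated
  final byte[] minBytes;         // may be null when min was never written
  final byte[] maxBytes;         // may be null when max was never written

  ChunkStats(boolean numNullsSet, long numNulls, boolean hasNonNullValue,
             byte[] minBytes, byte[] maxBytes) {
    this.numNullsSet = numNullsSet;
    this.numNulls = numNulls;
    this.hasNonNullValue = hasNonNullValue;
    this.minBytes = minBytes;
    this.maxBytes = maxBytes;
  }

  // Assumption: stats count as empty when they carry neither a null count
  // nor a non-null value.
  boolean isEmpty() {
    return !numNullsSet && !hasNonNullValue;
  }
}

// Hypothetical helper class illustrating the two-step check.
final class RowGroupStatsSketch {
  // Step 1: all values are null only if a null count was recorded and it
  // equals the chunk's value count.
  static boolean allNulls(ChunkStats stats, long valueCount) {
    return stats.numNullsSet && stats.numNulls == valueCount;
  }

  // Step 2: min/max are undefined if the stats are non-empty and either
  // (a) no non-null value was recorded, or
  // (b) min or max bytes are missing (the old CDH shape).
  static boolean minMaxUndefined(ChunkStats stats) {
    if (stats.isEmpty()) {
      return false;
    }
    boolean normalCase = !stats.hasNonNullValue;                           // 2a
    boolean oldCdhCase = stats.minBytes == null || stats.maxBytes == null; // 2b
    return normalCase || oldCdhCase;
  }
}
```

In a sketch like this a caller would run `allNulls` first and only fall through to `minMaxUndefined` when it returns false. Because of the `||`, case (b) is only evaluated when case (a) is false, which is the short-circuiting mentioned above.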
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]