rdblue commented on a change in pull request #1641:
URL: https://github.com/apache/iceberg/pull/1641#discussion_r517647143



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetUtil.java
##########
@@ -65,28 +68,25 @@
   private ParquetUtil() {
   }
 
-  // Access modifier is package-private, to only allow use from existing tests
-  static Metrics fileMetrics(InputFile file) {
-    return fileMetrics(file, MetricsConfig.getDefault());
-  }
-
   public static Metrics fileMetrics(InputFile file, MetricsConfig 
metricsConfig) {
     return fileMetrics(file, metricsConfig, null);
   }
 
   public static Metrics fileMetrics(InputFile file, MetricsConfig 
metricsConfig, NameMapping nameMapping) {
     try (ParquetFileReader reader = 
ParquetFileReader.open(ParquetIO.file(file))) {
-      return footerMetrics(reader.getFooter(), metricsConfig, nameMapping);
+      return footerMetrics(reader.getFooter(), Stream.empty(), null, 
metricsConfig, nameMapping);

Review comment:
       I don't have a reference for the Parquet fix. I think I was in a sync 
where it was discussed. Maybe we should generate our own lower/upper bounds for 
Parquet then.
   
   `Float.compare` will sort NaN values last, so if we do get max=NaN from 
Parquet our evaluators should still work as expected. It will just include a 
much larger range than necessary. If we can generate better stats for table 
metadata, then that would be ideal.
   
   Your cases look correct, except that I would say that we expect only max=NaN 
or min=max=NaN from Parquet. Using the existing logic should be okay.
   
   I agree that the lack of a NaN counter means that the value is unknown. This 
is the case for all files written to v1 tables. V2 tables will generally have 
the NaN counter values, but not in all cases (imported files).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to