rdblue commented on a change in pull request #1641:
URL: https://github.com/apache/iceberg/pull/1641#discussion_r517647143
##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetUtil.java
##########
@@ -65,28 +68,25 @@
private ParquetUtil() {
}
- // Access modifier is package-private, to only allow use from existing tests
- static Metrics fileMetrics(InputFile file) {
- return fileMetrics(file, MetricsConfig.getDefault());
- }
-
public static Metrics fileMetrics(InputFile file, MetricsConfig
metricsConfig) {
return fileMetrics(file, metricsConfig, null);
}
public static Metrics fileMetrics(InputFile file, MetricsConfig
metricsConfig, NameMapping nameMapping) {
try (ParquetFileReader reader =
ParquetFileReader.open(ParquetIO.file(file))) {
- return footerMetrics(reader.getFooter(), metricsConfig, nameMapping);
+ return footerMetrics(reader.getFooter(), Stream.empty(), null,
metricsConfig, nameMapping);
Review comment:
I don't have a reference for the Parquet fix. I think I was in a sync
where it was discussed. Maybe we should generate our own lower/upper bounds for
Parquet then.
`Float.compare` will sort NaN values last, so if we do get max=NaN from
Parquet our evaluators should still work as expected. It will just include a
much larger range than necessary. If we can generate better stats for table
metadata, then that would be ideal.
Your cases look correct, except that I would say that we expect only max=NaN
or min=max=NaN from Parquet. Using the existing logic should be okay.
I agree that the lack of a NaN counter means that the value is unknown. This
is the case for all files written to v1 tables. V2 tables will generally have
the NaN counter values, but not in all cases (imported files).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]