singhpk234 commented on code in PR #7391:
URL: https://github.com/apache/iceberg/pull/7391#discussion_r1173527238
##########
spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestMetadataTableReadableMetrics.java:
##########
@@ -190,6 +191,7 @@ private GenericRecord createNestedRecord(Long longCol,
Double doubleCol) {
}
@Test
+ @Ignore
Review Comment:
This is interesting and happens only in the Java 17 environment:
the Parquet file that gets written has a different `column_size` value in the
`files` metadata table. In Java 8 it's 44 and in Java 17 it's 43.
Effectively, this line produces different results:
https://github.com/apache/iceberg/blob/d04efee702fcdcdbe3659c12f7442f5000aa246a/parquet/src/main/java/org/apache/iceberg/parquet/ParquetUtil.java#L127
Is it because Java 17 provides better compression than Java 8? As per this
blog: https://dkomanov.medium.com/java-compression-performance-fb373078cfde
Since this is a value we get directly from the Parquet footer, we are
effectively reading back what we wrote, and the stats are not getting messed up
in between, which seems correct to me. But I might be wrong here, so I will wait
for feedback from other folks on whether a deeper investigation is required.
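
One way to check the compression hypothesis in isolation is sketched below. It
assumes the column chunk is compressed with a deflate-based codec (for example
gzip) that ends up in `java.util.zip`; I have not confirmed that for this test.
The class name `CompressedSizeProbe` and the payload are hypothetical and are
not taken from the Iceberg code base.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical probe: prints the deflate-compressed size of a fixed payload
// so the same class can be run under Java 8 and Java 17 and compared.
public class CompressedSizeProbe {

  // Compresses the input with java.util.zip.Deflater at the default level
  // and returns the number of compressed bytes produced.
  static int compressedSize(byte[] input) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
    try {
      deflater.setInput(input);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[256];
      while (!deflater.finished()) {
        int n = deflater.deflate(buffer);
        out.write(buffer, 0, n);
      }
      return out.size();
    } finally {
      // Release the native zlib resources held by the Deflater.
      deflater.end();
    }
  }

  public static void main(String[] args) {
    // Assumption: an arbitrary stand-in payload, not the real column data.
    byte[] payload =
        "hypothetical column chunk payload".getBytes(StandardCharsets.UTF_8);
    System.out.println(
        System.getProperty("java.version") + " -> " + compressedSize(payload) + " bytes");
  }
}
```

If the printed sizes match under both JDKs, the one-byte difference probably
comes from somewhere other than the JDK's deflate implementation, and a deeper
look would be warranted.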
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]