Ok, ran the bisect:

➜ parquet-java git:(d5f86d7c) ✗ git bisect bad
d5f86d7c0e9894510e8af6dfd37444843e6d1bc4 is the first bad commit
commit d5f86d7c0e9894510e8af6dfd37444843e6d1bc4
Author: Gang Wu <ust...@gmail.com>
Date:   Tue Jan 21 16:18:19 2025 +0800

    GH-3133: Fix SizeStatistics to handle omitted histogram (#3134)

 .../apache/parquet/column/statistics/SizeStatistics.java  |  6 ++++--
 .../parquet/column/statistics/TestSizeStatistics.java     | 16 ++++++++++++++++
 .../format/converter/ParquetMetadataConverter.java        | 10 ++++++++--

And this makes sense to me :) I've created a PR against Trino
<https://github.com/trinodb/trino/pull/26511> and got everything passing
with some help from Yuya <https://github.com/trinodb/trino/pull/26530>. I
see some more tests failing in Iceberg
<https://github.com/apache/iceberg/pull/13941>, which I'll dig into before
casting my vote.

Kind regards,
Fokko

On Tue, 2 Sep 2025 at 14:30, Fokko Driesprong <fo...@apache.org> wrote:

> Hey Rahul, Aihua,
>
> I was looking into the same thing.
>
> The PR you're referring to has already been included since 1.15.0
> <https://github.com/apache/parquet-java/commits/apache-parquet-1.15.0>.
> Iceberg currently uses Parquet 1.15.2
> <https://github.com/apache/iceberg/blob/76ff67c658066bd7d05ce4ce54a1d6340ee0a899/gradle/libs.versions.toml#L80>.
> I don't see anything obvious in the changelog
> <https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.16.0-rc2>
> that might have caused the increase in size. Let me do a git bisect to find
> the PR that introduced the change.
>
> Kind regards,
> Fokko
>
> On Tue, 2 Sep 2025 at 14:11, Rahul Sharma
> <rahul.sha...@databricks.com.invalid> wrote:
>
>> Hi Aihua,
>>
>> Regarding the Iceberg failure, with which parquet-java version does the
>> test pass? I suspect that the failure might be related to size
>> statistics. Could you try running the test with
>> `parquet.size.statistics.enabled=false`? This flag was added in this PR
>> <https://github.com/apache/parquet-java/pull/3060>.
>>
>> Thanks,
>> Rahul
>>
>>
>> On Tue, Sep 2, 2025 at 3:07 AM Aihua Xu <aihu...@gmail.com> wrote:
>>
>> > Checked checksum and signature and ran unit tests.
>> >
>> > I'm also running the tests against Iceberg. I noticed one failure
>> > <https://github.com/apache/iceberg/blob/main/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java#L308>
>> > that comes from Iceberg format version 3, which writes row lineage. It
>> > seems the file size increased after the version upgrade, and I haven't
>> > yet pinpointed the exact change causing it. I don't think it is a
>> > blocker for this release, though.
>> >
>> > org.opentest4j.AssertionFailedError: [Did not have the expected number of files]
>> > expected: 20
>> >  but was: 21
>> >     at org.apache.iceberg.spark.actions.TestRewriteDataFilesAction.shouldHaveFiles(TestRewriteDataFilesAction.java:2144)
>> >     at org.apache.iceberg.spark.actions.TestRewriteDataFilesAction.testBinPackAfterPartitionChange(TestRewriteDataFilesAction.java:321)
>> >
>> >
>> > On Mon, Sep 1, 2025 at 12:16 AM Gábor Szádovszky <ga...@apache.org> wrote:
>> >
>> > > I've checked the tarball content, checksum, and signature, and executed
>> > > the unit tests as well as some of our internal tests. All passed.
>> > >
>> > > +1 (binding)
>> > >
>> > > On Sat, 30 Aug 2025 at 8:47, Gang Wu <ust...@gmail.com> wrote:
>> > >
>> > > > Hi everyone,
>> > > >
>> > > > I propose the following RC to be released as the official Apache
>> > > > Parquet Java 1.16.0 release.
>> > > >
>> > > > The commit id is 402c3810c372d29603e181771acebfecc71bef61
>> > > > * This corresponds to the tag: apache-parquet-1.16.0-rc2
>> > > > * https://github.com/apache/parquet-java/tree/402c3810c372d29603e181771acebfecc71bef61
>> > > >
>> > > > The release tarball, signature, and checksums are here:
>> > > > * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.16.0-rc2
>> > > >
>> > > > You can find the KEYS file here:
>> > > > * https://downloads.apache.org/parquet/KEYS
>> > > >
>> > > > You can find the changelog here:
>> > > > * https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.16.0-rc2
>> > > >
>> > > > Binary artifacts are staged in Nexus here:
>> > > > * https://repository.apache.org/content/groups/staging/org/apache/parquet/
>> > > >
>> > > > Please download, verify, and test.
>> > > >
>> > > > Please vote in the next 72 hours.
>> > > >
>> > > > [ ] +1 Release this as Apache Parquet Java 1.16.0
>> > > > [ ] +0
>> > > > [ ] -1 Do not release this because...
>> > > >
>> > > > Thanks,
>> > > > Gang