Ok, ran the bisect:

➜  parquet-java git:(d5f86d7c) ✗ git bisect bad
d5f86d7c0e9894510e8af6dfd37444843e6d1bc4 is the first bad commit
commit d5f86d7c0e9894510e8af6dfd37444843e6d1bc4
Author: Gang Wu <ust...@gmail.com>
Date:   Tue Jan 21 16:18:19 2025 +0800

    GH-3133: Fix SizeStatistics to handle omitted histogram (#3134)

 .../apache/parquet/column/statistics/SizeStatistics.java |  6 ++++--
 .../parquet/column/statistics/TestSizeStatistics.java    | 16 ++++++++++++++++
 .../format/converter/ParquetMetadataConverter.java       | 10 ++++++++--


And this makes sense to me :) I've created a PR against Trino
<https://github.com/trinodb/trino/pull/26511> and got everything passing with
some help from Yuya <https://github.com/trinodb/trino/pull/26530>. I see some
more tests failing in Iceberg <https://github.com/apache/iceberg/pull/13941>,
which I'll dig into before casting my vote.
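
For anyone wondering how a small metadata change can flip the expected file count from 20 to 21, here is a minimal sketch. It is not Iceberg's actual bin-packing logic: the class name `BinPackSketch`, the helper `expectedFiles`, the 1 MB target size, and the 4 KB of extra bytes are all hypothetical, chosen only to show the ceiling-division effect.

```java
public class BinPackSketch {

    // Number of output files needed when totalBytes is split into
    // files of at most targetBytes each (ceiling division).
    static long expectedFiles(long totalBytes, long targetBytes) {
        return (totalBytes + targetBytes - 1) / targetBytes;
    }

    public static void main(String[] args) {
        long target = 1_000_000L;      // hypothetical target file size
        long before = 20 * target;     // input that fills exactly 20 files
        long after = before + 4_096L;  // same input plus a little extra metadata (assumed amount)

        System.out.println("before: " + expectedFiles(before, target));
        System.out.println("after:  " + expectedFiles(after, target));
    }
}
```

With these made-up numbers, a few extra kilobytes across the input are enough to spill into one additional output file.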

Kind regards,
Fokko


On Tue, Sep 2, 2025 at 14:30, Fokko Driesprong <fo...@apache.org> wrote:

> Hey Rahul, Aihua,
>
> I was looking into the same thing.
>
> The PR that you're referring to has been included since 1.15.0
> <https://github.com/apache/parquet-java/commits/apache-parquet-1.15.0>.
> Iceberg currently uses Parquet 1.15.2
> <https://github.com/apache/iceberg/blob/76ff67c658066bd7d05ce4ce54a1d6340ee0a899/gradle/libs.versions.toml#L80>.
> I don't see anything obvious in the changelog
> <https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.16.0-rc2>
> that might have caused the increase in size. Let me do a git bisect to find
> out the PR that introduced the change.
>
> Kind regards,
> Fokko
>
> On Tue, Sep 2, 2025 at 14:11, Rahul Sharma
> <rahul.sha...@databricks.com.invalid> wrote:
>
>> Hi Aihua,
>>
>> Regarding the Iceberg failure, which parquet-java version does the test
>> pass with? I suspect that the failure might be related to
>> size-statistics. Could you try running the test with
>> `parquet.size.statistics.enabled=false`? This flag was added in this PR
>> <https://github.com/apache/parquet-java/pull/3060>.
>>
>> Thanks,
>> Rahul
>>
>>
>> On Tue, Sep 2, 2025 at 3:07 AM Aihua Xu <aihu...@gmail.com> wrote:
>>
>> > Checked checksum and signature and ran unit tests.
>> >
>> > I'm also running the tests against Iceberg. I noticed one failure
>> > <https://github.com/apache/iceberg/blob/main/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/actions/TestRewriteDataFilesAction.java#L308>
>> > that is from Iceberg format version 3, which writes row lineage. It seems
>> > the file size increases after the version upgrade, and I haven’t yet
>> > pinpointed the exact change causing it. But I don't think that is a
>> > blocker for this release.
>> >
>> > org.opentest4j.AssertionFailedError: [Did not have the expected number of files]
>> > expected: 20
>> >  but was: 21
>> > at org.apache.iceberg.spark.actions.TestRewriteDataFilesAction.shouldHaveFiles(TestRewriteDataFilesAction.java:2144)
>> > at org.apache.iceberg.spark.actions.TestRewriteDataFilesAction.testBinPackAfterPartitionChange(TestRewriteDataFilesAction.java:321)
>> >
>> >
>> > On Mon, Sep 1, 2025 at 12:16 AM Gábor Szádovszky <ga...@apache.org> wrote:
>> >
>> > > I've checked tarball content, checksum, and signature. Executed unit
>> > > tests, and also some of our internal tests. All passed.
>> > >
>> > > +1 (binding)
>> > >
>> > > On Sat, Aug 30, 2025 at 8:47, Gang Wu <ust...@gmail.com> wrote:
>> > >
>> > > > Hi everyone,
>> > > >
>> > > > I propose the following RC to be released as the official Apache
>> > > > Parquet Java 1.16.0 release.
>> > > >
>> > > > The commit id is 402c3810c372d29603e181771acebfecc71bef61
>> > > > * This corresponds to the tag: apache-parquet-1.16.0-rc2
>> > > > * https://github.com/apache/parquet-java/tree/402c3810c372d29603e181771acebfecc71bef61
>> > > >
>> > > > The release tarball, signature, and checksums are here:
>> > > > * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.16.0-rc2
>> > > >
>> > > > You can find the KEYS file here:
>> > > > * https://downloads.apache.org/parquet/KEYS
>> > > >
>> > > > You can find the changelog here:
>> > > > * https://github.com/apache/parquet-java/releases/tag/apache-parquet-1.16.0-rc2
>> > > >
>> > > > Binary artifacts are staged in Nexus here:
>> > > > * https://repository.apache.org/content/groups/staging/org/apache/parquet/
>> > > >
>> > > > Please download, verify, and test.
>> > > >
>> > > > Please vote in the next 72 hours.
>> > > >
>> > > > [ ] +1 Release this as Apache Parquet Java 1.16.0
>> > > > [ ] +0
>> > > > [ ] -1 Do not release this because...
>> > > >
>> > > > Thanks,
>> > > > Gang
>> > > >
>> > >
>> >
>>
>
