Since the POC is basically done, I'd like to share a brief proposal on
the switch.

https://docs.google.com/document/d/13TeFu20jhkUFbb-FaaGQWFiXKO6kKyy_nX1QyFgoUoU/edit?usp=sharing

The goal and suggestion of the proposal are relatively short; the actual
problem comes from some of the existing design decisions we've made in the
implementation that we'll need to make some bigger decisions about.
I've listed them at the bottom of the doc above, so please peruse them.

Attached here is a POC demonstrating how this switch can be done:
https://github.com/apache/iceberg/pull/13769

For those who don't want to get into the details of the doc, our main
blockers for this switch in the implementation are:

1. Empty Partition Specs - While it's possible that we may drop storing
partition tuples in V4, they are currently required. This is a problem
because Parquet does not support an empty struct. It's not just that the
field would have to be optional; the schema literally cannot hold an empty
struct (see the schema sketch after this list).

2. Snapshot provides some methods like addedFiles(FileIO), and downstream
these methods assume that you can read partition specs directly out of the
manifest's file metadata. Writing the spec into the file metadata is required
by the spec, but reading the Parquet footer is not allowed in InternalReader.
This means the public APIs above cannot work with internal readers and
Parquet manifests without some complicated workarounds (see the usage sketch
after this list).
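
To make blocker 1 concrete, below is a minimal sketch of the empty-struct
limitation using the parquet-java schema builders. The class name is made up
for illustration, and the exact point of failure and exception message depend
on the Parquet version, but the outcome is the same: a zero-field group is
rejected.

    import org.apache.parquet.schema.MessageType;
    import org.apache.parquet.schema.Type.Repetition;
    import org.apache.parquet.schema.TypeUtil;
    import org.apache.parquet.schema.Types;

    public class EmptyPartitionStructSketch {
      public static void main(String[] args) {
        try {
          // An unpartitioned table has an empty partition tuple; mapping it
          // directly onto Parquet produces a group with zero fields.
          MessageType schema =
              Types.buildMessage()
                  .addFields(Types.buildGroup(Repetition.REQUIRED).named("partition"))
                  .named("manifest_entry");

          // Even if schema construction succeeds, write-schema validation
          // rejects the empty group before anything is written.
          TypeUtil.checkValidWriteSchema(schema);
        } catch (RuntimeException e) {
          // e.g. "Cannot write a schema with an empty group" (wording varies
          // by Parquet version)
          System.out.println("Rejected by Parquet: " + e.getMessage());
        }
      }
    }

So as long as V4 keeps partition tuples, Parquet manifests cannot represent
an unpartitioned table's empty partition struct directly.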
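
For blocker 2, here is a rough sketch of the calling pattern that breaks. It
assumes the current Java API names (Snapshot.addedDataFiles(FileIO) is the
present-day form of the addedFiles(FileIO) style of method mentioned above)
and a table with a current snapshot; the key point is that callers supply
only a FileIO, so the manifest file itself must carry the partition spec.

    import org.apache.iceberg.DataFile;
    import org.apache.iceberg.Snapshot;
    import org.apache.iceberg.Table;

    public class SnapshotAddedFilesSketch {
      // Callers hand over only a FileIO; the reader has to recover each
      // file's partition tuple from the spec stored in the manifest's own
      // file metadata, since no Table or spec map is passed in.
      static void printAddedFiles(Table table) {
        Snapshot snapshot = table.currentSnapshot();
        for (DataFile file : snapshot.addedDataFiles(table.io())) {
          System.out.println(file.partition() + " (spec " + file.specId() + ")");
        }
      }
    }

With Parquet manifests the spec would live in the Parquet footer, which
InternalReader is not allowed to read, hence the complicated workarounds
mentioned above.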

Please take a look; I think we can break the changes in the POC up into
some independent code changes that can be discussed and reviewed.

I plan on trying to write some benchmarks for Parquet manifests this week
as well, but I wanted to make sure folks knew what I had finished so far.

On Tue, Aug 19, 2025 at 1:42 PM Anoop Johnson <an...@apache.org> wrote:

> I'm excited about the proposal to switch to Parquet as the manifest format
> for v4 of Iceberg. This change, which would include supporting Avro
> manifests from v1-v3 for table upgrades, looks like a great move.
>
> It aligns perfectly with the v4 column statistics proposal we discussed at
> today's community sync. Using Parquet also simplifies the v4 implementation
> and should lead to performance gains and a smaller metadata storage
> footprint.
>
> Thanks, Russell, for leading this proposal and building the prototype!
>
> Best,
> Anoop
>
> On Wed, Aug 6, 2025 at 12:51 AM Sreeram Garlapati <gsreeramku...@gmail.com>
> wrote:
>
>> +1
>> This will be a great progression for the Iceberg format, allowing
>> efficient metadata pruning. Please count me in.
>>
>> On Tue, Jun 17, 2025 at 3:45 AM Jacky Lee <qcsd2...@gmail.com> wrote:
>>
>>> Count me in. This solution effectively addresses the small files issue
>>> caused by high-frequency writes in our scenario, and it also greatly
>>> benefits the generation of partition- and table-level statistics.
>>>
>>> On Sat, Jun 14, 2025 at 7:04 AM <mlhsmode...@gmail.com> wrote:
>>> >
>>> > I'm interested in working on this change as well. I think it pairs
>>> nicely with the proposal for per-column structs for statistics.
>>> >
>>> > Thanks,
>>> > Harman
>>> >
>>> > On Thu, Jun 12, 2025 at 9:43 PM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>> >>
>>> >> It’s not required at compile time, only at test runtime.
>>> >>
>>> >> On Thu, Jun 12, 2025 at 8:37 PM Ajantha Bhat <ajanthab...@gmail.com>
>>> wrote:
>>> >>>
>>> >>> > All we have to do is add the parquet module as a test dependency,
>>> working on a poc now.
>>> >>>
>>> >>> This will be a circular dependency on the core module. That's why I
>>> suggested abstracting out the test cases and executing them in a parquet
>>> module. Partition stats writing (as parquet) from the core module uses
>>> `InternalData` and does the same now. So, I guess it will be similar work
>>> (but on a larger scale due to test case refactoring).
>>> >>>
>>> >>> Let me know the results of your POC and happy to collaborate on this
>>> work.
>>> >>>
>>> >>>
>>> >>> - Ajantha
>>> >>>
>>> >>> On Fri, Jun 13, 2025 at 3:16 AM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>> >>>>
>>> >>>> All we have to do is add the parquet module as a test dependency,
>>> working on a poc now. I don't think we really need to block on any other
>>> projects although I'll probably hold off on any work on manifest-list since
>>> I hope it won't be needed.
>>> >>>>
>>> >>>> On Thu, May 29, 2025 at 8:37 PM Ajantha Bhat <ajanthab...@gmail.com>
>>> wrote:
>>> >>>>>
>>> >>>>> I am interested in working on this proposal.
>>> >>>>> I would assume it is to use `InternalData` with the format as
>>> `parquet`. But the challenge will be the test cases; the core module cannot
>>> write the parquet metadata due to a circular dependency. We need to abstract
>>> out the test cases in the core module and run them from the parquet module
>>> I guess.
>>> >>>>>
>>> >>>>> I can work on a design doc as well. So, add me as a collaborator
>>> for the document.
>>> >>>>> But should this work be done after we complete the work on "single
>>> file commit in v4", since the metadata structure can change?
>>> >>>>>
>>> >>>>> - Ajantha
>>> >>>>>
>>> >>>>> On Thu, May 29, 2025 at 11:37 PM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>> >>>>>>
>>> >>>>>> Hi Y'all
>>> >>>>>>
>>> >>>>>> As discussed in the last community sync, we are beginning to
>>> gather up folks who are interested in various efforts for Iceberg V4. To
>>> that end,
>>> >>>>>> I'd like to use this thread as a gathering point for folks
>>> interested in the metadata file format shift to Parquet. I wrote a quick
>>> abstract to
>>> >>>>>> describe the purpose of this group.
>>> >>>>>>
>>> >>>>>> Following this I'll be working on a full design document, or if
>>> someone already has one in progress please let us know and we can start
>>> discussing/working on it there.
>>> >>>>>>
>>> >>>>>> Abstract: Parquet as Metadata File Format
>>> >>>>>>
>>> >>>>>> Currently, the Iceberg SDK and spec use the Avro file format for
>>> all manifest lists and manifests. The row-oriented format was selected
>>> >>>>>> because it was assumed that most metadata would be read in its
>>> entirety. This has turned out to seldom be the case, and the ability to read
>>> >>>>>> single elements of the metrics would be very useful for query
>>> planning. To address this, we propose switching the underlying manifest
>>> format from Avro to Parquet. In V4, Avro files would still be readable,
>>> but all new metadata files would be written in Parquet instead of Avro.
>>>
>>
