I am interested in working on this proposal. I assume the plan is to use `InternalData` with `parquet` as the format. The challenge will be the test cases: the core module cannot write Parquet metadata because of a circular dependency. I guess we need to abstract out the test cases in the core module and run them from the parquet module.
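To illustrate the test layout I have in mind, here is a rough sketch, not a proposal for actual class names. Everything in it is hypothetical: `ManifestCodec`, `ManifestRoundTripTestBase`, `StandInParquetCodec`, and `ParquetManifestRoundTripTest` are made-up names, and the stand-in codec does not write real Parquet. The idea is only that the shared test logic lives next to the format-agnostic contract, while the format-specific module supplies the writer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// "core" side: a format-agnostic contract (hypothetical, not an Iceberg API).
interface ManifestCodec {
  byte[] write(List<String> entries);

  List<String> read(byte[] data);
}

// "core" side: shared test logic that never names a concrete file format.
abstract class ManifestRoundTripTestBase {
  // Each format module provides its own codec.
  protected abstract ManifestCodec codec();

  // Returns true when the entries survive a write followed by a read.
  public boolean roundTrips(List<String> entries) {
    ManifestCodec codec = codec();
    return entries.equals(codec.read(codec.write(entries)));
  }
}

// "parquet" side: a stand-in codec. It joins entries with newlines instead of
// writing real Parquet, and assumes entries are non-empty and newline-free.
class StandInParquetCodec implements ManifestCodec {
  @Override
  public byte[] write(List<String> entries) {
    return String.join("\n", entries).getBytes(StandardCharsets.UTF_8);
  }

  @Override
  public List<String> read(byte[] data) {
    if (data.length == 0) {
      return new ArrayList<>();
    }
    return Arrays.asList(new String(data, StandardCharsets.UTF_8).split("\n", -1));
  }
}

// "parquet" side: the concrete test only plugs the codec into the shared logic.
class ParquetManifestRoundTripTest extends ManifestRoundTripTestBase {
  @Override
  protected ManifestCodec codec() {
    return new StandInParquetCodec();
  }
}
```

With this shape, the core module would not need a compile-time dependency on the parquet module; calling `new ParquetManifestRoundTripTest().roundTrips(...)` from the parquet module exercises the shared checks against the format-specific writer.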
I can work on a design doc as well, so please add me as a collaborator on the document. But should this work wait until we complete the work on "single file commit in v4", since the metadata structure may change?

- Ajantha

On Thu, May 29, 2025 at 11:37 PM Russell Spitzer <russell.spit...@gmail.com> wrote:

> Hi Y'all
>
> As discussed in the last community sync, we are beginning to gather up
> folks who are interested in various efforts for Iceberg V4. To that end,
> I'd like to use this thread as a gathering point for folks interested in
> the metadata file format shift to Parquet. I wrote a quick abstract to
> describe the purpose of this group.
>
> Following this I'll be working on a full design document or if someone has
> one in prod please let us know and we can start discussing/working on
> it there.
>
> *Abstract: Parquet as Metadata File Format*
>
> Currently the Iceberg SDK and Spec use Avro file format files for all
> Manifest Lists and Manifests. The row oriented format was selected
> because it was assumed that most metadata would be read in its entirety.
> This has turned out to seldom be the case and the ability to read
> single elements of the metrics would be very useful for query planning. To
> address this we propose switching the underlying manifest format
> from Avro to Parquet. In V4, Avro files would still be readable but all
> new metadata files would be written in Parquet instead of Avro.