Hi Peter, thanks for raising this. The proposal sounds quite interesting to me.
I've reviewed the doc, but it still seems too abstract to follow. Would you mind submitting a PR so that it is clearer what would change?

On Wed, Feb 12, 2025 at 12:46 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Hi Team,
>
> As mentioned earlier on our Community Sync I am exploring the
> possibility to define a FileFormat API for accessing different file
> formats. I have put together a proposal based on my findings.
>
> -------------------
> Iceberg currently supports 3 different file formats: Avro, Parquet, ORC.
> With the introduction of Iceberg V3 specification many new features are
> added to Iceberg. Some of these features like new column types, default
> values require changes at the file format level. The changes are added by
> individual developers with different focus on the different file formats.
> As a result not all of the features are available for every supported file
> format.
> Also there are emerging file formats like Vortex [1] or Lance [2] which
> either by specialization, or by applying newer research results could
> provide better alternatives for certain use-cases like random access for
> data, or storing ML models.
> -------------------
>
> Please check the detailed proposal [3] and the google document [4], and
> comment there or reply on the dev list if you have any suggestions.
>
> Thanks,
> Peter
>
> [1] - https://github.com/spiraldb/vortex
> [2] - https://lancedb.github.io/lance/
> [3] - https://github.com/apache/iceberg/issues/12225
> [4] - https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds