Hi Team,

As mentioned earlier at our Community Sync, I am exploring the possibility of defining a FileFormat API for accessing different file formats. I have put together a proposal based on my findings.
-------------------
Iceberg currently supports three file formats: Avro, Parquet, and ORC. With the introduction of the Iceberg V3 specification, many new features are being added to Iceberg. Some of these features, such as new column types and default values, require changes at the file format level. These changes are contributed by individual developers, each focusing on different file formats. As a result, not all features are available for every supported file format.

There are also emerging file formats, such as Vortex [1] and Lance [2], which, either through specialization or by applying newer research results, could provide better alternatives for certain use cases, like random data access or storing ML models.
-------------------

Please review the detailed proposal [3] and the Google document [4], and comment there or reply on the dev list if you have any suggestions.

Thanks,
Peter

[1] - https://github.com/spiraldb/vortex
[2] - https://lancedb.github.io/lance/
[3] - https://github.com/apache/iceberg/issues/12225
[4] - https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds