Hi Renjie,

Here is the WIP PR for the readers: https://github.com/apache/iceberg/pull/12069
Here is the WIP PR for the writers: https://github.com/apache/iceberg/pull/12164

If you want to concentrate on the proposed new API, this may be the best place to start:
https://github.com/apache/iceberg/compare/main...pvary:iceberg:file_format_api_minimal_few_class

Thanks,
Peter

Renjie Liu <liurenjie2...@gmail.com> wrote (on Fri, Feb 14, 2025, 11:15):

> Hi, Peter:
>
> Thanks for raising this; the proposal sounds quite interesting to me.
>
> I've reviewed the doc, but it still seems too abstract to understand.
> Would you mind submitting a PR so that it is clearer what has changed?
>
> On Wed, Feb 12, 2025 at 12:46 AM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
>> Hi Team,
>>
>> As mentioned earlier in our Community Sync, I am exploring the
>> possibility of defining a FileFormat API for accessing different file
>> formats. I have put together a proposal based on my findings.
>>
>> -------------------
>> Iceberg currently supports 3 different file formats: Avro, Parquet, ORC.
>> With the introduction of the Iceberg V3 specification, many new features
>> are being added to Iceberg. Some of these features, like new column types
>> and default values, require changes at the file format level. The changes
>> are added by individual developers, each with a different focus on the
>> different file formats. As a result, not all of the features are
>> available for every supported file format.
>> There are also emerging file formats like Vortex [1] or Lance [2] which,
>> either by specialization or by applying newer research results, could
>> provide better alternatives for certain use cases, like random access to
>> data or storing ML models.
>> -------------------
>>
>> Please check the detailed proposal [3] and the Google document [4], and
>> comment there or reply on the dev list if you have any suggestions.
>>
>> Thanks,
>> Peter
>>
>> [1] - https://github.com/spiraldb/vortex
>> [2] - https://lancedb.github.io/lance/
>> [3] - https://github.com/apache/iceberg/issues/12225
>> [4] - https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds