Hi Renjie,
Here is the WIP PR for the readers:
https://github.com/apache/iceberg/pull/12069
Here is the WIP PR for the writers:
https://github.com/apache/iceberg/pull/12164

If you want to concentrate on the proposed new API, maybe this is the best
place to start:
https://github.com/apache/iceberg/compare/main...pvary:iceberg:file_format_api_minimal_few_class
Thanks,
Peter

Renjie Liu <liurenjie2...@gmail.com> wrote (on Fri, Feb 14, 2025,
11:15):

> Hi, Peter:
>
> Thanks for raising this, and this proposal sounds quite interesting to me.
>
> I've reviewed the doc, but it still seems too abstract to understand. Would
> you mind submitting a PR so that it is clearer what has changed?
>
> On Wed, Feb 12, 2025 at 12:46 AM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
>> Hi Team,
>>
>> As mentioned earlier at our Community Sync, I am exploring the
>> possibility of defining a FileFormat API for accessing different file
>> formats. I have put together a proposal based on my findings.
>>
>> -------------------
>> Iceberg currently supports 3 different file formats: Avro, Parquet, ORC.
>> With the introduction of the Iceberg V3 specification, many new features
>> are being added to Iceberg. Some of these features, like new column types
>> and default values, require changes at the file format level. The changes
>> are added by individual developers, each focusing on different file
>> formats. As a result, not all of the features are available for every
>> supported file format.
>> Also, there are emerging file formats like Vortex [1] or Lance [2] which,
>> either by specialization or by applying newer research results, could
>> provide better alternatives for certain use cases, like random access to
>> data or storing ML models.
>> -------------------
>>
>> Please check the detailed proposal [3] and the Google document [4], and
>> comment there or reply on the dev list if you have any suggestions.
>>
>> Thanks,
>> Peter
>>
>> [1] - https://github.com/spiraldb/vortex
>> [2] - https://lancedb.github.io/lance/
>> [3] - https://github.com/apache/iceberg/issues/12225
>> [4] - https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds
>>
>>
