Accidentally force-pushed :(
The new links are here:

- https://github.com/apache/iceberg/pull/12298/commits/583cccb6e036323ee74a74bf3b06a40bf16f8982
   - The API Interface classes
- https://github.com/apache/iceberg/pull/12298/commits/217e68caa61667032da3d710401078bb50b0a99f
   - Moving the Parquet/Avro/ORC readers and writers to implement these interfaces
- https://github.com/apache/iceberg/pull/12298/commits/7989416718657871760ae010dcb46a92904c1768
   - Moving the generic reader/writer implementations to the new interfaces
- https://github.com/apache/iceberg/pull/12298/commits/6595ccc381d4931bcf04bbdb1db8982c3f450bb4
   - Arrow reader implementation with the new interfaces
- https://github.com/apache/iceberg/pull/12298/commits/ce9b82aa55bdfddbb4ba3b1230f9f10342adec6d
   - Spark reader/writer implementation with the new interfaces
- https://github.com/apache/iceberg/pull/12298/commits/313c2d59b04db390be09172356d3f5359e6f6d6e
   - Flink reader/writer implementation with the new interfaces
On Tue, Feb 18, 2025 at 10:08 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Hi Renjie,
>
> Based on your feedback, I have created a PR that separates the different
> logical parts into different commits:
> https://github.com/apache/iceberg/pull/12298
> The following parts are separated:
>
>    - https://github.com/apache/iceberg/pull/12298/commits/1ad230f67df014b424c3547603831f5e637b96d0
>       - The API Interface classes
>    - https://github.com/apache/iceberg/pull/12298/commits/6fa135927676fd080d8322d7d09cf2b86f54de36
>       - Moving the Parquet/Avro/ORC readers and writers to implement these
>       interfaces
>    - https://github.com/apache/iceberg/pull/12298/commits/b6ab3d059732b7c898dd2a385f0cfa8a7956e999
>       - Moving the generic reader/writer implementations to the new
>       interfaces
>    - https://github.com/apache/iceberg/pull/12298/commits/aba830a86f535b2d1363b350d5f8b8622b608f1a
>       - Arrow reader implementation with the new interfaces
>    - https://github.com/apache/iceberg/pull/12298/commits/21179b8d0f7d1f8db3d9ea532d8cc776533b3fdf
>       - Spark reader/writer implementation with the new interfaces
>    - https://github.com/apache/iceberg/pull/12298/commits/907089c15fb497879ac879ff1d9227fc684d356d
>       - Flink reader/writer implementation with the new interfaces
>
> Thanks,
> Peter
>
> On Fri, Feb 14, 2025 at 11:30 AM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
>> Hi Renjie,
>> Here is the WIP PR for the readers:
>> https://github.com/apache/iceberg/pull/12069
>> Here is the WIP PR for the writers:
>> https://github.com/apache/iceberg/pull/12164
>>
>> If you want to concentrate on the proposed new API, this is probably the
>> best place to start:
>> https://github.com/apache/iceberg/compare/main...pvary:iceberg:file_format_api_minimal_few_class
>> Thanks,
>> Peter
>>
>> On Fri, Feb 14, 2025 at 11:15 AM Renjie Liu <liurenjie2...@gmail.com>
>> wrote:
>>
>>> Hi, Peter:
>>>
>>> Thanks for raising this; the proposal sounds quite interesting to me.
>>>
>>> I've reviewed the doc, but it still seems too abstract to understand.
>>> Would you mind submitting a PR so that it is clearer what has changed?
>>>
>>> On Wed, Feb 12, 2025 at 12:46 AM Péter Váry <peter.vary.apa...@gmail.com>
>>> wrote:
>>>
>>>> Hi Team,
>>>>
>>>> As mentioned earlier at our Community Sync, I am exploring the
>>>> possibility of defining a FileFormat API for accessing different file
>>>> formats. I have put together a proposal based on my findings.
>>>>
>>>> -------------------
>>>> Iceberg currently supports three different file formats: Avro, Parquet,
>>>> and ORC. With the introduction of the Iceberg V3 specification, many new
>>>> features are being added to Iceberg. Some of these features, like new
>>>> column types and default values, require changes at the file format
>>>> level. The changes are added by individual developers with different
>>>> focuses on the different file formats. As a result, not all of the
>>>> features are available for every supported file format.
>>>> There are also emerging file formats, like Vortex [1] or Lance [2],
>>>> which, either through specialization or by applying newer research
>>>> results, could provide better alternatives for certain use cases, such
>>>> as random data access or storing ML models.
>>>> -------------------
>>>>
>>>> Please check the detailed proposal [3] and the Google document [4], and
>>>> comment there or reply on the dev list if you have any suggestions.
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> [1] - https://github.com/spiraldb/vortex
>>>> [2] - https://lancedb.github.io/lance/
>>>> [3] - https://github.com/apache/iceberg/issues/12225
>>>> [4] - https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds