Hi Team,

We started the discussion of the File Format API proposal [1] a long time
ago [2].
Since then, during the review process, the scope has grown from simply
formalizing the similar existing APIs to a larger set of changes.
The lucky ones could see a presentation of the results at the Iceberg
Summit [3]. The topic was also discussed, and generally endorsed, during
"The Future of Apache Iceberg" panel discussion [4].

The new API still converts directly from the Data File object model to the
Engine object model, but it factors out much of the duplicated code from
both the File Format implementations and the engine-specific code.
As a result, we get:
- Same performance as the current solution
- Formalized API
- Simplified code
    - On engine level
    - On File Format level
- Improved testability
- Ability to introduce new File Formats, or deprecate existing ones, without
  much disruption when the community decides to do so.

The proposal document contains more details [5]. There is also a PR with the
proposed API changes [6], and a larger PR showing how the new API would
affect the current File Format and engine implementations [7].

Please consider the proposal and vote.

[ ] +1 Add these changes to Iceberg
[ ] +0
[ ] -1 I have questions and/or concerns

Thanks,
Peter

[1] - GitHub issue -
https://github.com/apache/iceberg/issues/12225
[2] - Mail list thread -
https://lists.apache.org/thread/ovyh52m2b6c1hrg4fhw3rx92bzr793n2
[3] - Turbocharge Queries on Iceberg with Next-Gen File Formats -
https://www.youtube.com/watch?v=p6ZKY8JViCA&list=PLkifVhhWtccxMcqWlXXFvjJybisFF7ESh&index=40
[4] - The Future of Apache Iceberg™: A Community Member Panel Discussion -
https://www.youtube.com/watch?v=BTTxeUXjqk8&list=PLkifVhhWtccxMcqWlXXFvjJybisFF7ESh&index=6
[5] - Proposal document -
https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds
[6] - PR: Core, Data: File Format API interfaces -
https://github.com/apache/iceberg/pull/12774
[7] - PR: Core: Interface based DataFile reader and writer API -
https://github.com/apache/iceberg/pull/12298