I definitely support introducing an API for this purpose, and I think the
current work is heading in the right direction. But I'm not sure that a vote
is the right next step. A vote should be used to confirm consensus on a
design and direction, and I thought the next step was to build that
consensus around the current API prototype.

I know that a few of us have been trying to tie up loose ends with v3, and
for me at least, that has meant I haven't had time to thoroughly review the
last set of changes and the proposed file API. Isn't the next step to get
all reviewers to spend time on this area rather than to try to decide with a
vote? I'll make sure I take the time in the next few weeks to help with the
reviews. I think that building consensus is the right next step.

Ryan

On Thu, May 15, 2025 at 6:46 AM Péter Váry <peter.vary.apa...@gmail.com>
wrote:

> Hi Team,
>
> We started the discussion of the File Format API proposal [1] a long time
> ago [2].
> Since then, during the review process, we moved from simply formalizing
> the similar APIs to making bigger changes.
> Those lucky enough to attend the Iceberg Summit saw a presentation of the
> results [3]. The topic was discussed and generally endorsed during the
> "The Future of Apache Iceberg" panel discussion [4].
>
> The new API still uses direct conversion from the Data File object model
> to the Engine object model, but refactors out much of the duplicated code
> from both the File Format and the engine-specific implementations.
> As a result we get:
> - Same performance as the current solution
> - Formalized API
> - Simplified code
>     - On engine level
>     - On File Format level
> - Improved testability
> - Ability to introduce new File Formats, or deprecate existing ones,
> without much disruption when the community decides to do so.
>
> The proposal document contains more details [5]. There is also a PR where
> you can check the proposed API changes [6], and a bigger change showing how
> the new API would affect the current File Format and engine
> implementations [7].
>
> Please consider the proposal and vote.
>
> [ ] +1 Add these changes to Iceberg
> [ ] +0
> [ ] -1 I have questions and/or concerns
>
> Thanks,
> Peter
>
> [1] - GitHub issue - https://github.com/apache/iceberg/issues/12225
> [2] - Mail list thread -
> https://lists.apache.org/thread/ovyh52m2b6c1hrg4fhw3rx92bzr793n2
> [3] - Turbocharge Queries on Iceberg with Next-Gen File Formats -
> https://www.youtube.com/watch?v=p6ZKY8JViCA&list=PLkifVhhWtccxMcqWlXXFvjJybisFF7ESh&index=40
> [4] - The Future of Apache Iceberg™: A Community Member Panel Discussion -
> https://www.youtube.com/watch?v=BTTxeUXjqk8&list=PLkifVhhWtccxMcqWlXXFvjJybisFF7ESh&index=6
> [5] - Proposal document -
> https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds
> [6] - PR: Core, Data: File Format API interfaces -
> https://github.com/apache/iceberg/pull/12774
> [7] - PR: Core: Interface based DataFile reader and writer API -
> https://github.com/apache/iceberg/pull/12298
>
>
