Hi Team,

We started discussing the File Format API proposal [1] a long time ago [2]. Since then, during the review process, we moved from a plain formalization of the similar APIs to bigger changes. The lucky ones could see a presentation of the results at the Iceberg Summit [3]. The topic was also discussed, and generally endorsed, during the "The Future of Apache Iceberg" panel discussion [4].

The new API still uses direct conversion from the Data File object model to the Engine object model, but refactors out much of the duplicated code from both the File Format and the engine-specific code. As a result we get:
- Same performance as the current solution
- Formalized API
- Simplified code
  - On engine level
  - On File Format level
- Improved testability
- Ability to introduce new File Formats or deprecate existing ones without much disruption, when the community decides to do so

The proposal document contains more details [5]. There is also a PR where you can check the proposed API changes [6], and a bigger change showing how the new API would affect the current File Format and engine implementations [7].

Please consider the proposal and vote.

[ ] +1 Add these changes to Iceberg
[ ] +0
[ ] -1 I have questions and/or concerns

Thanks,
Peter

[1] - GitHub issue - https://github.com/apache/iceberg/issues/12225
[2] - Mail list thread - https://lists.apache.org/thread/ovyh52m2b6c1hrg4fhw3rx92bzr793n2
[3] - Turbocharge Queries on Iceberg with Next-Gen File Formats - https://www.youtube.com/watch?v=p6ZKY8JViCA&list=PLkifVhhWtccxMcqWlXXFvjJybisFF7ESh&index=40
[4] - The Future of Apache Iceberg™: A Community Member Panel Discussion - https://www.youtube.com/watch?v=BTTxeUXjqk8&list=PLkifVhhWtccxMcqWlXXFvjJybisFF7ESh&index=6
[5] - Proposal document - https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds
[6] - PR: Core, Data: File Format API interfaces - https://github.com/apache/iceberg/pull/12774
[7] - PR: Core: Interface based DataFile reader and writer API - https://github.com/apache/iceberg/pull/12298