I would also vote for option 0. This API has a clean separation of concerns and makes refactoring easier: e.g. once we completely deprecate V2 tables, we could mark the *positionDeleteWriteBuilder* method as deprecated, and it would be easier to remove its usages.
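For readers joining the thread here, a minimal sketch (assuming the not-yet-merged FormatModel and WriteBuilder interfaces from the PR) of what option 0 with its two explicit methods, summarized in the quoted mail below, could look like; the type parameter D stands for the engine-native row type, e.g. InternalRow for Spark:

    import org.apache.iceberg.deletes.PositionDelete;
    import org.apache.iceberg.io.OutputFile;

    // Option 0 sketch: two explicit builder methods on the format model.
    public interface FormatModel<D> {
      // builder for data and equality delete files, consuming engine-native rows
      WriteBuilder<D> writeBuilder(OutputFile outputFile);

      // builder for V2 position delete files, consuming PositionDelete wrappers;
      // can be deprecated and removed together with V2 position delete writing
      WriteBuilder<PositionDelete<D>> positionDeleteWriteBuilder(OutputFile outputFile);
    }

Keeping the second method separate is what makes the deprecation story straightforward: removing it does not touch the data/equality-delete path.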
On Fri, Sep 12, 2025 at 11:24 PM Russell Spitzer <russell.spit...@gmail.com> wrote: > Now that I fully understand the situation I think option 0 as you've > written is probably the best thing to do as long as PositionDelete is a > class. I > think with hindsight it probably shouldn't have been a class and always > been an interface so that our internal code could produce rows which > implement PositionDelete rather than PositionDeletes that wrap rows. > > On Fri, Sep 12, 2025 at 8:02 AM Péter Váry <peter.vary.apa...@gmail.com> > wrote: > >> Let me summarize the state a bit: >> >> The FileFormat interface needs to expose two distinct methods: >> >> - WriteBuilder<InternalRow> >> - WriteBuilder<PositionDelete<InternalRow>> >> - After the PDWR deprecation this will be >> WriteBuilder<PositionDelete> >> - After V2 deprecation this will be not needed anymore >> >> Based on the file format methods, the Registry must support four builder >> types: >> >> - WriteBuilder<InternalRow> >> - DataWriteBuilder<InternalRow> >> - EqualityDeleteWriteBuilder<InternalRow> >> - PositionDeleteWriteBuilder<InternalRow> >> >> >> *API Design Considerations* >> There is an argument that the two WriteBuilder methods provided by >> FileFormat are essentially the same, differing only in the writerFunction. >> While this is technically correct for current implementations, I believe >> the API should clearly distinguish between the two writer types to >> highlight the differences. >> >> *Discussed Approaches* >> >> *0. Two Explicit Methods on FormatModel* (removed based on previous >> comments, but I personally still prefer this) >> >> *WriteBuilder<InternalRow> writeBuilder(OutputFile outputFile);* >> *WriteBuilder<PositionDelete<InternalRow>> >> positionDeleteWriteBuilder(OutputFile outputFile); * >> >> >> Pros: Clear separation of responsibilities >> >> *1. One Builder + One Converter* >> >> *WriteBuilder<InternalRow> writeBuilder(OutputFile outputFile);* >> *Function<PositionDelete<D>, D> positionDeleteConverter(Schema schema);* >> >> >> Pros: Keeps the interface compact >> Cons: Requires additional documentation and understanding why the >> conversion logic is needed >> >> *2. Single Method with Javadoc Clarification* (most similar to the >> current approach) >> >> *WriteBuilder writeBuilder(OutputFile outputFile); * >> >> >> Pros: Minimalistic >> Cons: Least explicit; relies entirely on documentation >> >> *2/b. Single Builder with Type Parameter *(based on Russell's suggestion) >> >> *WriteBuilder writeBuilder(OutputFile outputFile);* >> *// Usage: builder.build(Class<D> inputType)* >> >> >> Pros: Flexible >> Cons: Relies on documentation to clarify the available input types >> >> *Bonus* >> Options 0 and 1 make it easier to phase out PositionDelete filtering once >> V2 tables are deprecated. >> >> Thanks, >> Peter >> >> Péter Váry <peter.vary.apa...@gmail.com> ezt írta (időpont: 2025. szept. >> 11., Cs, 18:36): >> >>> > Wouldn't PositionDelete<InternalRow> also be an InternalRow in this >>> example? I think that's what i'm confused about. >>> >>> With the *second approach*, the WriteBuilder doesn’t need to handle >>> PositionDelete objects directly. The conversion layer takes care of >>> that, so the WriteBuilder only needs to work with InternalRow. >>> >>> With the *first approach*, we shift that responsibility to the >>> WriteBuilder, which then has to support both InternalRow and >>> PositionDelete<InternalRow>. 
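To make the second approach concrete, a rough sketch of how the caller (for example the FormatModelRegistry) could apply the converter before anything reaches the format-level WriteBuilder; FormatModel, WriteBuilder, positionDeleteConverter, schema() and content() are the proposed APIs under discussion, while the helper method itself is purely illustrative:

    import java.io.IOException;
    import java.util.function.Function;
    import org.apache.iceberg.FileContent;
    import org.apache.iceberg.Schema;
    import org.apache.iceberg.deletes.PositionDelete;
    import org.apache.iceberg.io.FileAppender;
    import org.apache.iceberg.io.OutputFile;

    // Conversion happens above the format layer, so the format's WriteBuilder
    // only ever sees the engine row type D.
    static <D> void writePositionDeletes(
        FormatModel<D> model,
        OutputFile file,
        Schema posDeleteSchema,
        Iterable<PositionDelete<D>> deletes) throws IOException {
      Function<PositionDelete<D>, D> toRow = model.positionDeleteConverter(posDeleteSchema);
      try (FileAppender<D> appender =
          model.writeBuilder(file)
              .schema(posDeleteSchema)
              .content(FileContent.POSITION_DELETES)
              .build()) {
        for (PositionDelete<D> delete : deletes) {
          appender.add(toRow.apply(delete));
        }
      }
    }

The point is that only this registry-level code needs to know about PositionDelete; the format implementations keep a single WriteBuilder for the engine row type.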
>>> >>> In both cases, the FormatModelRegistry API will still expose the more >>> concrete types (PositionDelete / InternalRow). However, under the *first >>> approach*, the lower-level API only needs to handle InternalRow, >>> simplifying its interface. >>> Thanks, >>> Peter >>> >>> Russell Spitzer <russell.spit...@gmail.com> ezt írta (időpont: 2025. >>> szept. 11., Cs, 17:12): >>> >>>> Wouldn't PositionDelete<InternalRow> also be an InternalRow in this >>>> example? I think that's what i'm confused about. >>>> >>>> On Thu, Sep 11, 2025 at 5:35 AM Péter Váry <peter.vary.apa...@gmail.com> >>>> wrote: >>>> >>>>> Thanks, Russell, for taking a look at this! >>>>> >>>>> We need to expose four methods on the user-facing API ( >>>>> FormatModelRegistry): >>>>> >>>>> 1. *writeBuilder* – for writing arbitrary files without Iceberg >>>>> metadata. In the Iceberg codebase, this is exposed via >>>>> FlinkAppenderFactory and >>>>> the GenericAppenderFactory for creating FileAppender<RowData> and >>>>> FileAppender<Record> only. >>>>> 2. *dataWriteBuilder* – for creating and collecting metadata for >>>>> Iceberg DataFiles. >>>>> 3. *equalityDeleteWriteBuilder* – for creating and collecting >>>>> metadata for Iceberg EqualityDeleteFiles. >>>>> 4. *positionDeleteWriteBuilder* – for creating and collecting >>>>> metadata for Iceberg PositionDeleteFiles. >>>>> >>>>> We’d like to implement all four using a single WriteBuilder created >>>>> by the FormatModels. >>>>> >>>>> Your suggestion is a good one—it helps formalize the requirements for >>>>> the build method and also surfaces an important design question: >>>>> >>>>> *Who should be responsible for handling the differences between normal >>>>> rows (InternalRow) and position deletes (PositionDelete<InternalRow>)* >>>>> ? >>>>> >>>>> - Should we have a more complex WriteBuilder class that can create >>>>> both DataFileAppender and PositionDeleteAppender? >>>>> - Or should we push this responsibility to the engine-specific >>>>> code, where we already have some logic (e.g., pathTransformFunc) >>>>> needed by each engine to create the PositionDeleteAppender? >>>>> >>>>> Thanks, >>>>> Peter >>>>> >>>>> >>>>> Russell Spitzer <russell.spit...@gmail.com> ezt írta (időpont: 2025. >>>>> szept. 11., Cs, 0:11): >>>>> >>>>>> I'm a little confused here, I think Ryan mentioned this in the >>>>>> comment here >>>>>> https://github.com/apache/iceberg/pull/12774/files#r2254967177 >>>>>> >>>>>> From my understanding there are two options? >>>>>> >>>>>> 1) We either are producing FormatModels that take a generic row type >>>>>> D and produce writers that all take D and write files. >>>>>> >>>>>> 2) we are creating IcebergModel specific writers that take DataFile, >>>>>> PositionDeleteFile, EqualityDeleteFile etc ... and write files >>>>>> >>>>>> The PositionDelete Converter issue seems to stem from attempting to >>>>>> do both model 1 (being very generic) and 2, wanting special code to deal >>>>>> with PositionDeleteFile<R> objects. >>>>>> >>>>>> It looks like the code in #12774 is mostly doing model 1, but we are >>>>>> trying to add in a specific converter for 2? >>>>>> >>>>>> Maybe i'm totally lost here but I was assuming we would do >>>>>> something a little scala-y like >>>>>> >>>>>> public <T> FileAppender<T> build(Class<T> type) { >>>>>> if (type == DataFile.class) return (FileAppender<T>) new >>>>>> DataFileAppender(); >>>>>> if (type == DeleteFile.class) return (FileAppender<T>) new >>>>>> DeleteFileAppender(); >>>>>> // ... 
>>>>>> } >>>>>> >>>>>> >>>>>> So that we only register a single signature and if writer specific >>>>>> implementation needs to do something special it can? I'm trying to catch >>>>>> back up to speed on this PR so it may help to do a quick summary of the >>>>>> current state and intent. (At least for me) >>>>>> >>>>>> On Tue, Sep 9, 2025 at 3:42 AM Péter Váry < >>>>>> peter.vary.apa...@gmail.com> wrote: >>>>>> >>>>>>> Hi Renjie, >>>>>>> Thanks for taking a look! >>>>>>> >>>>>>> Let me clarify a few points: >>>>>>> - The converter API is only required for writing position delete >>>>>>> files for V2 tables >>>>>>> - Currently, there are no plans to support vectorized writing via >>>>>>> the java API >>>>>>> - Even if we decide to support vectorized writes, I don't think we >>>>>>> would like to implement it for Positional Deletes, which are deprecated >>>>>>> in >>>>>>> the new spec. >>>>>>> - Also, once the positional deletes - which contain the deleted rows >>>>>>> - are deprecated (as planned), the conversion of the Position Deletes >>>>>>> with >>>>>>> only file name and position would be trivial, even for the vectorized >>>>>>> writes. >>>>>>> >>>>>>> So from my perspective, the converter method exists purely for >>>>>>> backward compatibility, and we intend to remove it as soon as possible. >>>>>>> Sacrificing good practices for the sake of a deprecated feature doesn’t >>>>>>> seem worthwhile to me. >>>>>>> >>>>>>> Thanks, >>>>>>> Peter >>>>>>> >>>>>>> Renjie Liu <liurenjie2...@gmail.com> ezt írta (időpont: 2025. >>>>>>> szept. 8., H, 12:34): >>>>>>> >>>>>>>> Hi, Peter: >>>>>>>> >>>>>>>> I would vote for the first approach. In spite of the compromises >>>>>>>> described, the api is still cleaner. Also I think there are some >>>>>>>> problems >>>>>>>> with the converter api. For example, for vectorized implementations >>>>>>>> such as >>>>>>>> comet which accepts columnar batch rather than rows, the converter >>>>>>>> method >>>>>>>> would make things more complicated. >>>>>>>> >>>>>>>> On Sat, Aug 30, 2025 at 2:49 PM Péter Váry < >>>>>>>> peter.vary.apa...@gmail.com> wrote: >>>>>>>> >>>>>>>>> I’ve initiated a discussion thread regarding the deprecation of >>>>>>>>> Position Deletes containing row data. You can follow it here: >>>>>>>>> https://lists.apache.org/thread/8jw6pb2vq3ghmdqf1yvy8n5n6gg1fq5s >>>>>>>>> >>>>>>>>> We can proceed with the discussion about the native reader/writer >>>>>>>>> deprecation when we decided on the final API, as the chosen design may >>>>>>>>> influence our approach. >>>>>>>>> >>>>>>>>> Since then, one more question has come up - hopefully the last: >>>>>>>>> *How should we handle Position Delete Writers?* >>>>>>>>> The File Format API should return builders for either rows or >>>>>>>>> PositionDelete objects. Currently the method >>>>>>>>> `WriteBuilder.createWriterFunc(Function<MessageType, >>>>>>>>> ParquetValueWriter<?>>)` defines the accepted input parameters for the >>>>>>>>> writer. Users are responsible for ensuring that the writer function >>>>>>>>> and the >>>>>>>>> return type of the `WriteBuilder.build()` are compatible. In the new >>>>>>>>> API, >>>>>>>>> we no longer expose writer functions. We still expose FileContent, >>>>>>>>> since >>>>>>>>> writer configurations vary by content type, but we don’t expose the >>>>>>>>> types. >>>>>>>>> >>>>>>>>> There are two proposals for handling types for the WriteBuilders: >>>>>>>>> >>>>>>>>> 1. 
*Implicit Type Definition via FileContent* - the builder >>>>>>>>> parameter for FileContent would implicitly define the input type >>>>>>>>> for the >>>>>>>>> writer returned by build(), or >>>>>>>>> 2. *Engine level conversion* - Engines would convert >>>>>>>>> PositionDelete objects to their native types. >>>>>>>>> >>>>>>>>> In code: >>>>>>>>> >>>>>>>>> - In the 1st proposal, the FormatModel.writeBuilder(OutputFile >>>>>>>>> outputFile) can return anything: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> * WriteBuilder builder = >>>>>>>>> FormatModelRegistry.writeBuilder(PARQUET, InternalRow.class, >>>>>>>>> outputFile); >>>>>>>>> FileAppender<InternalRow> appender = >>>>>>>>> .schema(table.schema()) >>>>>>>>> .content(FileContent.DATA) .... .build(); * >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> * // Exposed, but FormatModelRegistry.positionDeleteWriteBuilder >>>>>>>>> should be used instead WriteBuilder builder >>>>>>>>> = FormatModelRegistry.writeBuilder(PARQUET, InternalRow.class, >>>>>>>>> outputFile); >>>>>>>>> FileAppender<PositionDelete<InternalRow>> appender = >>>>>>>>> .schema(table.schema()) >>>>>>>>> .content(FileContent.POSITION_DELETES) >>>>>>>>> .... .build();* >>>>>>>>> >>>>>>>>> >>>>>>>>> - In the 2nd proposal, the FormatModel needs another method: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> *Function<PositionDelete<D>, D> positionDeleteConverter(Schema >>>>>>>>> schema); *example implementation: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> * return delete -> { deleteRecord.update(0, >>>>>>>>> UTF8String.fromString(delete.path().toString())); >>>>>>>>> deleteRecord.update(1, delete.pos()); deleteRecord.update(2, >>>>>>>>> delete.row()); return deleteRecord; };* >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> * // Content is only used for writer property configuration >>>>>>>>> WriteBuilder<InternalRow> builder = >>>>>>>>> sparkFormatModel.writeBuilder(outputFile); >>>>>>>>> FileAppender<InternalRow> appender = >>>>>>>>> .schema(table.schema()) >>>>>>>>> .content(FileContent.DATA) .... .build();* >>>>>>>>> >>>>>>>>> >>>>>>>>> Drawbacks >>>>>>>>> >>>>>>>>> - Proposal 1: >>>>>>>>> - Type checking for the FileAppenders occurs only at >>>>>>>>> runtime, so user errors surface late. >>>>>>>>> - File Format specification must clearly specify which >>>>>>>>> builder type corresponds to which file content >>>>>>>>> parameter—generics would >>>>>>>>> offer better clarity. >>>>>>>>> - Inconsistent patterns between WriteBuilder and >>>>>>>>> ReadBuilder, as the latter can define output types via generics. >>>>>>>>> - Proposal 2: >>>>>>>>> - Requires FormatModels to implement a converter method to >>>>>>>>> transform PositionDelete<InternalRow> into InternalRow. >>>>>>>>> >>>>>>>>> Since we deprecated writing position delete files in the V3 spec, >>>>>>>>> this extra method in the 2nd proposal will be deprecated too. As a >>>>>>>>> result, >>>>>>>>> in the long run, we will have a nice, clean API. >>>>>>>>> OTOH, if we accept the compromise described in the 1st proposal, >>>>>>>>> the results of our decision will remain, even when the functions are >>>>>>>>> removed. >>>>>>>>> >>>>>>>>> Looking forward to your thoughts. 
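Since the inline snippets above lost their formatting in plain text, here is a cleaned-up sketch of the 2nd proposal's Spark converter example; the original does not show where deleteRecord comes from, so declaring it as a reusable three-field GenericInternalRow is an assumption made for this sketch:

    import java.util.function.Function;
    import org.apache.iceberg.Schema;
    import org.apache.iceberg.deletes.PositionDelete;
    import org.apache.spark.sql.catalyst.InternalRow;
    import org.apache.spark.sql.catalyst.expressions.GenericInternalRow;
    import org.apache.spark.unsafe.types.UTF8String;

    // Converter from PositionDelete<InternalRow> to the spec'd position delete
    // row layout (file_path, pos, row). The buffer is reused, so the appender
    // must consume each converted row before the next conversion.
    Function<PositionDelete<InternalRow>, InternalRow> positionDeleteConverter(Schema schema) {
      GenericInternalRow deleteRecord = new GenericInternalRow(3);
      return delete -> {
        deleteRecord.update(0, UTF8String.fromString(delete.path().toString()));
        deleteRecord.update(1, delete.pos());
        deleteRecord.update(2, delete.row());
        return deleteRecord;
      };
    }

A real implementation would also consult the schema to decide whether the deleted-row column (index 2) should be written at all.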
>>>>>>>>> Thanks, Peter >>>>>>>>> >>>>>>>>> On Thu, Aug 14, 2025, 14:12 Péter Váry < >>>>>>>>> peter.vary.apa...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Hi Team, >>>>>>>>>> >>>>>>>>>> During yesterday’s community sync, we discussed the current state >>>>>>>>>> of the File Format API proposal and identified two key questions that >>>>>>>>>> require input from the broader community: >>>>>>>>>> >>>>>>>>>> *1. Dropping support for Position Delete files with Row Data* >>>>>>>>>> >>>>>>>>>> The current Iceberg V2 spec [1] defines two types of position >>>>>>>>>> delete files: >>>>>>>>>> >>>>>>>>>> - Files that store only the file name and row position. >>>>>>>>>> - Files that also store the deleted row data. >>>>>>>>>> >>>>>>>>>> Although this feature is defined in the spec and some tests exist >>>>>>>>>> in the Iceberg codebase, we’re not aware of any actual >>>>>>>>>> implementation using >>>>>>>>>> the second type (with row data). Supporting V2 table writing via the >>>>>>>>>> new >>>>>>>>>> File Format API would be simpler if we dropped support for this >>>>>>>>>> feature. >>>>>>>>>> If you know of any use case or reason to retain support for >>>>>>>>>> position deletes with row data, please let us know. >>>>>>>>>> >>>>>>>>>> *2. Deprecating Native File Format Readers/Writers in the API* >>>>>>>>>> >>>>>>>>>> The current API contains format-specific readers/writers for >>>>>>>>>> Parquet, Avro, and ORC. With the introduction of the InternalData >>>>>>>>>> and File >>>>>>>>>> Format APIs, Iceberg users can now write files using: >>>>>>>>>> >>>>>>>>>> - InternalData API for metadata files (manifest, manifest >>>>>>>>>> list, partition stats). >>>>>>>>>> - File Format API for data and delete files. >>>>>>>>>> >>>>>>>>>> I propose we deprecate the original format-specific writers and >>>>>>>>>> guide users to use the new APIs based on the target file type. If >>>>>>>>>> you’re >>>>>>>>>> aware of any use cases that still require the original >>>>>>>>>> format-specific >>>>>>>>>> writers, please share them. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Peter >>>>>>>>>> >>>>>>>>>> [1] - Position Delete File Spec: >>>>>>>>>> https://iceberg.apache.org/spec/?h=delete#position-delete-files >>>>>>>>>> >>>>>>>>>> Péter Váry <peter.vary.apa...@gmail.com> ezt írta (időpont: >>>>>>>>>> 2025. júl. 22., K, 16:09): >>>>>>>>>> >>>>>>>>>>> Also put together a solution where the Engine specific format >>>>>>>>>>> transformation is separated from the writer, and the engines need >>>>>>>>>>> to take >>>>>>>>>>> care of it separately. >>>>>>>>>>> This is somewhat complicated on the implementation side (see: >>>>>>>>>>> [RowDataTransformer]( >>>>>>>>>>> https://github.com/apache/iceberg/pull/12298/files#diff-562fa4cc369c908a157f59a9235fd3f389096451e7901686fba37c87b53dee08), >>>>>>>>>>> and [InternalRowTransformer]( >>>>>>>>>>> https://github.com/apache/iceberg/pull/12298/files#diff-546f9dc30e3207d1d2bc0a2722976b55f5a04dcf85a22855e4f400500c317140)), >>>>>>>>>>> but simplifies the API. >>>>>>>>>>> >>>>>>>>>>> @rdblue: Please check the proposed solution. I think this is >>>>>>>>>>> what you have suggested >>>>>>>>>>> >>>>>>>>>>> Péter Váry <peter.vary.apa...@gmail.com> ezt írta (időpont: >>>>>>>>>>> 2025. jún. 30., H, 18:42): >>>>>>>>>>> >>>>>>>>>>>> During the PR review [1], we began exploring what could we use >>>>>>>>>>>> as an intermediate layer to reduce the need for engines and file >>>>>>>>>>>> formats to >>>>>>>>>>>> implement the full matrix of file format - object model >>>>>>>>>>>> conversions. 
>>>>>>>>>>>> >>>>>>>>>>>> To support this discussion, I’ve created and run a set of >>>>>>>>>>>> performance benchmarks and compiled a document outlining the >>>>>>>>>>>> potential >>>>>>>>>>>> benefits and trade-offs [2]. >>>>>>>>>>>> >>>>>>>>>>>> Feedback is welcome, feel free to comment on the document, the >>>>>>>>>>>> PR, or directly in this thread. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Peter >>>>>>>>>>>> >>>>>>>>>>>> [1] - PR discussion - >>>>>>>>>>>> https://github.com/apache/iceberg/pull/12774#discussion_r2093626096 >>>>>>>>>>>> [2] - File Format and engine object model transformation >>>>>>>>>>>> performance - >>>>>>>>>>>> https://docs.google.com/document/d/1GdA8IowKMtS3QVdm8s-0X-ZRYetcHv2bhQ9mrSd3fd4 >>>>>>>>>>>> >>>>>>>>>>>> Péter Váry <peter.vary.apa...@gmail.com> ezt írta (időpont: >>>>>>>>>>>> 2025. máj. 7., Sze, 13:15): >>>>>>>>>>>> >>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>> The proposed API part is reviewed and ready to go. See: >>>>>>>>>>>>> https://github.com/apache/iceberg/pull/12774 >>>>>>>>>>>>> Thanks to everyone who reviewed it already! >>>>>>>>>>>>> >>>>>>>>>>>>> Many of you wanted to review, but I know that the time >>>>>>>>>>>>> constraints are there for everyone. I still very much would like >>>>>>>>>>>>> to hear >>>>>>>>>>>>> your voices, so I will not merge the PR this week. Please review >>>>>>>>>>>>> it if you. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Peter >>>>>>>>>>>>> >>>>>>>>>>>>> Péter Váry <peter.vary.apa...@gmail.com> ezt írta (időpont: >>>>>>>>>>>>> 2025. ápr. 16., Sze, 7:02): >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Renjie, >>>>>>>>>>>>>> The first one for the proposed new API is here: >>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/12774 >>>>>>>>>>>>>> Thanks, Peter >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Apr 16, 2025, 05:40 Renjie Liu < >>>>>>>>>>>>>> liurenjie2...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, Peter: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for the effort. I totally agree with splitting them >>>>>>>>>>>>>>> into smaller prs to move forward. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm quite interested in this topic, and please ping me in >>>>>>>>>>>>>>> those splitted prs and I'll help to review. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, Apr 14, 2025 at 11:22 PM Jean-Baptiste Onofré < >>>>>>>>>>>>>>> j...@nanthrax.net> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Peter >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Awesome ! Thank you so much ! >>>>>>>>>>>>>>>> I will do a new pass. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>>> JB >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Apr 11, 2025 at 3:48 PM Péter Váry < >>>>>>>>>>>>>>>> peter.vary.apa...@gmail.com> wrote: >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Hi JB, >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Separated out the proposed interfaces to a new PR: >>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/12774. >>>>>>>>>>>>>>>> > Reviewers can check that out if they are only interested >>>>>>>>>>>>>>>> in how the new API would look like. >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Thanks, >>>>>>>>>>>>>>>> > Peter >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > Jean-Baptiste Onofré <j...@nanthrax.net> ezt írta >>>>>>>>>>>>>>>> (időpont: 2025. ápr. 10., Cs, 18:25): >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> Hi Peter >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> Thanks for the ping about the PR. 
>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> Maybe, to facilitate the review and move forward faster, >>>>>>>>>>>>>>>> we should >>>>>>>>>>>>>>>> >> split the PR in smaller PRs: >>>>>>>>>>>>>>>> >> - one with the interfaces (ReadBuilder, AppenderBuilder, >>>>>>>>>>>>>>>> ObjectModel, >>>>>>>>>>>>>>>> >> AppenderBuilder, DataWriterBuilder, ...) >>>>>>>>>>>>>>>> >> - one for each file providers (Parquet, Avro, ORC) >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> Thoughts ? I can help on the split if needed. >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> Regards >>>>>>>>>>>>>>>> >> JB >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> On Thu, Apr 10, 2025 at 5:16 AM Péter Váry < >>>>>>>>>>>>>>>> peter.vary.apa...@gmail.com> wrote: >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > Since the 1.9.0 release candidate has been created, I >>>>>>>>>>>>>>>> would like to resurrect this PR: >>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/12298 to ensure >>>>>>>>>>>>>>>> that we have as long a testing period as possible for it. >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > To recap, here is what the PR does after the review >>>>>>>>>>>>>>>> rounds: >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > Created 3 interface classes which are implemented by >>>>>>>>>>>>>>>> the file formats: >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > ReadBuilder - Builder for reading data from data files >>>>>>>>>>>>>>>> >> > AppenderBuilder - Builder for writing data to data >>>>>>>>>>>>>>>> files >>>>>>>>>>>>>>>> >> > ObjectModel - Providing ReadBuilders, and >>>>>>>>>>>>>>>> AppenderBuilders for the specific data file format and object >>>>>>>>>>>>>>>> model pair >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > Updated the Parquet, Avro, ORC implementation for this >>>>>>>>>>>>>>>> interfaces, and deprecated the old reader/writer APIs >>>>>>>>>>>>>>>> >> > Created interface classes which will be used by the >>>>>>>>>>>>>>>> actual readers/writers of the data files: >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > AppenderBuilder - Builder for writing a file >>>>>>>>>>>>>>>> >> > DataWriterBuilder - Builder for generating a data file >>>>>>>>>>>>>>>> >> > PositionDeleteWriterBuilder - Builder for generating a >>>>>>>>>>>>>>>> position delete file >>>>>>>>>>>>>>>> >> > EqualityDeleteWriterBuilder - Builder for generating >>>>>>>>>>>>>>>> an equality delete file >>>>>>>>>>>>>>>> >> > No ReadBuilder here - the file format reader builder >>>>>>>>>>>>>>>> is reused >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > Created a WriterBuilder class which implements the >>>>>>>>>>>>>>>> interfaces above >>>>>>>>>>>>>>>> (AppenderBuilder/DataWriterBuilder/PositionDeleteWriterBuilder/EqualityDeleteWriterBuilder) >>>>>>>>>>>>>>>> based on a provided file format specific AppenderBuilder >>>>>>>>>>>>>>>> >> > Created an ObjectModelRegistry which stores the >>>>>>>>>>>>>>>> available ObjectModels, and engines and users could request >>>>>>>>>>>>>>>> the readers >>>>>>>>>>>>>>>> (ReadBuilder) and writers >>>>>>>>>>>>>>>> (AppenderBuilder/DataWriterBuilder/PositionDeleteWriterBuilder/EqualityDeleteWriterBuilder) >>>>>>>>>>>>>>>> from. 
>>>>>>>>>>>>>>>> >> > Created the appropriate ObjectModels: >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > GenericObjectModels - for reading and writing Iceberg >>>>>>>>>>>>>>>> Records >>>>>>>>>>>>>>>> >> > SparkObjectModels - for reading (vectorized and >>>>>>>>>>>>>>>> non-vectorized) and writing Spark InternalRow/ColumnarBatch >>>>>>>>>>>>>>>> objects >>>>>>>>>>>>>>>> >> > FlinkObjectModels - for reading and writing Flink >>>>>>>>>>>>>>>> RowData objects >>>>>>>>>>>>>>>> >> > An arrow object model is also registered for >>>>>>>>>>>>>>>> vectorized reads of Parquet files into Arrow ColumnarBatch >>>>>>>>>>>>>>>> objects >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > Updated the production code where the reading and >>>>>>>>>>>>>>>> writing happens to use the ObjectModelRegistry and the new >>>>>>>>>>>>>>>> reader/writer >>>>>>>>>>>>>>>> interfaces to access data files >>>>>>>>>>>>>>>> >> > Kept the testing code intact to ensure that the new >>>>>>>>>>>>>>>> API/code is not breaking anything >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > The original change was not small, and grew >>>>>>>>>>>>>>>> substantially during the review rounds. So if you have >>>>>>>>>>>>>>>> questions, or I can >>>>>>>>>>>>>>>> do anything to make the review easier, don't hesitate to ask. >>>>>>>>>>>>>>>> I am happy to >>>>>>>>>>>>>>>> do anything to move this forward. >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > Thanks, >>>>>>>>>>>>>>>> >> > Peter >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> >> > Péter Váry <peter.vary.apa...@gmail.com> ezt írta >>>>>>>>>>>>>>>> (időpont: 2025. márc. 26., Sze, 14:54): >>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>> >> >> Hi everyone, >>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>> >> >> I have updated the File Format API PR ( >>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/12298) based on the >>>>>>>>>>>>>>>> answers and review comments. >>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>> >> >> I would like to merge this only after the 1.9.0 >>>>>>>>>>>>>>>> release so we have more time finding any issues and solving >>>>>>>>>>>>>>>> them before >>>>>>>>>>>>>>>> this goes to a release for the users. >>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>> >> >> For this I have updated the deprecation comments >>>>>>>>>>>>>>>> accordingly. >>>>>>>>>>>>>>>> >> >> I would like to ask you to review the PR, so we iron >>>>>>>>>>>>>>>> out any possible requested changes and be ready for the merge >>>>>>>>>>>>>>>> as soon as >>>>>>>>>>>>>>>> possible after the 1.9.0 release. >>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>> >> >> Thanks, >>>>>>>>>>>>>>>> >> >> Peter >>>>>>>>>>>>>>>> >> >> >>>>>>>>>>>>>>>> >> >> Péter Váry <peter.vary.apa...@gmail.com> ezt írta >>>>>>>>>>>>>>>> (időpont: 2025. márc. 21., P, 14:32): >>>>>>>>>>>>>>>> >> >>> >>>>>>>>>>>>>>>> >> >>> Hi Renije, >>>>>>>>>>>>>>>> >> >>> >>>>>>>>>>>>>>>> >> >>> > 1. File format filters >>>>>>>>>>>>>>>> >> >>> > >>>>>>>>>>>>>>>> >> >>> > Do the filters include both filter expressions >>>>>>>>>>>>>>>> from both user query and delete filter? >>>>>>>>>>>>>>>> >> >>> >>>>>>>>>>>>>>>> >> >>> The current discussion is about the filters from the >>>>>>>>>>>>>>>> user query. >>>>>>>>>>>>>>>> >> >>> >>>>>>>>>>>>>>>> >> >>> About the delete filter: >>>>>>>>>>>>>>>> >> >>> Based on the suggestions on the PR, I have moved the >>>>>>>>>>>>>>>> delete filter out from the main API. 
Created a >>>>>>>>>>>>>>>> `SupportsDeleteFilter` >>>>>>>>>>>>>>>> interface for it which would allow pushing down to the filter >>>>>>>>>>>>>>>> to Parquet >>>>>>>>>>>>>>>> vectorized readers in Spark, as this is the only place where >>>>>>>>>>>>>>>> we currently >>>>>>>>>>>>>>>> implemented this feature. >>>>>>>>>>>>>>>> >> >>> >>>>>>>>>>>>>>>> >> >>> >>>>>>>>>>>>>>>> >> >>> Renjie Liu <liurenjie2...@gmail.com> ezt írta >>>>>>>>>>>>>>>> (időpont: 2025. márc. 21., P, 14:11): >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> Hi, Peter: >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> Thanks for the effort on this. >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> 1. File format filters >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> Do the filters include both filter expressions from >>>>>>>>>>>>>>>> both user query and delete filter? >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> For filters from user query, I agree with you that >>>>>>>>>>>>>>>> we should keep the current behavior. >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> For delete filters associated with data files, at >>>>>>>>>>>>>>>> first I thought file format readers should not care about >>>>>>>>>>>>>>>> this. But now I >>>>>>>>>>>>>>>> realized that maybe we need to also push it to file reader, >>>>>>>>>>>>>>>> this is useful >>>>>>>>>>>>>>>> when `IS_DELETED` metadata column is not necessary and we >>>>>>>>>>>>>>>> could use these >>>>>>>>>>>>>>>> filters (position deletes, etc) to further prune data. >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> But anyway, I agree that we could postpone it in >>>>>>>>>>>>>>>> follow up pr. >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> 2. Batch size configuration >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> I'm leaning toward option 2. >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> 3. Spark configuration >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> I'm leaning towards using different configuration >>>>>>>>>>>>>>>> objects. >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> >>>>>>>>>>>>>>>> >> >>>> On Thu, Mar 20, 2025 at 10:23 PM Péter Váry < >>>>>>>>>>>>>>>> peter.vary.apa...@gmail.com> wrote: >>>>>>>>>>>>>>>> >> >>>>> >>>>>>>>>>>>>>>> >> >>>>> Hi Team, >>>>>>>>>>>>>>>> >> >>>>> Thanks everyone for the reviews on >>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/12298! >>>>>>>>>>>>>>>> >> >>>>> I have addressed most of comments, but a few >>>>>>>>>>>>>>>> questions still remain which might merit a bit wider audience: >>>>>>>>>>>>>>>> >> >>>>> >>>>>>>>>>>>>>>> >> >>>>> We should decide on the expected filtering >>>>>>>>>>>>>>>> behavior when the filters are pushed down to the readers. >>>>>>>>>>>>>>>> Currently the >>>>>>>>>>>>>>>> filters are applied as best effort for the file format >>>>>>>>>>>>>>>> readers. Some >>>>>>>>>>>>>>>> readers (Avro) just skip them altogether. There was a >>>>>>>>>>>>>>>> suggestion on the PR >>>>>>>>>>>>>>>> that we might enforce more strict requirements and the readers >>>>>>>>>>>>>>>> either >>>>>>>>>>>>>>>> reject part of the filters, or they could apply them fully. >>>>>>>>>>>>>>>> >> >>>>> Batch sizes are currently parameters for the >>>>>>>>>>>>>>>> reader builders which could be set for non-vectorized readers >>>>>>>>>>>>>>>> too which >>>>>>>>>>>>>>>> could be confusing. 
>>>>>>>>>>>>>>>> >> >>>>> Currently the Spark batch reader uses different >>>>>>>>>>>>>>>> configuration objects for ParquetBatchReadConf and >>>>>>>>>>>>>>>> OrcBatchReadConf as >>>>>>>>>>>>>>>> requested by the reviewers of the Comet PR. There was a >>>>>>>>>>>>>>>> suggestion on the >>>>>>>>>>>>>>>> current PR to use a common configuration instead. >>>>>>>>>>>>>>>> >> >>>>> >>>>>>>>>>>>>>>> >> >>>>> I would be interested in hearing your thoughts >>>>>>>>>>>>>>>> about these topics. >>>>>>>>>>>>>>>> >> >>>>> >>>>>>>>>>>>>>>> >> >>>>> My current take: >>>>>>>>>>>>>>>> >> >>>>> >>>>>>>>>>>>>>>> >> >>>>> File format filters: I am leaning towards keeping >>>>>>>>>>>>>>>> the current lenient behavior. Especially since Bloom filters >>>>>>>>>>>>>>>> are not able >>>>>>>>>>>>>>>> to do a full filtering, and are often used as a way to filter >>>>>>>>>>>>>>>> out unwanted >>>>>>>>>>>>>>>> records. Another option would be to implement a secondary >>>>>>>>>>>>>>>> filtering inside >>>>>>>>>>>>>>>> the file formats themselves which I think would cause extra >>>>>>>>>>>>>>>> complexity, and >>>>>>>>>>>>>>>> possible code duplication. Whatever the decision here, I would >>>>>>>>>>>>>>>> suggest >>>>>>>>>>>>>>>> moving this out to a next PR as the current changeset is big >>>>>>>>>>>>>>>> enough as it >>>>>>>>>>>>>>>> is. >>>>>>>>>>>>>>>> >> >>>>> Batch size configuration: Currently this is the >>>>>>>>>>>>>>>> only property which is different in the batch readers and the >>>>>>>>>>>>>>>> non-vectorized readers. I see 3 possible solutions: >>>>>>>>>>>>>>>> >> >>>>> >>>>>>>>>>>>>>>> >> >>>>> Create different builders for vectorized and >>>>>>>>>>>>>>>> non-vectorized reads - I don't think the current solution is >>>>>>>>>>>>>>>> confusing >>>>>>>>>>>>>>>> enough to be worth the extra class >>>>>>>>>>>>>>>> >> >>>>> We could put this to the reader configuration >>>>>>>>>>>>>>>> property set - This could work, but "hide" the possible >>>>>>>>>>>>>>>> configuration mode >>>>>>>>>>>>>>>> which is valid for both Parquet and ORC readers >>>>>>>>>>>>>>>> >> >>>>> We could keep things as they are now - I would choose >>>>>>>>>>>>>>>> this one, but I don't have a strong opinion here >>>>>>>>>>>>>>>> >> >>>>> >>>>>>>>>>>>>>>> >> >>>>> Spark configuration: TBH, I'm open to both solutions >>>>>>>>>>>>>>>> and happy to move in the direction the community decides on >>>>>>>>>>>>>>>> >> >>>>> >>>>>>>>>>>>>>>> >> >>>>> Thanks, >>>>>>>>>>>>>>>> >> >>>>> Peter >>>>>>>>>>>>>>>> >> >>>>> >>>>>>>>>>>>>>>> >> >>>>> Jean-Baptiste Onofré <j...@nanthrax.net> ezt írta >>>>>>>>>>>>>>>> (időpont: 2025. márc. 14., P, 16:31): >>>>>>>>>>>>>>>> >> >>>>>> >>>>>>>>>>>>>>>> >> >>>>>> Hi Peter >>>>>>>>>>>>>>>> >> >>>>>> >>>>>>>>>>>>>>>> >> >>>>>> Thanks for the update. I will do a new pass on >>>>>>>>>>>>>>>> the PR. >>>>>>>>>>>>>>>> >> >>>>>> >>>>>>>>>>>>>>>> >> >>>>>> Regards >>>>>>>>>>>>>>>> >> >>>>>> JB >>>>>>>>>>>>>>>> >> >>>>>> >>>>>>>>>>>>>>>> >> >>>>>> On Thu, Mar 13, 2025 at 1:16 PM Péter Váry < >>>>>>>>>>>>>>>> peter.vary.apa...@gmail.com> wrote: >>>>>>>>>>>>>>>> >> >>>>>> > >>>>>>>>>>>>>>>> >> >>>>>> > Hi Team, >>>>>>>>>>>>>>>> >> >>>>>> > I have rebased the File Format API proposal ( >>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/12298) to include >>>>>>>>>>>>>>>> the new changes needed for the Variant types. I would love to >>>>>>>>>>>>>>>> hear your >>>>>>>>>>>>>>>> feedback, especially Dan and Ryan, as you were the most active >>>>>>>>>>>>>>>> during our >>>>>>>>>>>>>>>> discussions. 
If I can help in any way to make the review >>>>>>>>>>>>>>>> easier, please let >>>>>>>>>>>>>>>> me know. >>>>>>>>>>>>>>>> >> >>>>>> > Thanks, >>>>>>>>>>>>>>>> >> >>>>>> > Peter >>>>>>>>>>>>>>>> >> >>>>>> > >>>>>>>>>>>>>>>> >> >>>>>> > Péter Váry <peter.vary.apa...@gmail.com> ezt >>>>>>>>>>>>>>>> írta (időpont: 2025. febr. 28., P, 17:50): >>>>>>>>>>>>>>>> >> >>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>> >> Hi everyone, >>>>>>>>>>>>>>>> >> >>>>>> >> Thanks for all of the actionable, relevant >>>>>>>>>>>>>>>> feedback on the PR ( >>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/12298). >>>>>>>>>>>>>>>> >> >>>>>> >> Updated the code to address most of them. >>>>>>>>>>>>>>>> Please check if you agree with the general approach. >>>>>>>>>>>>>>>> >> >>>>>> >> If there is a consensus about the general >>>>>>>>>>>>>>>> approach, I could. separate out the PR to smaller pieces so we >>>>>>>>>>>>>>>> can have an >>>>>>>>>>>>>>>> easier time to review and merge those step-by-step. >>>>>>>>>>>>>>>> >> >>>>>> >> Thanks, >>>>>>>>>>>>>>>> >> >>>>>> >> Peter >>>>>>>>>>>>>>>> >> >>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>> >> Jean-Baptiste Onofré <j...@nanthrax.net> ezt >>>>>>>>>>>>>>>> írta (időpont: 2025. febr. 20., Cs, 14:14): >>>>>>>>>>>>>>>> >> >>>>>> >>> >>>>>>>>>>>>>>>> >> >>>>>> >>> Hi Peter >>>>>>>>>>>>>>>> >> >>>>>> >>> >>>>>>>>>>>>>>>> >> >>>>>> >>> sorry for the late reply on this. >>>>>>>>>>>>>>>> >> >>>>>> >>> >>>>>>>>>>>>>>>> >> >>>>>> >>> I did a pass on the proposal, it's very >>>>>>>>>>>>>>>> interesting and well written. >>>>>>>>>>>>>>>> >> >>>>>> >>> I like the DataFile API and definitely worth >>>>>>>>>>>>>>>> to discuss all together. >>>>>>>>>>>>>>>> >> >>>>>> >>> >>>>>>>>>>>>>>>> >> >>>>>> >>> Maybe we can schedule a specific meeting to >>>>>>>>>>>>>>>> discuss about DataFile API ? >>>>>>>>>>>>>>>> >> >>>>>> >>> >>>>>>>>>>>>>>>> >> >>>>>> >>> Thoughts ? >>>>>>>>>>>>>>>> >> >>>>>> >>> >>>>>>>>>>>>>>>> >> >>>>>> >>> Regards >>>>>>>>>>>>>>>> >> >>>>>> >>> JB >>>>>>>>>>>>>>>> >> >>>>>> >>> >>>>>>>>>>>>>>>> >> >>>>>> >>> On Tue, Feb 11, 2025 at 5:46 PM Péter Váry < >>>>>>>>>>>>>>>> peter.vary.apa...@gmail.com> wrote: >>>>>>>>>>>>>>>> >> >>>>>> >>> > >>>>>>>>>>>>>>>> >> >>>>>> >>> > Hi Team, >>>>>>>>>>>>>>>> >> >>>>>> >>> > >>>>>>>>>>>>>>>> >> >>>>>> >>> > As mentioned earlier on our Community Sync >>>>>>>>>>>>>>>> I am exploring the possibility to define a FileFormat API for >>>>>>>>>>>>>>>> accessing >>>>>>>>>>>>>>>> different file formats. I have put together a proposal based >>>>>>>>>>>>>>>> on my findings. >>>>>>>>>>>>>>>> >> >>>>>> >>> > >>>>>>>>>>>>>>>> >> >>>>>> >>> > ------------------- >>>>>>>>>>>>>>>> >> >>>>>> >>> > Iceberg currently supports 3 different file >>>>>>>>>>>>>>>> formats: Avro, Parquet, ORC. With the introduction of Iceberg >>>>>>>>>>>>>>>> V3 >>>>>>>>>>>>>>>> specification many new features are added to Iceberg. Some of >>>>>>>>>>>>>>>> these >>>>>>>>>>>>>>>> features like new column types, default values require changes >>>>>>>>>>>>>>>> at the file >>>>>>>>>>>>>>>> format level. The changes are added by individual developers >>>>>>>>>>>>>>>> with different >>>>>>>>>>>>>>>> focus on the different file formats. As a result not all of >>>>>>>>>>>>>>>> the features >>>>>>>>>>>>>>>> are available for every supported file format. 
>>>>>>>>>>>>>>>> >> >>>>>> >>> > Also there are emerging file formats like >>>>>>>>>>>>>>>> Vortex [1] or Lance [2] which either by specialization, or by >>>>>>>>>>>>>>>> applying >>>>>>>>>>>>>>>> newer research results could provide better alternatives for >>>>>>>>>>>>>>>> certain >>>>>>>>>>>>>>>> use-cases like random access for data, or storing ML models. >>>>>>>>>>>>>>>> >> >>>>>> >>> > ------------------- >>>>>>>>>>>>>>>> >> >>>>>> >>> > >>>>>>>>>>>>>>>> >> >>>>>> >>> > Please check the detailed proposal [3] and >>>>>>>>>>>>>>>> the google document [4], and comment there or reply on the dev >>>>>>>>>>>>>>>> list if you >>>>>>>>>>>>>>>> have any suggestions. >>>>>>>>>>>>>>>> >> >>>>>> >>> > >>>>>>>>>>>>>>>> >> >>>>>> >>> > Thanks, >>>>>>>>>>>>>>>> >> >>>>>> >>> > Peter >>>>>>>>>>>>>>>> >> >>>>>> >>> > >>>>>>>>>>>>>>>> >> >>>>>> >>> > [1] - https://github.com/spiraldb/vortex >>>>>>>>>>>>>>>> >> >>>>>> >>> > [2] - https://lancedb.github.io/lance/ >>>>>>>>>>>>>>>> >> >>>>>> >>> > [3] - >>>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/12225 >>>>>>>>>>>>>>>> >> >>>>>> >>> > [4] - >>>>>>>>>>>>>>>> https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds >>>>>>>>>>>>>>>> >> >>>>>> >>> > >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>