Hi all,

This design <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0> will be discussed tomorrow in a dedicated sync.
Efficient column updates sync
Tuesday, February 10 · 9:00 – 10:00am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/xsd-exug-tcd

~ Anurag

On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <[email protected]> wrote:

> Hi Gabor,
>
> Thanks for the detailed example.
>
> I agree with Steven that Option 2 seems reasonable. I will add a section to the design doc regarding equality delete handling, and we can discuss this further during our meeting on Tuesday.
>
> ~Anurag
>
> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu <[email protected]> wrote:
>
>> > 1) When deleting with eq-deletes: If there is a column update on the equality-field ID we use for the delete, reject the deletion
>> > 2) When adding a column update on a column that is part of the equality field IDs in some delete, we reject the column update
>>
>> Gabor, this is a good scenario. The 2nd option makes sense to me, since equality ids are like primary key fields. If we have the 2nd rule enforced, the first option is not applicable anymore.
>>
>> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab <[email protected]> wrote:
>>
>>> Hey,
>>>
>>> Thank you for the proposal, Anurag! I made a pass recently and I think there is some interference between column updates and equality deletes. Let me describe it below:
>>>
>>> Steps:
>>>
>>> CREATE TABLE tbl (int a, int b);
>>> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file
>>> DELETE FROM tbl WHERE b=11;               -- creates an equality delete file
>>> UPDATE tbl SET b=11;                      -- writes a column update
>>>
>>> SELECT * FROM tbl;
>>>
>>> Expected result: (2, 11)
>>>
>>> Data and metadata created after the above steps:
>>>
>>> Base file
>>> (1, 11), (2, 22)
>>> seqnum=1
>>>
>>> EQ-delete
>>> b=11
>>> seqnum=2
>>>
>>> Column update
>>> Field ids: [field_id_for_col_b]
>>> seqnum=3
>>> Data file content: (dummy_value), (11)
>>>
>>> Read steps:
>>>
>>> 1. Stitch the base file with the column update in the reader:
>>>    Rows: (1, dummy_value), (2, 11) (Note: dummy_value can be either null or 11; see the proposal for more details)
>>>    Seqnum for base file=1
>>>    Seqnum for column update=3
>>> 2. Apply the eq-delete b=11 (seqnum=2) on the stitched result
>>> 3. The query result depends on which seqnum we carry forward to compare with the eq-delete's seqnum, but it is not correct in either case:
>>>    1. Use the seqnum from the base file: we get either an empty result if 'dummy_value' is 11, or (1, null) otherwise
>>>    2. Use the seqnum from the last update file: no rows are deleted, and the result set is (1, dummy_value), (2, 11)
>>>
>>> Problem:
>>>
>>> The eq-delete should be applied partway through applying the column updates to the base file, based on sequence number, during the stitching process. If I'm not mistaken, this is not feasible with the way readers work.
>>>
>>> Proposal:
>>>
>>> Don't allow equality deletes together with column updates.
>>>
>>> 1) When deleting with eq-deletes: If there is a column update on the equality-field ID we use for the delete, reject the deletion
>>> 2) When adding a column update on a column that is part of the equality field IDs in some delete, we reject the column update
>>>
>>> Alternatively, column updates could be controlled by an immutable table property, and eq-deletes could be rejected when the property indicates that column updates are turned on for the table.
>>>
>>> Let me know what you think!
>>>
>>> Best Regards,
>>> Gabor
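To make the ambiguity concrete, here is a minimal, self-contained sketch (plain Java; hypothetical names, not Iceberg reader code) of the two sequence-number choices for the example above and the incorrect result each one produces:

import java.util.List;
import java.util.Objects;

public class EqDeleteSeqDemo {
  record Row(int a, Integer b) {}

  // An equality delete is assumed to apply to rows whose data sequence number is lower than
  // the delete's sequence number and whose current value matches the delete predicate.
  static List<Row> applyEqDelete(List<Row> rows, long rowSeq, long deleteSeq, Integer deletedB) {
    return rows.stream()
        .filter(r -> !(rowSeq < deleteSeq && Objects.equals(r.b(), deletedB)))
        .toList();
  }

  public static void main(String[] args) {
    Integer dummy = null;  // placeholder value stored for the already-deleted row
    // Base file (seqnum 1) stitched with the column update (seqnum 3).
    List<Row> stitched = List.of(new Row(1, dummy), new Row(2, 11));

    // Option 1: carry the base file's sequence number (1) forward.
    // The delete (seqnum 2) applies to the stitched values and wrongly removes (2, 11).
    System.out.println(applyEqDelete(stitched, 1, 2, 11));  // [Row[a=1, b=null]]

    // Option 2: carry the column update's sequence number (3) forward.
    // The delete is now "older" than every row, so nothing is removed at all.
    System.out.println(applyEqDelete(stitched, 3, 2, 11));  // [Row[a=1, b=null], Row[a=2, b=11]]
  }
}

Under that rule, producing the expected result (2, 11) would require applying the delete against the pre-update value of b from the base file before stitching in the update, which is exactly the mid-stitch application Gabor points out is not feasible with the way readers work today.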
>>>
>>> Anurag Mantripragada <[email protected]> wrote (on Wed, Jan 28, 2026, 3:31):
>>>
>>>> Thank you everyone for the initial review comments. It is exciting to see so much interest in this proposal.
>>>>
>>>> I am currently reviewing and responding to each comment. The general themes of the feedback so far include:
>>>> - Including partial updates (column updates on a subset of rows in a table).
>>>> - Adding details on how SQL engines will write the update files.
>>>> - Adding details on split planning and row alignment for update files.
>>>>
>>>> I will think through these points and update the design accordingly.
>>>>
>>>> Best,
>>>> Anurag
>>>>
>>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <[email protected]> wrote:
>>>>
>>>>> Hi Xianjin,
>>>>>
>>>>> Happy to learn from your experience in supporting backfill use cases. Please feel free to review the proposal and add your comments. I will wait a couple more days to ensure everyone has a chance to review the proposal.
>>>>>
>>>>> ~ Anurag
>>>>>
>>>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye <[email protected]> wrote:
>>>>>
>>>>>> Hi Anurag and Peter,
>>>>>>
>>>>>> It’s great to see that the partial column update has gained so much interest in the community. I internally built a BackfillColumns action to backfill columns efficiently (by writing only the partial columns and copying the binary data of the other columns into a new DataFile). The speedup could be 10x for wide tables, but the write amplification is still there. I would be happy to collaborate on the work and eliminate the write amplification.
>>>>>>
>>>>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>>>>> > Hi Anurag,
>>>>>> >
>>>>>> > It’s great to see how much interest there is in the community around this potential new feature. Gábor and I have actually submitted an Iceberg Summit talk proposal on this topic, and we would be very happy to collaborate on the work. I was mainly waiting for the File Format API to be finalized, as I believe this feature should build on top of it.
>>>>>> >
>>>>>> > For reference, our related work includes:
>>>>>> > - *Dev list thread:* https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>>>>> > - *Proposal document:* https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww (not shared widely yet)
>>>>>> > - *Performance testing PR for readers and writers:* https://github.com/apache/iceberg/pull/13306
>>>>>> >
>>>>>> > During earlier discussions about possible metadata changes, another option came up that hasn’t been documented yet: separating planner metadata from reader metadata. Since the planner does not need to know about the actual files, we could store the file composition in a separate file (potentially a Puffin file). This file could hold the column_files metadata, while the manifest would reference the Puffin file and blob position instead of the data filename.
>>>>>> > This approach has the advantage of keeping the existing metadata largely intact, and it could also give us a natural place later to add file-level indexes or Bloom filters for use during reads or secondary filtering.
>>>>>> > The downsides are the additional files and the increased complexity of identifying files that are no longer referenced by the table, so this may not be an ideal solution.
>>>>>> >
>>>>>> > I do have some concerns about the MoR metadata proposal described in the document. At first glance, it seems to complicate distributed planning, as all entries for a given file would need to be collected and merged to provide the information required by both the planner and the reader. Additionally, when a new column is added or updated, we would still need to add a new metadata entry for every existing data file. If we immediately write out the merged metadata, the total number of entries remains the same. The main benefit is avoiding rewriting statistics, which can be significant, but this comes at the cost of increased planning complexity. If we choose to store the merged statistics in the column_families entry, I don’t see much benefit in excluding the rest of the metadata, especially since including it would simplify the planning process.
>>>>>> >
>>>>>> > As Anton already pointed out, we should also discuss how this change would affect split handling, particularly how to avoid double reads when row groups are not aligned between the original data files and the new column files.
>>>>>> >
>>>>>> > Finally, I’d like to see some discussion around the Java API implications: in particular, what API changes are required and how SQL engines would perform updates. Since the new column files must have the same number of rows as the original data files, with a strict one-to-one relationship, SQL engines would need access to the source filename, position, and deletion status in the DataFrame in order to generate the new files. This is more involved than a simple update and deserves some explicit consideration.
>>>>>> >
>>>>>> > Looking forward to your thoughts.
>>>>>> > Best regards,
>>>>>> > Peter
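To make the one-to-one row alignment requirement above concrete, here is a small, self-contained sketch (plain Java; hypothetical names, not a proposed Iceberg API) of how an engine-side writer might lay out the values for one new column file so that they line up position-for-position with the base data file:

import java.util.ArrayList;
import java.util.List;

public class ColumnUpdateAlignmentSketch {
  // One source row as an engine might see it: base file name, position within that file,
  // whether the row is already masked by deletes, and the recomputed value for the column.
  record SourceRow(String sourceFile, long position, boolean deleted, Integer newValue) {}

  // Produces the values for one new column file, aligned one-to-one with its base data file.
  static List<Integer> columnFileValues(List<SourceRow> rowsForFile) {
    List<Integer> values = new ArrayList<>();
    long expectedPos = 0;
    for (SourceRow row : rowsForFile) {
      if (row.position() != expectedPos++) {
        throw new IllegalStateException("rows must cover every base-file position, in order");
      }
      // A deleted row still occupies its slot; a placeholder keeps the row counts identical.
      values.add(row.deleted() ? null : row.newValue());
    }
    return values;
  }

  public static void main(String[] args) {
    // Two rows from a base file "f1.parquet"; position 0 is already masked by a delete.
    List<SourceRow> rows = List.of(
        new SourceRow("f1.parquet", 0, true, 42),
        new SourceRow("f1.parquet", 1, false, 99));
    System.out.println(columnFileValues(rows)); // prints [null, 99]
  }
}

The only point of the sketch is the alignment rule: every base-file position, including positions already masked by deletes, must receive exactly one slot in the new column file, which is why the engine needs the source filename, position, and deletion status that Peter mentions.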
>>>>>> >
>>>>>> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <[email protected]> wrote:
>>>>>> >
>>>>>> > > Thanks Anton and others for providing some initial feedback. I will address all your comments soon.
>>>>>> > >
>>>>>> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <[email protected]> wrote:
>>>>>> > >
>>>>>> > >> I had a chance to see the proposal before it landed, and I think it is a cool idea and both presented approaches would likely work. I am looking forward to discussing the tradeoffs and would encourage everyone to push/polish each approach to see which issues can be mitigated and which are fundamental.
>>>>>> > >>
>>>>>> > >> [1] Iceberg-native approach: better visibility into column files from the metadata, potentially better concurrency for non-overlapping column updates, no dependency on Parquet.
>>>>>> > >> [2] Parquet-native approach: almost no changes to the table format metadata beyond tracking of base files.
>>>>>> > >>
>>>>>> > >> I think [1] sounds a bit better on paper, but I am worried about the complexity in writers and readers (especially around keeping row groups aligned and split planning). It would be great to cover this in detail in the proposal.
>>>>>> > >>
>>>>>> > >> On Mon, Jan 26, 2026 at 09:00, Anurag Mantripragada <[email protected]> wrote:
>>>>>> > >>
>>>>>> > >>> Hi all,
>>>>>> > >>>
>>>>>> > >>> "Wide tables" with thousands of columns present significant challenges for AI/ML workloads, particularly when only a subset of columns needs to be added or updated. Current Copy-on-Write (COW) and Merge-on-Read (MOR) operations in Iceberg apply at the row level, which leads to substantial write amplification in scenarios such as:
>>>>>> > >>>
>>>>>> > >>> - Feature Backfilling & Column Updates: Adding new feature columns (e.g., model embeddings) to petabyte-scale tables.
>>>>>> > >>> - Model Score Updates: Refreshing prediction scores after retraining.
>>>>>> > >>> - Embedding Refresh: Updating vector embeddings, which currently triggers a rewrite of the entire row.
>>>>>> > >>> - Incremental Feature Computation: Daily updates to a small fraction of features in wide tables.
>>>>>> > >>>
>>>>>> > >>> With the Iceberg V4 proposal introducing single-file commits and column stats improvements, this is an ideal time to address column-level updates to better support these use cases.
>>>>>> > >>>
>>>>>> > >>> I have drafted a proposal that explores both table-format enhancements and file-format (Parquet) changes to enable more efficient updates.
>>>>>> > >>>
>>>>>> > >>> Proposal Details:
>>>>>> > >>> - GitHub Issue: #15146 <https://github.com/apache/iceberg/issues/15146>
>>>>>> > >>> - Design Document: Efficient Column Updates in Iceberg <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>>>> > >>>
>>>>>> > >>> Next Steps:
>>>>>> > >>> I plan to create POCs to benchmark the approaches described in the document.
>>>>>> > >>>
>>>>>> > >>> Please review the proposal and share your feedback.
>>>>>> > >>>
>>>>>> > >>> Thanks,
>>>>>> > >>> Anurag
