Hi Micah,

> This seems like a classic MoR vs CoW trade-off.  But it seems like maybe
> both sparse and full should be available (I understand this adds
> complexity). For adding a new column or completely updating a new column,
> the performance would be better to prefill the data


Our internal use cases are very similar to what you describe. We primarily
deal with full column updates. However, the feedback on the proposal from
the wider community indicated that partial updates (e.g., bug-fixing a
subset of rows, updating features for active users) are also very common
and critical use cases.

> Is there evidence to say that partial column updates are more common in
> practice than full rewrites?


Personally, I don't have hard data on which use case is more common in the
wild, only that both appear to be important. I also agree that a good
long-term solution should support both strategies. Given that, the sparse
representation with alignment at read time (using dummy/null values for
missing positions) seems to give us the best of both worlds: efficient
vectorized reads and stitching, as well as support for partial column
updates. Would you agree?
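
To make that concrete, here is a rough sketch of the read-time alignment I
have in mind. All names and types below are illustrative, not from the
proposal: the reader expands the sparse column file into a dense vector with
one slot per base-file row, filling null (or a column default) for positions
the update does not cover, so the new column can be stitched to the base
columns positionally, without a join:

// Illustrative sketch only -- names/types are made up, not from the proposal.
// Expands a sparse column update (row position -> new value) into a dense,
// base-file-aligned vector, filling null for untouched positions.
import java.util.Map;
import java.util.TreeMap;

public class SparseColumnAlignment {

  // Sparse update file contents: only the row positions that were updated.
  private final TreeMap<Long, Object> updatedValues = new TreeMap<>();

  public void put(long rowPosition, Object value) {
    updatedValues.put(rowPosition, value);
  }

  // One slot per base-file row; missing positions stay null (or could take
  // a column default). The result can be zipped with the base columns
  // positionally -- no join needed. A real reader would do this per batch,
  // not for the whole file at once.
  public Object[] alignToBase(int baseRowCount) {
    Object[] dense = new Object[baseRowCount]; // null-filled by default
    for (Map.Entry<Long, Object> e : updatedValues.entrySet()) {
      dense[e.getKey().intValue()] = e.getValue();
    }
    return dense;
  }

  public static void main(String[] args) {
    // Base file has 4 rows; the sparse update touches only rows 1 and 3.
    SparseColumnAlignment update = new SparseColumnAlignment();
    update.put(1L, "v1");
    update.put(3L, "v3");
    Object[] aligned = update.alignToBase(4);
    // aligned == [null, "v1", null, "v3"]
    for (Object v : aligned) {
      System.out.println(v);
    }
  }
}

A full column update is then just the degenerate case where every position
is present, so the same read path would cover both strategies.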

~ Anurag

On Fri, Feb 13, 2026 at 9:33 AM Micah Kornfield <[email protected]>
wrote:

> Hi Anurag,
>
>> Data Representation: Sparse column files are preferred for compact
>> representation and are better suited for partial column updates. We can
>> optimize sparse representation for vectorized reads by filling in null
>> or default values at read time for missing positions from the base file,
>> which avoids joins during reads.
>
>
> This seems like a classic MoR vs CoW trade-off.  But it seems like maybe
> both sparse and full should be available (I understand this adds
> complexity).  For adding a new column or completely updating a new column,
> the performance would be better to prefill the data (otherwise one ends up
> duplicating the work that is already happening under the hood in parquet).
>
> Is there evidence to say that partial column updates are more common in
> practice than full rewrites?
>
> Thanks,
> Micah
>
>
> On Thu, Feb 12, 2026 at 3:32 AM Eduard Tudenhöfner <
> [email protected]> wrote:
>
>> Hey Anurag,
>>
>> I wasn't able to make it to the sync but was hoping to watch the
>> recording afterwards.
>> I'm curious what the reasons were for discarding the Parquet-native
>> approach. Could you please share a summary of what was discussed on that
>> topic in the sync?
>>
>> On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> Thank you for attending today's sync. Please find the meeting notes
>>> below. I apologize that we were unable to record the session because
>>> attendees did not have recording access.
>>>
>>> Key updates and discussion points:
>>>
>>> *Decisions:*
>>>
>>>    - Table Format vs. Parquet: There is a general consensus that column
>>>    update support should reside in the table format. Consequently, we have
>>>    discarded the Parquet-native approach.
>>>    - Metadata Representation: To maintain clean metadata and avoid
>>>    complex resolution logic for readers, the goal is to keep only one 
>>> metadata
>>>    file per column. However, achieving this is challenging if we support
>>>    partial updates, as multiple column files may exist for the same column
>>>    (See open questions).
>>>    - Data Representation: Sparse column files are preferred for compact
>>>    representation and are better suited for partial column updates. We can
>>>    optimize sparse representation for vectorized reads by filling in null or
>>>    default values at read time for missing positions from the base file, 
>>> which
>>>    avoids joins during reads.
>>>
>>>
>>> *Open Questions: *
>>>
>>>    - We are still determining what restrictions are necessary when
>>>    supporting partial updates. For instance, we need to decide whether to
>>>    allow adding a new column and then permitting partial updates on it.
>>>    This would involve managing both a base column file and subsequent
>>>    update files.
>>>    - We need a better understanding of the use cases for partial
>>>    updates.
>>>    - We need to further discuss the handling of equality deletes.
>>>
>>> If I missed anything, or if others took notes, please share them here.
>>> Thanks!
>>>
>>> I will go ahead and update the doc with what we have discussed so we can
>>> continue next time from where we left off.
>>>
>>> ~ Anurag
>>>
>>> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
>>> [email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> This design
>>>> <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>> will be discussed tomorrow in a dedicated sync.
>>>>
>>>> Efficient column updates sync
>>>> Tuesday, February 10 · 9:00 – 10:00am
>>>> Time zone: America/Los_Angeles
>>>> Google Meet joining info
>>>> Video call link: https://meet.google.com/xsd-exug-tcd
>>>>
>>>> ~ Anurag
>>>>
>>>> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Gabor,
>>>>>
>>>>> Thanks for the detailed example.
>>>>>
>>>>> I agree with Steven that Option 2 seems reasonable. I will add a
>>>>> section to the design doc regarding equality delete handling, and we can
>>>>> discuss this further during our meeting on Tuesday.
>>>>>
>>>>> ~Anurag
>>>>>
>>>>> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu <[email protected]> wrote:
>>>>>
>>>>>> > 1) When deleting with eq-deletes: If there is a column update on
>>>>>> the equality-field ID we use for the delete, reject deletion
>>>>>> > 2) When adding a column update on a column that is part of the
>>>>>> equality field IDs in some delete, we reject the column update
>>>>>>
>>>>>> Gabor, this is a good scenario. The 2nd option makes sense to me,
>>>>>> since equality ids are like primary key fields. If we have the 2nd rule
>>>>>> enforced, the first option is not applicable anymore.
>>>>>>
>>>>>> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> Thank you for the proposal, Anurag! I made a pass recently and I
>>>>>>> think there is some interference between column updates and equality
>>>>>>> deletes. Let me describe below:
>>>>>>>
>>>>>>> Steps:
>>>>>>>
>>>>>>> CREATE TABLE tbl (a INT, b INT);
>>>>>>>
>>>>>>> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file
>>>>>>>
>>>>>>> DELETE FROM tbl WHERE b=11;               -- creates an equality delete file
>>>>>>>
>>>>>>> UPDATE tbl SET b=11;                      -- writes a column update
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> SELECT * FROM tbl;
>>>>>>>
>>>>>>> Expected result:
>>>>>>>
>>>>>>> (2, 11)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Data and metadata created after the above steps:
>>>>>>>
>>>>>>> Base file:      (1, 11), (2, 22)                       seqnum=1
>>>>>>>
>>>>>>> EQ-delete:      b=11                                   seqnum=2
>>>>>>>
>>>>>>> Column update:  field ids: [field_id_for_col_b]        seqnum=3
>>>>>>>                 data file content: (dummy_value), (11)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Read steps:
>>>>>>>
>>>>>>>    1. Stitch base file with column updates in reader:
>>>>>>>
>>>>>>> Rows: (1, dummy_value), (2, 11) (Note: the dummy value can be either
>>>>>>> null or 11; see the proposal for more details)
>>>>>>>
>>>>>>> Seqnum for base file=1
>>>>>>>
>>>>>>> Seqnum for column update=3
>>>>>>>
>>>>>>>    2. Apply eq-delete b=11, seqnum=2 on the stitched result
>>>>>>>    3. Query result depends on which seqnum we carry forward to
>>>>>>>    compare with the eq-delete's seqnum, but it's not correct in any
>>>>>>>    of the cases:
>>>>>>>       1. Use seqnum from base file: we get either an empty result
>>>>>>>       if 'dummy_value' is 11 or we get (1, null) otherwise
>>>>>>>       2. Use seqnum from last update file: don't delete any rows,
>>>>>>>       result set is (1, dummy_value),(2,11)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Problem:
>>>>>>>
>>>>>>> Based on sequence numbers, the EQ-delete should be applied midway
>>>>>>> through applying the column updates to the base file, i.e., during the
>>>>>>> stitching process. If I'm not mistaken, this is not feasible with the
>>>>>>> way readers work.
>>>>>>>
>>>>>>>
>>>>>>> Proposal:
>>>>>>>
>>>>>>> Don't allow equality deletes together with column updates.
>>>>>>>
>>>>>>>   1) When deleting with eq-deletes: If there is a column update on
>>>>>>> the equality-field ID we use for the delete, reject deletion
>>>>>>>
>>>>>>>   2) When adding a column update on a column that is part of the
>>>>>>> equality field IDs in some delete, we reject the column update
>>>>>>>
>>>>>>> Alternatively, column updates could be controlled by an immutable
>>>>>>> table property, and eq-deletes could be rejected if the property
>>>>>>> indicates that column updates are turned on for the table.
>>>>>>>
>>>>>>>
>>>>>>> Let me know what you think!
>>>>>>>
>>>>>>> Best Regards,
>>>>>>>
>>>>>>> Gabor
>>>>>>>
>>>>>>> On Wed, Jan 28, 2026 at 3:31 AM Anurag Mantripragada <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Thank you everyone for the initial review comments. It is exciting
>>>>>>>> to see so much interest in this proposal.
>>>>>>>>
>>>>>>>> I am currently reviewing and responding to each comment. The
>>>>>>>> general themes of the feedback so far include:
>>>>>>>> - Including partial updates (column updates on a subset of rows in
>>>>>>>> a table).
>>>>>>>> - Adding details on how SQL engines will write the update files.
>>>>>>>> - Adding details on split planning and row alignment for update
>>>>>>>> files.
>>>>>>>>
>>>>>>>> I will think through these points and update the design accordingly.
>>>>>>>>
>>>>>>>> Best
>>>>>>>> Anurag
>>>>>>>>
>>>>>>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Xianjin,
>>>>>>>>>
>>>>>>>>> Happy to learn from your experience in supporting
>>>>>>>>> backfill use-cases. Please feel free to review the proposal and add 
>>>>>>>>> your
>>>>>>>>> comments. I will wait for a couple of days more to ensure everyone 
>>>>>>>>> has a
>>>>>>>>> chance to review the proposal.
>>>>>>>>>
>>>>>>>>> ~ Anurag
>>>>>>>>>
>>>>>>>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Anurag and Peter,
>>>>>>>>>>
>>>>>>>>>> It’s great to see that partial column updates have gained so much
>>>>>>>>>> interest in the community. I internally built a BackfillColumns
>>>>>>>>>> action to efficiently backfill columns (by writing only the partial
>>>>>>>>>> columns and copying the binary data of the other columns into a new
>>>>>>>>>> DataFile). The speedup can be 10x for wide tables, but the write
>>>>>>>>>> amplification is still there. I would be happy to collaborate on
>>>>>>>>>> the work and eliminate the write amplification.
>>>>>>>>>>
>>>>>>>>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>>>>>>>>> > Hi Anurag,
>>>>>>>>>> >
>>>>>>>>>> > It’s great to see how much interest there is in the community
>>>>>>>>>> around this
>>>>>>>>>> > potential new feature. Gábor and I have actually submitted an
>>>>>>>>>> Iceberg
>>>>>>>>>> > Summit talk proposal on this topic, and we would be very happy
>>>>>>>>>> to
>>>>>>>>>> > collaborate on the work. I was mainly waiting for the File
>>>>>>>>>> Format API to be
>>>>>>>>>> > finalized, as I believe this feature should build on top of it.
>>>>>>>>>> >
>>>>>>>>>> > For reference, our related work includes:
>>>>>>>>>> >
>>>>>>>>>> >    - *Dev list thread:*
>>>>>>>>>> >
>>>>>>>>>> https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>>>>>>>>> >    - *Proposal document:*
>>>>>>>>>> >
>>>>>>>>>> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>>>>>>>>> >    (not shared widely yet)
>>>>>>>>>> >    - *Performance testing PR for readers and writers:*
>>>>>>>>>> >    https://github.com/apache/iceberg/pull/13306
>>>>>>>>>> >
>>>>>>>>>> > During earlier discussions about possible metadata changes,
>>>>>>>>>> another option
>>>>>>>>>> > came up that hasn’t been documented yet: separating planner
>>>>>>>>>> metadata from
>>>>>>>>>> > reader metadata. Since the planner does not need to know about
>>>>>>>>>> the actual
>>>>>>>>>> > files, we could store the file composition in a separate file
>>>>>>>>>> (potentially
>>>>>>>>>> > a Puffin file). This file could hold the column_files metadata,
>>>>>>>>>> while the
>>>>>>>>>> > manifest would reference the Puffin file and blob position
>>>>>>>>>> instead of the
>>>>>>>>>> > data filename.
>>>>>>>>>> > This approach has the advantage of keeping the existing
>>>>>>>>>> metadata largely
>>>>>>>>>> > intact, and it could also give us a natural place later to add
>>>>>>>>>> file-level
>>>>>>>>>> > indexes or Bloom filters for use during reads or secondary
>>>>>>>>>> filtering. The
>>>>>>>>>> > downsides are the additional files and the increased complexity
>>>>>>>>>> of
>>>>>>>>>> > identifying files that are no longer referenced by the table,
>>>>>>>>>> so this may
>>>>>>>>>> > not be an ideal solution.
>>>>>>>>>> >
>>>>>>>>>> > I do have some concerns about the MoR metadata proposal
>>>>>>>>>> described in the
>>>>>>>>>> > document. At first glance, it seems to complicate distributed
>>>>>>>>>> planning, as
>>>>>>>>>> > all entries for a given file would need to be collected and
>>>>>>>>>> merged to
>>>>>>>>>> > provide the information required by both the planner and the
>>>>>>>>>> reader.
>>>>>>>>>> > Additionally, when a new column is added or updated, we would
>>>>>>>>>> still need to
>>>>>>>>>> > add a new metadata entry for every existing data file. If we
>>>>>>>>>> immediately
>>>>>>>>>> > write out the merged metadata, the total number of entries
>>>>>>>>>> remains the
>>>>>>>>>> > same. The main benefit is avoiding rewriting statistics, which
>>>>>>>>>> can be
>>>>>>>>>> > significant, but this comes at the cost of increased planning
>>>>>>>>>> complexity.
>>>>>>>>>> > If we choose to store the merged statistics in the
>>>>>>>>>> column_families entry, I
>>>>>>>>>> > don’t see much benefit in excluding the rest of the metadata,
>>>>>>>>>> especially
>>>>>>>>>> > since including it would simplify the planning process.
>>>>>>>>>> >
>>>>>>>>>> > As Anton already pointed out, we should also discuss how this
>>>>>>>>>> change would
>>>>>>>>>> > affect split handling, particularly how to avoid double reads
>>>>>>>>>> when row
>>>>>>>>>> > groups are not aligned between the original data files and the
>>>>>>>>>> new column
>>>>>>>>>> > files.
>>>>>>>>>> >
>>>>>>>>>> > Finally, I’d like to see some discussion around the Java API
>>>>>>>>>> implications.
>>>>>>>>>> > In particular, what API changes are required, and how SQL
>>>>>>>>>> engines would
>>>>>>>>>> > perform updates. Since the new column files must have the same
>>>>>>>>>> number of
>>>>>>>>>> > rows as the original data files, with a strict one-to-one
>>>>>>>>>> relationship, SQL
>>>>>>>>>> > engines would need access to the source filename, position, and
>>>>>>>>>> deletion
>>>>>>>>>> > status in the DataFrame in order to generate the new files.
>>>>>>>>>> This is more
>>>>>>>>>> > involved than a simple update and deserves some explicit
>>>>>>>>>> consideration.
>>>>>>>>>> >
>>>>>>>>>> > Looking forward to your thoughts.
>>>>>>>>>> > Best regards,
>>>>>>>>>> > Peter
>>>>>>>>>> >
>>>>>>>>>> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <
>>>>>>>>>> [email protected]>
>>>>>>>>>> > wrote:
>>>>>>>>>> >
>>>>>>>>>> > > Thanks Anton and others, for providing some initial feedback.
>>>>>>>>>> I will
>>>>>>>>>> > > address all your comments soon.
>>>>>>>>>> > >
>>>>>>>>>> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <
>>>>>>>>>> [email protected]>
>>>>>>>>>> > > wrote:
>>>>>>>>>> > >
>>>>>>>>>> > >> I had a chance to see the proposal before it landed and I
>>>>>>>>>> think it is a
>>>>>>>>>> > >> cool idea and both presented approaches would likely work. I
>>>>>>>>>> am looking
>>>>>>>>>> > >> forward to discussing the tradeoffs and would encourage
>>>>>>>>>> everyone to
>>>>>>>>>> > >> push/polish each approach to see what issues can be
>>>>>>>>>> mitigated and what are
>>>>>>>>>> > >> fundamental.
>>>>>>>>>> > >>
>>>>>>>>>> > >> [1] Iceberg-native approach: better visibility into column
>>>>>>>>>> files from the
>>>>>>>>>> > >> metadata, potentially better concurrency for non-overlapping
>>>>>>>>>> column
>>>>>>>>>> > >> updates, no dep on Parquet.
>>>>>>>>>> > >> [2] Parquet-native approach: almost no changes to the table
>>>>>>>>>> format
>>>>>>>>>> > >> metadata beyond tracking of base files.
>>>>>>>>>> > >>
>>>>>>>>>> > >> I think [1] sounds a bit better on paper but I am worried
>>>>>>>>>> about the
>>>>>>>>>> > >> complexity in writers and readers (especially around keeping
>>>>>>>>>> row groups
>>>>>>>>>> > >> aligned and split planning). It would be great to cover this
>>>>>>>>>> in detail in
>>>>>>>>>> > >> the proposal.
>>>>>>>>>> > >>
>>>>>>>>>> > >> On Mon, Jan 26, 2026 at 9:00 AM Anurag Mantripragada <
>>>>>>>>>> > >> [email protected]> wrote:
>>>>>>>>>> > >>
>>>>>>>>>> > >>> Hi all,
>>>>>>>>>> > >>>
>>>>>>>>>> > >>> "Wide tables" with thousands of columns present significant
>>>>>>>>>> challenges
>>>>>>>>>> > >>> for AI/ML workloads, particularly when only a subset of
>>>>>>>>>> columns needs to be
>>>>>>>>>> > >>> added or updated. Current Copy-on-Write (COW) and
>>>>>>>>>> Merge-on-Read (MOR)
>>>>>>>>>> > >>> operations in Iceberg apply at the row level, which leads
>>>>>>>>>> to substantial
>>>>>>>>>> > >>> write amplification in scenarios such as:
>>>>>>>>>> > >>>
>>>>>>>>>> > >>>    - Feature Backfilling & Column Updates: Adding new
>>>>>>>>>> feature columns
>>>>>>>>>> > >>>    (e.g., model embeddings) to petabyte-scale tables.
>>>>>>>>>> > >>>    - Model Score Updates: Refreshing prediction scores after
>>>>>>>>>> retraining.
>>>>>>>>>> > >>>    - Embedding Refresh: Updating vector embeddings, which
>>>>>>>>>> currently
>>>>>>>>>> > >>>    triggers a rewrite of the entire row.
>>>>>>>>>> > >>>    - Incremental Feature Computation: Daily updates to a
>>>>>>>>>> small fraction
>>>>>>>>>> > >>>    of features in wide tables.
>>>>>>>>>> > >>>
>>>>>>>>>> > >>> With the Iceberg V4 proposal introducing single-file
>>>>>>>>>> commits and column
>>>>>>>>>> > >>> stats improvements, this is an ideal time to address
>>>>>>>>>> column-level updates
>>>>>>>>>> > >>> to better support these use cases.
>>>>>>>>>> > >>>
>>>>>>>>>> > >>> I have drafted a proposal that explores both table-format
>>>>>>>>>> enhancements
>>>>>>>>>> > >>> and file-format (Parquet) changes to enable more efficient
>>>>>>>>>> updates.
>>>>>>>>>> > >>>
>>>>>>>>>> > >>> Proposal Details:
>>>>>>>>>> > >>> - GitHub Issue: #15146 <
>>>>>>>>>> https://github.com/apache/iceberg/issues/15146>
>>>>>>>>>> > >>> - Design Document: Efficient Column Updates in Iceberg
>>>>>>>>>> > >>> <
>>>>>>>>>> https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0
>>>>>>>>>> >
>>>>>>>>>> > >>>
>>>>>>>>>> > >>> Next Steps:
>>>>>>>>>> > >>> I plan to create POCs to benchmark the approaches described
>>>>>>>>>> in the
>>>>>>>>>> > >>> document.
>>>>>>>>>> > >>>
>>>>>>>>>> > >>> Please review the proposal and share your feedback.
>>>>>>>>>> > >>>
>>>>>>>>>> > >>> Thanks,
>>>>>>>>>> > >>> Anurag
>>>>>>>>>> > >>>
>>>>>>>>>> > >>
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>
