Hi Peter,

Thanks for reviewing the proposal.

Regarding your concerns about the MoR metadata proposal, I believe there may be a misunderstanding of the primary approach. In the document, I actually discarded the MoR metadata proposal (see Approach 3) due to high planning costs. My main proposal (Approach 1) uses CoW metadata, which rewrites the entry metadata for existing entries. This aligns closely with your suggestion.
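To make that concrete, here is a purely illustrative sketch of the shape of the idea; the class and field names below are hypothetical and are not the exact schema from the doc:

    // Purely illustrative, not the schema from the doc: under the CoW-metadata
    // approach, a column update rewrites the manifest entry for an existing data
    // file so that it also references the new column file; the data file itself
    // and any previously written column files are left untouched.
    import java.util.ArrayList;
    import java.util.List;

    final class ColumnFileRef {
      final String path;            // file holding only the new/updated columns
      final List<Integer> fieldIds; // Iceberg field ids stored in that file

      ColumnFileRef(String path, List<Integer> fieldIds) {
        this.path = path;
        this.fieldIds = fieldIds;
      }
    }

    final class ManifestEntrySketch {
      final String dataFilePath;            // original base data file, never rewritten
      final long recordCount;               // every column file must match this row count 1:1
      final List<ColumnFileRef> columnFiles;

      ManifestEntrySketch(String dataFilePath, long recordCount, List<ColumnFileRef> columnFiles) {
        this.dataFilePath = dataFilePath;
        this.recordCount = recordCount;
        this.columnFiles = columnFiles;
      }

      // CoW metadata: a column update produces a new, already-merged entry per
      // data file, so the planner never has to collect and merge multiple entries.
      ManifestEntrySketch withColumnFile(ColumnFileRef ref) {
        List<ColumnFileRef> merged = new ArrayList<>(columnFiles);
        merged.add(ref);
        return new ManifestEntrySketch(dataFilePath, recordCount, merged);
      }
    }

The point of the sketch is only that each data file keeps a single, merged entry, which is what keeps distributed planning simple.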
I will add more detail on the SQL execution and split planning to the doc.
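On the split-planning side, one first cut (again purely illustrative, all types and names hypothetical) would be to plan row-group-level splits only when the base file and all of its column files share identical row-group boundaries, and otherwise fall back to a single split per file group so that no row group is read twice:

    // Hypothetical sketch of the alignment check raised in this thread: combined
    // splits are only cheap when row groups line up across the base file and its
    // column files; misaligned row groups force range stitching or double reads.
    import java.util.List;

    final class SplitPlannerSketch {

      // Row counts per row group, e.g. [100000, 100000, 42000].
      static boolean rowGroupsAligned(List<Long> base, List<Long> columnFile) {
        return base.equals(columnFile);
      }

      // Returns the number of splits to plan for one base file plus its column files.
      static int planSplits(List<Long> baseRowGroups, List<List<Long>> columnFileRowGroups) {
        for (List<Long> rowGroups : columnFileRowGroups) {
          if (!rowGroupsAligned(baseRowGroups, rowGroups)) {
            // Fall back to one split for the whole file group: no double reads,
            // but row-group-level parallelism is lost for this file.
            return 1;
          }
        }
        // Aligned: one split per row group, reading the base file and every
        // column file at the same row-group index and zipping rows positionally.
        return baseRowGroups.size();
      }
    }

Whether writers should be required to preserve the base file's row-group boundaries, or planners should handle misalignment, is part of what I will spell out in the doc.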
~ Anurag

On Tue, Jan 27, 2026 at 2:13 AM Péter Váry <[email protected]> wrote:

> Hi Anurag,
>
> It’s great to see how much interest there is in the community around this potential new feature. Gábor and I have actually submitted an Iceberg Summit talk proposal on this topic, and we would be very happy to collaborate on the work. I was mainly waiting for the File Format API to be finalized, as I believe this feature should build on top of it.
>
> For reference, our related work includes:
>
> - Dev list thread: https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
> - Proposal document: https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww (not shared widely yet)
> - Performance testing PR for readers and writers: https://github.com/apache/iceberg/pull/13306
>
> During earlier discussions about possible metadata changes, another option came up that hasn’t been documented yet: separating planner metadata from reader metadata. Since the planner does not need to know about the actual files, we could store the file composition in a separate file (potentially a Puffin file). This file could hold the column_files metadata, while the manifest would reference the Puffin file and blob position instead of the data filename. This approach has the advantage of keeping the existing metadata largely intact, and it could also give us a natural place later to add file-level indexes or Bloom filters for use during reads or secondary filtering. The downsides are the additional files and the increased complexity of identifying files that are no longer referenced by the table, so this may not be an ideal solution.
>
> I do have some concerns about the MoR metadata proposal described in the document. At first glance, it seems to complicate distributed planning, as all entries for a given file would need to be collected and merged to provide the information required by both the planner and the reader. Additionally, when a new column is added or updated, we would still need to add a new metadata entry for every existing data file. If we immediately write out the merged metadata, the total number of entries remains the same. The main benefit is avoiding rewriting statistics, which can be significant, but this comes at the cost of increased planning complexity. If we choose to store the merged statistics in the column_families entry, I don’t see much benefit in excluding the rest of the metadata, especially since including it would simplify the planning process.
>
> As Anton already pointed out, we should also discuss how this change would affect split handling, particularly how to avoid double reads when row groups are not aligned between the original data files and the new column files.
>
> Finally, I’d like to see some discussion around the Java API implications: in particular, what API changes are required and how SQL engines would perform updates. Since the new column files must have the same number of rows as the original data files, with a strict one-to-one relationship, SQL engines would need access to the source filename, position, and deletion status in the DataFrame in order to generate the new files. This is more involved than a simple update and deserves some explicit consideration.
>
> Looking forward to your thoughts.
> Best regards,
> Peter
>
> On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <[email protected]> wrote:
>
>> Thanks Anton and others, for providing some initial feedback. I will address all your comments soon.
>>
>> On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <[email protected]> wrote:
>>
>>> I had a chance to see the proposal before it landed, and I think it is a cool idea; both presented approaches would likely work. I am looking forward to discussing the tradeoffs and would encourage everyone to push/polish each approach to see which issues can be mitigated and which are fundamental.
>>>
>>> [1] Iceberg-native approach: better visibility into column files from the metadata, potentially better concurrency for non-overlapping column updates, no dependency on Parquet.
>>> [2] Parquet-native approach: almost no changes to the table format metadata beyond tracking of base files.
>>>
>>> I think [1] sounds a bit better on paper, but I am worried about the complexity in writers and readers (especially around keeping row groups aligned and split planning). It would be great to cover this in detail in the proposal.
>>>
>>> On Mon, Jan 26, 2026 at 09:00 Anurag Mantripragada <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> "Wide tables" with thousands of columns present significant challenges for AI/ML workloads, particularly when only a subset of columns needs to be added or updated. Current Copy-on-Write (COW) and Merge-on-Read (MOR) operations in Iceberg apply at the row level, which leads to substantial write amplification in scenarios such as:
>>>>
>>>> - Feature Backfilling & Column Updates: Adding new feature columns (e.g., model embeddings) to petabyte-scale tables.
>>>> - Model Score Updates: Refreshing prediction scores after retraining.
>>>> - Embedding Refresh: Updating vector embeddings, which currently triggers a rewrite of the entire row.
>>>> - Incremental Feature Computation: Daily updates to a small fraction of features in wide tables.
>>>>
>>>> With the Iceberg V4 proposal introducing single-file commits and column stats improvements, this is an ideal time to address column-level updates to better support these use cases.
>>>>
>>>> I have drafted a proposal that explores both table-format enhancements and file-format (Parquet) changes to enable more efficient updates.
>>>>
>>>> Proposal Details:
>>>> - GitHub Issue: #15146 <https://github.com/apache/iceberg/issues/15146>
>>>> - Design Document: Efficient Column Updates in Iceberg <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>>
>>>> Next Steps:
>>>> I plan to create POCs to benchmark the approaches described in the document.
>>>>
>>>> Please review the proposal and share your feedback.
>>>>
>>>> Thanks,
>>>> Anurag
