Hi everyone,
I had to drop off after the first half hour, but I watched the recording
afterward and discussed the topic in depth with Gábor.
*TL;DR*
1. My intuition is that the volume of updated data is usually small
compared to the original file. Even for a table with, say, 1000 columns,
updating only a few columns typically produces relatively little data, even
if those columns are rewritten in full. As a result, overall cost is often
dominated more by file access and seek overhead than by the actual amount
of data read or written. This suggests we should favor a simpler solution
and support only full column updates.
2. Predicate pushdown does not work well with partial updates, whereas
with some effort it can be made to work with full column updates.
3. If we do want to support partial updates, I agree that sparse update
files make sense. However, if we decide not to support partial updates, I
think we should revisit the decision to use a custom encoding for update
files. In that case, update files will typically contain very few deleted
rows, which invalidates several assumptions behind sparse encodings. In
this scenario, we could relatively cheaply add the `_file`, `_pos`, and
`_deleted` columns to the read query, use that information to write out the
results, and delegate the encoding to Parquet. Parquet already provides
efficient encodings for columns that are not extremely sparse, and it would
be difficult to outperform that with a custom solution.
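To make the third point concrete, here is a minimal sketch of that write path in plain Python (standing in for the engine; the `_pos`/`_deleted` metadata columns and the update function are illustrative, not an actual Iceberg API): the read exposes position and delete information, the writer emits a base-aligned column, and the encoding of the mostly-dense result is left to Parquet.

```python
# Sketch: build a full (base-aligned) update column for one data file,
# using the _pos/_deleted information exposed by the read query.
# Deleted rows are filled with None; Parquet's standard encodings then
# handle the (few) nulls efficiently.

def build_full_update_column(base_row_count, scan_rows, update_fn):
    """scan_rows: (pos, deleted, value) tuples from a read that selects
    the _pos and _deleted metadata columns for one base file.
    Returns a column aligned 1:1 with the base file's rows."""
    column = [None] * base_row_count
    for pos, deleted, value in scan_rows:
        if deleted:
            continue  # deleted rows stay None; readers never see them
        column[pos] = update_fn(value)
    return column

# Example: UPDATE ... SET score = score + 10 on a 5-row base file
# where row 2 was previously deleted.
scan = [(0, False, 1.0), (1, False, 2.0), (2, True, 3.0),
        (3, False, 4.0), (4, False, 5.0)]
print(build_full_update_column(5, scan, lambda v: v + 10))
# [11.0, 12.0, None, 14.0, 15.0]
```

Because the result is aligned 1:1 with the base file, a reader can substitute it for the original column without any positional join.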
*In detail*
What we gain compared to full column updates is mostly on the write path:
- We don’t need to read unchanged column values.
- We may not need to touch the original data file at all.
- We don’t need to write unchanged values (although we still need to
create a new file).
What we lose is mostly on the read path:
- We need to read the column from the original file.
- We also need to read the update file (even if it’s small, it’s still
an additional file access).
- Predicate pushdown does not work on the updated column; filters must
be applied manually. Predicate pushdown can only be applied to the update
file itself.
*Edge cases*
- Partial updates shine when updates do not require reading from the
original table at all.
- Full updates are best when reads only need to touch the newly written
data and can completely ignore the original file.
*Typical case comparison*
In practice, both approaches look quite similar in terms of file access:
- Reads: original file + new data file in both cases.
- Writes: read the original file (and any existing update file) and
write new data in both cases.
The main differences are:
- With partial column updates, we read and write less data during
updates (only the changed cells).
- With full column updates,
- Reads are cheaper because data is already merged into a single file
and we don’t need to read old column data.
- Predicate pushdown can work, although we still need to combine with
columns from the base file.
Overall, the key difference is the amount of column data read and written,
not full file sizes. And since that amount is usually small, file access
patterns and seek overhead tend to dominate the cost rather than raw I/O
volume.
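For contrast, the sparse read path discussed in the thread (expand the update file, then merge by position) can be sketched like this in plain Python (a stand-in for the reader; the names are illustrative):

```python
# Sketch: stitch a sparse update file into the base column at read time.
# The update file carries (row_position, new_value) pairs; only the
# updated positions are overwritten.

def merge_sparse_updates(base_column, sparse_updates):
    """base_column: the column as read from the original data file.
    sparse_updates: (row_position, new_value) pairs from the update file."""
    merged = list(base_column)   # start from the base file's values
    for pos, value in sparse_updates:
        merged[pos] = value      # overwrite only the updated positions
    return merged

# Example: an update touched rows 1 and 3 of a 4-row base file.
base = [10, 20, 30, 40]
updates = [(1, 21), (3, 41)]
assert merge_sparse_updates(base, updates) == [10, 21, 30, 41]
```

The per-position stitching, plus the extra file access, is exactly the read-side overhead listed above; with full column updates the update file is already base-aligned and can be scanned as a drop-in replacement column.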
Anurag Mantripragada <[email protected]> ezt írta (időpont:
2026. márc. 5., Cs, 3:35):
> Hi everyone!
>
> Thanks for joining the sync today. Sorry, Google cut us off while Gabor
> was explaining his POC work. We can discuss that in the next meeting. Here
> is the recording <https://youtu.be/3_CuwfZV8oQ>.
>
> *Meeting notes:*
>
> *Partial updates *
>
> - We could potentially support partial updates if the writer could
> merge all the existing updates, CoW-style, into a new column file. We could
> explore this, but the general consensus was to favor a single
> mechanism for updates, whether partial or not. This requires some more
> thought, and we can iterate on it.
> - This remains an open question until we consider all the synchronous
> writing cases.
>
> *Column File Row Alignment*
>
> - We generally agreed on using sparse Parquet files to store updates. Each
> update file contains only the modified values and their corresponding row
> positions from the base file.
> - *Rationale:* This avoids the stats-corruption risk of full, padded
> files (which would require filling non-updated rows with arbitrary values)
> and the Parquet limitation against top-level nulls.
> - *Read Path:* Readers will materialize the sparse updates into a full
> buffer with nulls, then efficiently merge by position.
>
> Single Update File Per Column
>
> - To simplify reads, each base file can have only one active update
> file per column.
> - Subsequent updates must rewrite the existing update file,
> synchronously applying all prior changes.
> - This avoids the complexity of merging multiple update files during
> the read path.
>
> *Other open questions*
>
> - *Change Detection: *We need to think more about how change detection
> would work with the synchronous update case. The V4 spec is undergoing
> revisions to support other use cases, and we should follow that work to see
> how this design aligns with it.
>
> *Next Steps*
>
> - *Anurag:* Update the design doc with the sparse file format, the
> single-update-file rule and add details about how this would work in
> various scenarios.
> - *Anurag:* Review the V4 CDC metadata proposal to ensure alignment
> with the column update design.
> - *Gábor:* Continue developing the POC, focusing on the synchronous
> rewrite logic and reader implementation. *Anurag* will work on the
> Spark plumbing needed to materialize only the changed rows and the planner
> changes in Spark 4.x
> - *All:* Schedule a follow-up meeting to review the updated design doc.
>
> Thanks,
> Anurag
>
>
> On Wed, Mar 4, 2026 at 1:47 PM Anton Okolnychyi <[email protected]>
> wrote:
>
>> Gabor, I know Anurag also expressed interest in extending Spark DML to
>> accommodate column updates. I am happy to work with both of you to get the
>> Spark piece designed and implemented. It is not something we would be able
>> to handle in Iceberg via extensions.
>>
>> Regarding partial updates, I agree we will have to iterate on open
>> questions before making a call on whether to support this functionality.
>> Can you elaborate on the last use case you mentioned? Why would we have to
>> combine {1} with {2, 3}? Will it be possible to produce a column file with
>> only affected columns in each write?
>>
>> On Wed, Mar 4, 2026 at 12:13, Gábor Kaszab <[email protected]> wrote:
>>
>>> Hey All,
>>>
>>> Apparently, the meeting dropped all of us after exactly one hour :) At
>>> the end I just wanted to mention that during my attempt to implement a PoC
>>> I found a couple of missing building blocks (collecting the updated field
>>> IDs when committing after a Spark write; tweaking UPDATE's plan e.g.
>>> adding/removing columns compared to CoW) and also found some interesting
>>> technical details/questions (e.g. how to align rows when reading a split
>>> based on the base file's split_offsets) that we could discuss next time. I'll
>>> collect all of these and share.
>>>
>>> In the meantime, I gave another thought to the *partial updates* idea
>>> Anton mentioned where we can basically have the same metadata and read path
>>> as for the full column update approach, and we'd push the responsibility to
>>> the writers to always merge existing updates with new ones. In theory,
>>> this seems like a reasonable design, and it doesn't look that complicated to
>>> implement when the new update aligns with the field IDs of some of the
>>> existing updates. For instance, if we partially update rows for field ID 1 and
>>> then update different rows for the same field ID, it seems
>>> straightforward to merge these into a new file and reference that file from the
>>> metadata.
>>> However, I'm not sure how trivial it is when we update overlapping but
>>> not entirely identical sets of fields, e.g. first partially updating by fields
>>> {1, 2}, then by {2, 3}. I don't think we want to merge these into one and have
>>> a single update for {1, 2, 3}, as that would have a snowball effect of
>>> merging more and more columns together over time. But I don't think we want to
>>> split them either, or require a separate partial update for each field
>>> (that wouldn't be suitable for column families later on either).
>>>
>>> Cheers,
>>> Gabor
>>>
>>> On Wed, Mar 4, 2026 at 0:32, Micah Kornfield <[email protected]> wrote:
>>>
>>>> If this is correct, it aligns well with the current proposal and
>>>>> shouldn't introduce any additional complexity. I will add it to the
>>>>> discussion points for tomorrow's community sync.
>>>>
>>>>
>>>> Yes, this example aligns with what I was thinking (nit: "range"
>>>> probably wouldn't be a string but I assume this was just for illustrative
>>>> purposes)
>>>>
>>>> On the other hand, in the column family use case, splitting columns is
>>>>> a strict requirement for performance. I haven’t considered how this would
>>>>> work, but perhaps we could introduce a table property for column families
>>>>> to make this explicit, and compaction jobs would have to respect
>>>>
>>>>
>>>> Yeah, I don't want to get into the exact mechanics for column families.
>>>> I was just calling out that compaction to the base file is not desirable in
>>>> all cases, so shouldn't be assumed as a solution for small files.
>>>>
>>>> Thanks,
>>>> Micah
>>>>
>>>>
>>>>
>>>> On Tue, Mar 3, 2026 at 3:11 PM Anurag Mantripragada <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Micah,
>>>>>
>>>>> Could you expand on the complexity you think this introduces (or more
>>>>>> specifically "significant" part)?
>>>>>
>>>>> I may have misunderstood your approach regarding packing row ranges.
>>>>> To clarify, is the following what you had in mind?
>>>>>
>>>>> Initially, we have base_file_1.parquet (rows 1-1000) and
>>>>> base_file_2.parquet (rows 1001-2000). If we update the "score" column
>>>>> across both files and pack those updates into a single larger file,
>>>>> packed_col_A.parquet, would the metadata structure look like this?
>>>>>
>>>>> {
>>>>>   "data_file_path": "base_file_1.parquet",
>>>>>   "column_updates": [
>>>>>     { "field_id": 12,
>>>>>       "update_file_path": "packed_col_A.parquet",
>>>>>       "row_range": "0-1000" }
>>>>>   ]
>>>>> },
>>>>> {
>>>>>   "data_file_path": "base_file_2.parquet",
>>>>>   "column_updates": [
>>>>>     { "field_id": 12,
>>>>>       "update_file_path": "packed_col_A.parquet",
>>>>>       "row_range": "1001-2000" }
>>>>>   ]
>>>>> }
>>>>>
>>>>>
>>>>> If this is correct, it aligns well with the current proposal and
>>>>> shouldn't introduce any additional complexity. I will add it to the
>>>>> discussion points for tomorrow's community sync.
>>>>>
>>>>>
>>>>> This seems at odds with supporting column families in the future?
>>>>>
>>>>> In my opinion, there’s a distinction between the use cases of column
>>>>> updates and column families. Column updates are designed for fast writes
>>>>> while maintaining reasonable read performance. Compaction is desirable to
>>>>> reduce the complexity of the read side, if any. On the other hand, in the
>>>>> column family use case, splitting columns is a strict requirement for
>>>>> performance. I haven’t considered how this would work, but perhaps we
>>>>> could
>>>>> introduce a table property for column families to make this explicit, and
>>>>> compaction jobs would have to respect it.
>>>>>
>>>>> ~Anurag
>>>>>
>>>>> On Tue, Mar 3, 2026 at 12:02 PM Micah Kornfield <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Anurag,
>>>>>>
>>>>>>> *Compaction and small files*: If I understand the row ranges idea
>>>>>>> correctly, packing multiple updates into larger column files would
>>>>>>> require
>>>>>>> matching ranges to base files based on predicates, which adds
>>>>>>> significant
>>>>>>> planning complexity. Regular compaction, which rewrites column files
>>>>>>> into
>>>>>>> the base file seems more practical.
>>>>>>
>>>>>>
>>>>>> Could you expand on the complexity you think this introduces (or more
>>>>>> specifically "significant" part)? In this case the predicate should be
>>>>>> pretty simple (i.e. read rows between X and Y only) and can be done
>>>>>> efficiently via row group statistics. Smart writers could even partition
>>>>>> rows for a specific base file into their own row group/pages to make the
>>>>>> filter trivial.
>>>>>>
>>>>>> Regular compaction, which rewrites column files into the base file
>>>>>>> seems more practical.
>>>>>>
>>>>>>
>>>>>> This seems at odds with supporting column families in the future?
>>>>>>
>>>>>> Thanks,
>>>>>> Micah
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 3, 2026 at 11:43 AM Anurag Mantripragada <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Sorry for the delayed response. I was on vacation and catching up.
>>>>>>> Thanks for the continued discussion on this topic.
>>>>>>>
>>>>>>> *Partial updates*: I agree that MoR-style row-level updates offer
>>>>>>> limited benefits beyond reducing the writing of irrelevant columns. For
>>>>>>> use
>>>>>>> cases like updating a subset of users, existing deletion vectors and the
>>>>>>> new V4 manifest delete vectors should perform well. Gabor’s suggestion
>>>>>>> for
>>>>>>> file-level partial updates is a reasonable alternative, even with some
>>>>>>> write amplification.
>>>>>>>
>>>>>>> *Compaction and small files*: If I understand the row ranges idea
>>>>>>> correctly, packing multiple updates into larger column files would
>>>>>>> require
>>>>>>> matching ranges to base files based on predicates, which adds
>>>>>>> significant
>>>>>>> planning complexity. Regular compaction, which rewrites column files
>>>>>>> into
>>>>>>> the base file seems more practical.
>>>>>>>
>>>>>>> *Column families*: While splitting columns into families is useful,
>>>>>>> the current design is more generic and already supports packing families
>>>>>>> into column files. Deciding how to group these columns (manually or via
>>>>>>> an
>>>>>>> engine) can be addressed in separate follow-up work.
>>>>>>>
>>>>>>> *Next steps:*
>>>>>>>
>>>>>>> - Gabor and I are developing a POC for metadata changes,
>>>>>>> focusing on reading and writing column files using Spark for
>>>>>>> integration.
>>>>>>> We will share more details soon.
>>>>>>> - I will update the doc in preparation for tomorrow's sync.
>>>>>>>
>>>>>>>
>>>>>>> As a reminder we have a sync on column updates upcoming
>>>>>>>
>>>>>>> Efficient column updates sync
>>>>>>> Wednesday, March 4 · 9:00 – 10:00am
>>>>>>> Time zone: America/Los_Angeles
>>>>>>> Google Meet joining info
>>>>>>> Video call link: https://meet.google.com/naf-tvvn-qup
>>>>>>>
>>>>>>> ~ Anurag
>>>>>>>
>>>>>>> On Wed, Feb 25, 2026 at 1:32 PM Gábor Kaszab <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hey All,
>>>>>>>>
>>>>>>>> Nice to see the activity on this thread. Thanks to everyone who
>>>>>>>> chimed in!
>>>>>>>>
>>>>>>>> Micah, I also feel that 1) (full column updates) and 2) (partial
>>>>>>>> but file-level column updates) could be a good middle ground between
>>>>>>>> perf
>>>>>>>> improvement and keeping the code complexity low. In fact I had the
>>>>>>>> chance
>>>>>>>> to experiment in this area and the metadata + API part would be as
>>>>>>>> simple
>>>>>>>> as in this PoC <https://github.com/apache/iceberg/pull/15445>.
>>>>>>>> Just a side note for 3): from the SQL perspective, though, I'm a bit
>>>>>>>> unsure how straightforward it is for users to write predicates that
>>>>>>>> align with file boundaries.
>>>>>>>> For deciding on partial column updates, we probably can't get away
>>>>>>>> without doing some measurements of how it compares to existing MoR. I
>>>>>>>> have
>>>>>>>> it on my roadmap, so I'll share it once I have something.
>>>>>>>>
>>>>>>>> Wrapping multiple update files into one is an interesting idea.
>>>>>>>> Let's bring this up on the next sync! Additionally, full column updates
>>>>>>>> could add a huge overhead on the metadata files being created too
>>>>>>>> (delete
>>>>>>>> everything + write everything with updates), unless we decide to do
>>>>>>>> some
>>>>>>>> manifest rewrites/optimizations under the hood during the commit.
>>>>>>>>
>>>>>>>> Peter, column families as a schema-like table metadata level
>>>>>>>> information would definitely be useful. It seems like a natural
>>>>>>>> follow-up
>>>>>>>> of the column update work, but we have to keep in mind to choose a
>>>>>>>> design
>>>>>>>> that won't prevent us from implementing a more general column families
>>>>>>>> concept (probably for inserts too).
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Gabor
>>>>>>>>
>>>>>>>> On Sat, Feb 21, 2026 at 17:53, Micah Kornfield <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> 1) and 3) are what I was thinking of as use-cases. I agree unless
>>>>>>>>> there is a strong motivating use-case for MoR style column updates we
>>>>>>>>> should try to avoid this complexity and use the existing row based
>>>>>>>>> MoR.
>>>>>>>>>
>>>>>>>>> One other idea I was trying to think through is the "small file
>>>>>>>>> problem" we would likely encounter for single column
>>>>>>>>> additions/updates for
>>>>>>>>> fixed width data. Would it make sense to add a record-range into the
>>>>>>>>> metadata for column families, so that we can pack column updates
>>>>>>>>> across
>>>>>>>>> files into reasonably sized files (similar to what we do for DVs
>>>>>>>>> today in
>>>>>>>>> puffin files)?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Micah
>>>>>>>>>
>>>>>>>>> On Mon, Feb 16, 2026 at 7:23 AM Gábor Kaszab <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hey All,
>>>>>>>>>>
>>>>>>>>>> Thanks Anurag for the summary!
>>>>>>>>>>
>>>>>>>>>> I regret we don't have a recording for the sync, but I had the
>>>>>>>>>> impression that, even though there was a lengthy discussion about the
>>>>>>>>>> implementation requirements for partial updates, there wasn't a
>>>>>>>>>> strong
>>>>>>>>>> consensus around the need and there were no strong use cases to
>>>>>>>>>> justify
>>>>>>>>>> partial updates either. Let me sum up where I see we are at now:
>>>>>>>>>>
>>>>>>>>>> *Scope of the updates*
>>>>>>>>>>
>>>>>>>>>> *1) Full column updates*
>>>>>>>>>> There is a consensus and common understanding that this use case
>>>>>>>>>> makes sense. If this was the only supported use-case, the
>>>>>>>>>> implementation
>>>>>>>>>> would be relatively simple. We could guarantee there is no overlap in
>>>>>>>>>> column updates by deduplicating the field IDs in the column update
>>>>>>>>>> metadata. E.g. Let's say we have a column update on columns {1,2}
>>>>>>>>>> and we
>>>>>>>>>> write another column update for {2,3}: we can change the metadata
>>>>>>>>>> for the
>>>>>>>>>> first one to only cover {1} and not {1,2}. With this the write and
>>>>>>>>>> the
>>>>>>>>>> read/stitching process is also straightforward (if we decide not to
>>>>>>>>>> support
>>>>>>>>>> equality deletes together with column updates).
>>>>>>>>>>
>>>>>>>>>> Both row matching approaches could work here:
>>>>>>>>>> - row number matching update files, where we fill the deleted
>>>>>>>>>> rows with an arbitrary value (preferably null)
>>>>>>>>>> - sparse update files with some auxiliary column written into
>>>>>>>>>> the column update file, like row position in base file
>>>>>>>>>>
>>>>>>>>>> *2) Partial column updates (row-level)*
>>>>>>>>>> I see 2 use cases mentioned for this: bug-fixing a subset of
>>>>>>>>>> rows, updating features for active users
>>>>>>>>>> My initial impression here is that whether to use column updates
>>>>>>>>>> or not heavily depends on the selectivity of the partial update
>>>>>>>>>> queries.
>>>>>>>>>> I'm sure there is a percentage of the affected rows where if we go
>>>>>>>>>> below
>>>>>>>>>> it's simply better to use the traditional row level updates
>>>>>>>>>> (cow/mor). I'm
>>>>>>>>>> not entirely convinced that covering these scenarios is worth the
>>>>>>>>>> extra
>>>>>>>>>> complexity here:
>>>>>>>>>> - We can't deduplicate the column updates by field IDs on the
>>>>>>>>>> metadata-side
>>>>>>>>>> - We have two options for writers:
>>>>>>>>>> - Merge the existing column update files themselves when
>>>>>>>>>> writing a new one with an overlap of field Ids. No need to sort out
>>>>>>>>>> the
>>>>>>>>>> different column update files and merge them on the read side, but
>>>>>>>>>> there
>>>>>>>>>> is overhead on the write side
>>>>>>>>>> - Don't bother merging existing column updates when
>>>>>>>>>> writing a new one. This adds overhead on the read side.
>>>>>>>>>>
>>>>>>>>>> Handling of sparse update files is a must here, with the chance
>>>>>>>>>> for optimisation if all the rows are covered with the update file,
>>>>>>>>>> as Micah
>>>>>>>>>> suggested.
>>>>>>>>>>
>>>>>>>>>> To sum up, I think that to justify this approach we need
>>>>>>>>>> strong use cases and measurements verifying that the extra complexity
>>>>>>>>>> yields convincingly better results than existing CoW/MoR
>>>>>>>>>> approaches.
>>>>>>>>>>
>>>>>>>>>> *3) Partial column updates (file-level)*
>>>>>>>>>> This option wasn't brought up during our conversation but might
>>>>>>>>>> be worth considering. This is basically a middle ground between the
>>>>>>>>>> above
>>>>>>>>>> two approaches. Partial updates are allowed as long as they affect
>>>>>>>>>> entire
>>>>>>>>>> data files, and it's allowed to only cover a subset of the files. One
>>>>>>>>>> use-case would be to do column updates per partition for instance.
>>>>>>>>>>
>>>>>>>>>> With this approach the metadata representation could be as simple
>>>>>>>>>> as in 1), where we can deduplicate the update files by field IDs.
>>>>>>>>>> Also
>>>>>>>>>> there is no write and read overhead on top of 1) apart from the
>>>>>>>>>> verification step to ensure that the WHERE filter on the update is
>>>>>>>>>> doing
>>>>>>>>>> the split on file boundaries.
>>>>>>>>>> Also, similarly to 1), sparse update files aren't a must here; we
>>>>>>>>>> could consider row-matching update files too.
>>>>>>>>>>
>>>>>>>>>> *Row alignment*
>>>>>>>>>> Sparse update files are required for row-level partial updates,
>>>>>>>>>> but if we decide to go with any of the other options we could also
>>>>>>>>>> evaluate
>>>>>>>>>> the "row count matching" approach too. Even though it requires
>>>>>>>>>> filling the
>>>>>>>>>> missing rows with arbitrary values (null seems a good candidate) it
>>>>>>>>>> would
>>>>>>>>>> result in less write overhead (no need to write row position) and
>>>>>>>>>> read
>>>>>>>>>> overhead (no need to join rows by row position) too that could be worth
>>>>>>>>>> the
>>>>>>>>>> inconvenience of having 'invalid' but inaccessible values in the
>>>>>>>>>> files. The
>>>>>>>>>> num nulls stats being off is a good argument against this, but I
>>>>>>>>>> think we
>>>>>>>>>> could have a way of fixing this too by keeping track of how many
>>>>>>>>>> rows were
>>>>>>>>>> deleted (and subtracting this value from the num nulls counter returned
>>>>>>>>>> by the
>>>>>>>>>> writer).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Next steps*
>>>>>>>>>> I'm actively working on a very basic PoC implementation where we
>>>>>>>>>> would be able to test the different approaches comparing pros and
>>>>>>>>>> cons so
>>>>>>>>>> that we can make a decision on the above questions. I'll sync with
>>>>>>>>>> Anurag
>>>>>>>>>> on this and will let you know once we have something.
>>>>>>>>>>
>>>>>>>>>> Best Regards,
>>>>>>>>>> Gabor
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sat, Feb 14, 2026 at 2:20, Micah Kornfield <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Given that, the sparse representation with alignment at read
>>>>>>>>>>>> time (using dummy/null values) seems to provide the benefits of
>>>>>>>>>>>> both
>>>>>>>>>>>> efficient vectorized reads and stitching as well as support for
>>>>>>>>>>>> partial
>>>>>>>>>>>> column updates. Would you agree?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thinking more about it, I think the sparse approach is actually
>>>>>>>>>>> a superset approach, so it is not a concern. If writers want
>>>>>>>>>>> they can
>>>>>>>>>>> write out the fully populated columns with position indexes from 1
>>>>>>>>>>> to N,
>>>>>>>>>>> and readers can take an optimized path if they detect the number of
>>>>>>>>>>> rows in
>>>>>>>>>>> the update is equal to the number of base rows.
>>>>>>>>>>>
>>>>>>>>>>> I still think there is a question on what writers should do
>>>>>>>>>>> (i.e. when do they decide to duplicate data instead of trying to
>>>>>>>>>>> give
>>>>>>>>>>> sparse updates) but that is an implementation question and not
>>>>>>>>>>> necessarily
>>>>>>>>>>> something that needs to block spec work.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Micah
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Feb 13, 2026 at 11:29 AM Anurag Mantripragada <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Micah,
>>>>>>>>>>>>
>>>>>>>>>>>> This seems like a classic MoR vs CoW trade-off. But it seems
>>>>>>>>>>>>> like maybe both sparse and full should be available (I understand
>>>>>>>>>>>>> this adds
>>>>>>>>>>>>> complexity). For adding a new column or completely updating a new
>>>>>>>>>>>>> column,
>>>>>>>>>>>>> the performance would be better to prefill the data
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Our internal use cases are very similar to what you describe.
>>>>>>>>>>>> We primarily deal with full column updates. However, the feedback
>>>>>>>>>>>> on the
>>>>>>>>>>>> proposal from the wider community indicated that partial updates
>>>>>>>>>>>> (e.g.,
>>>>>>>>>>>> bug-fixing a subset of rows, updating features for active users)
>>>>>>>>>>>> are also a
>>>>>>>>>>>> very common and critical use case.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there evidence to say that partial column updates are more
>>>>>>>>>>>> common in practice than full rewrites?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Personally, I don't have hard data on which use case is more
>>>>>>>>>>>> common in the wild, only that both appear to be important. I also
>>>>>>>>>>>> agree
>>>>>>>>>>>> that a good long term solution should support both strategies.
>>>>>>>>>>>> Given that,
>>>>>>>>>>>> the sparse representation with alignment at read time (using
>>>>>>>>>>>> dummy/null
>>>>>>>>>>>> values) seems to provide the benefits of both efficient vectorized
>>>>>>>>>>>> reads
>>>>>>>>>>>> and stitching as well as support for partial column updates. Would
>>>>>>>>>>>> you
>>>>>>>>>>>> agree?
>>>>>>>>>>>>
>>>>>>>>>>>> ~ Anurag
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Feb 13, 2026 at 9:33 AM Micah Kornfield <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Anurag,
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Data Representation: Sparse column files are preferred for
>>>>>>>>>>>>>> compact representation and are better suited for partial column
>>>>>>>>>>>>>> updates. We
>>>>>>>>>>>>>> can optimize sparse representation for vectorized reads by
>>>>>>>>>>>>>> filling in null or default values at read time for missing
>>>>>>>>>>>>>> positions from
>>>>>>>>>>>>>> the base file, which avoids joins during reads.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This seems like a classic MoR vs CoW trade-off. But it seems
>>>>>>>>>>>>> like maybe both sparse and full should be available (I understand
>>>>>>>>>>>>> this adds
>>>>>>>>>>>>> complexity). For adding a new column or completely updating a
>>>>>>>>>>>>> new column,
>>>>>>>>>>>>> the performance would be better to prefill the data (otherwise
>>>>>>>>>>>>> one ends up
>>>>>>>>>>>>> duplicating the work that is already happening under the hood in
>>>>>>>>>>>>> parquet).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there evidence to say that partial column updates are more
>>>>>>>>>>>>> common in practice than full rewrites?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Micah
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Feb 12, 2026 at 3:32 AM Eduard Tudenhöfner <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Anurag,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wasn't able to make it to the sync but was hoping to watch
>>>>>>>>>>>>>> the recording afterwards.
>>>>>>>>>>>>>> I'm curious what the reasons were for discarding the
>>>>>>>>>>>>>> Parquet-native approach. Could you share a summary from what was
>>>>>>>>>>>>>> discussed
>>>>>>>>>>>>>> in the sync please on that topic?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for attending today's sync. Please find the
>>>>>>>>>>>>>>> meeting notes below. I apologize that we were unable to record
>>>>>>>>>>>>>>> the session
>>>>>>>>>>>>>>> due to attendees not having record access.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Key updates and discussion points:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Decisions:*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - Table Format vs. Parquet: There is a general consensus
>>>>>>>>>>>>>>> that column update support should reside in the table
>>>>>>>>>>>>>>> format. Consequently,
>>>>>>>>>>>>>>> we have discarded the Parquet-native approach.
>>>>>>>>>>>>>>> - Metadata Representation: To maintain clean metadata
>>>>>>>>>>>>>>> and avoid complex resolution logic for readers, the goal is
>>>>>>>>>>>>>>> to keep only
>>>>>>>>>>>>>>> one metadata file per column. However, achieving this is
>>>>>>>>>>>>>>> challenging if we
>>>>>>>>>>>>>>> support partial updates, as multiple column files may exist
>>>>>>>>>>>>>>> for the same
>>>>>>>>>>>>>>> column (See open questions).
>>>>>>>>>>>>>>> - Data Representation: Sparse column files are preferred
>>>>>>>>>>>>>>> for compact representation and are better suited for partial
>>>>>>>>>>>>>>> column
>>>>>>>>>>>>>>> updates. We can optimize sparse representation for
>>>>>>>>>>>>>>> vectorized reads by
>>>>>>>>>>>>>>> filling in null or default values at read time for missing
>>>>>>>>>>>>>>> positions from
>>>>>>>>>>>>>>> the base file, which avoids joins during reads.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Open Questions: *
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - We are still determining what restrictions are
>>>>>>>>>>>>>>> necessary when supporting partial updates. For instance, we
>>>>>>>>>>>>>>> need to decide
>>>>>>>>>>>>>>> whether to add a new column and subsequently allow partial
>>>>>>>>>>>>>>> updates on it.
>>>>>>>>>>>>>>> This would involve managing both a base column file and
>>>>>>>>>>>>>>> subsequent update
>>>>>>>>>>>>>>> files.
>>>>>>>>>>>>>>> - We need a better understanding of the use cases for
>>>>>>>>>>>>>>> partial updates.
>>>>>>>>>>>>>>> - We need to further discuss the handling of equality
>>>>>>>>>>>>>>> deletes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If I missed anything, or if others took notes, please share
>>>>>>>>>>>>>>> them here. Thanks!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I will go ahead and update the doc with what we have
>>>>>>>>>>>>>>> discussed so we can continue next time from where we left off.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ~ Anurag
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This design
>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>>>>>>>>>>>>>> will be discussed tomorrow in a dedicated sync.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Efficient column updates sync
>>>>>>>>>>>>>>>> Tuesday, February 10 · 9:00 – 10:00am
>>>>>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>>>>>> Video call link: https://meet.google.com/xsd-exug-tcd
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ~ Anurag
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Gabor,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the detailed example.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I agree with Steven that Option 2 seems reasonable. I will
>>>>>>>>>>>>>>>>> add a section to the design doc regarding equality delete
>>>>>>>>>>>>>>>>> handling, and we
>>>>>>>>>>>>>>>>> can discuss this further during our meeting on Tuesday.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~Anurag
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> > 1) When deleting with eq-deletes: If there is a column
>>>>>>>>>>>>>>>>>> update on the equality-filed ID we use for the delete,
>>>>>>>>>>>>>>>>>> reject deletion
>>>>>>>>>>>>>>>>>> > 2) When adding a column update on a column that is
>>>>>>>>>>>>>>>>>> part of the equality field IDs in some delete, we reject the
>>>>>>>>>>>>>>>>>> column update
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Gabor, this is a good scenario. The 2nd option makes
>>>>>>>>>>>>>>>>>> sense to me, since equality ids are like primary key fields.
>>>>>>>>>>>>>>>>>> If we enforce the 2nd rule, the first option is no longer
>>>>>>>>>>>>>>>>>> needed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you for the proposal, Anurag! I made a pass
>>>>>>>>>>>>>>>>>>> recently and I think there is some interference between
>>>>>>>>>>>>>>>>>>> column updates and
>>>>>>>>>>>>>>>>>>> equality deletes. Let me describe below:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Steps:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> CREATE TABLE tbl (a INT, b INT);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> INSERT INTO tbl VALUES (1, 11), (2, 22); -- creates the
>>>>>>>>>>>>>>>>>>> base data file
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> DELETE FROM tbl WHERE b=11; -- creates an
>>>>>>>>>>>>>>>>>>> equality delete file
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> UPDATE tbl SET b=11;
>>>>>>>>>>>>>>>>>>> -- writes column update
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> SELECT * FROM tbl;
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Expected result:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (2, 11)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Data and metadata created after the above steps:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Base file
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> (1, 11), (2, 22),
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> seqnum=1
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> EQ-delete
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> b=11
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> seqnum=2
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Column update
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Field ids: [field_id_for_col_b]
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> seqnum=3
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Data file content: (dummy_value),(11)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Read steps:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. Stitch base file with column updates in reader:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Rows: (1,dummy_value), (2,11) (Note: the dummy value can
>>>>>>>>>>>>>>>>>>> be either null or 11; see the proposal for more details)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Seqnum for base file=1
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Seqnum for column update=3
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    2. Apply eq-delete b=11, seqnum=2 on the stitched
>>>>>>>>>>>>>>>>>>>    result
>>>>>>>>>>>>>>>>>>> 3. Query result depends on which seqnum we carry
>>>>>>>>>>>>>>>>>>> forward to compare with the eq-delete's seqnum, but it's
>>>>>>>>>>>>>>>>>>> not correct in any
>>>>>>>>>>>>>>>>>>> of the cases
>>>>>>>>>>>>>>>>>>> 1. Use seqnum from base file: we get either an
>>>>>>>>>>>>>>>>>>> empty result if 'dummy_value' is 11 or we get (1,
>>>>>>>>>>>>>>>>>>> null) otherwise
>>>>>>>>>>>>>>>>>>> 2. Use seqnum from last update file: don't delete
>>>>>>>>>>>>>>>>>>> any rows, result set is (1, dummy_value),(2,11)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Problem:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The eq-delete would have to be applied midway through
>>>>>>>>>>>>>>>>>>> applying the column updates to the base file, based on
>>>>>>>>>>>>>>>>>>> sequence number, during the stitching process. If I'm not
>>>>>>>>>>>>>>>>>>> mistaken, this is not feasible with the way readers work.
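Gábor's scenario can be checked with a small simulation. This is an illustrative Python sketch, not Iceberg reader code; the in-memory row layout and seqnum handling are simplified assumptions:

```python
# Simplified simulation of the eq-delete vs. column-update conflict above.
# Not Iceberg code: rows are plain tuples and seqnums are ints.

EXPECTED = {(2, 11)}                 # result of applying ops in seqnum order

base = [(1, 11), (2, 22)]            # base data file, seqnum=1
DELETE_VALUE, DELETE_SEQ = 11, 2     # eq-delete b=11, seqnum=2
update_b = [None, 11]                # column update for b, seqnum=3 (dummy=null)

# The reader stitches the base file with the column update first.
stitched = [(a, new_b) for (a, _), new_b in zip(base, update_b)]

def apply_eq_delete(rows, carried_seqnum):
    # The delete applies only to rows "older" than the delete itself.
    if carried_seqnum < DELETE_SEQ:
        return [r for r in rows if r[1] != DELETE_VALUE]
    return rows

result1 = apply_eq_delete(stitched, 1)  # carry base seqnum: delete over-applies
result2 = apply_eq_delete(stitched, 3)  # carry update seqnum: delete is skipped

# Correct semantics: eq-delete first (on base values), then the update.
survivors = [i for i, (_, b) in enumerate(base) if b != DELETE_VALUE]
correct = [(base[i][0], update_b[i]) for i in survivors]

# Neither seqnum choice reproduces the correct result.
assert set(correct) == EXPECTED
assert set(result1) != EXPECTED and set(result2) != EXPECTED
```

With dummy_value = 11 instead of null, option 1 returns an empty result instead, matching case 1 in the read steps above.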
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Proposal:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Don't allow equality deletes together with column
>>>>>>>>>>>>>>>>>>> updates.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1) When deleting with eq-deletes: If there is a column
>>>>>>>>>>>>>>>>>>> update on the equality field ID we use for the delete,
>>>>>>>>>>>>>>>>>>> reject the deletion
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2) When adding a column update on a column that is
>>>>>>>>>>>>>>>>>>> part of the equality field IDs in some delete, we reject
>>>>>>>>>>>>>>>>>>> the column update
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Alternatively, column updates could be controlled by an
>>>>>>>>>>>>>>>>>>> immutable table property, and eq-deletes could be rejected
>>>>>>>>>>>>>>>>>>> if the property indicates column updates are turned on for
>>>>>>>>>>>>>>>>>>> the table.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Let me know what you think!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Gabor
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Anurag Mantripragada <[email protected]> wrote
>>>>>>>>>>>>>>>>>>> (on Wed, Jan 28, 2026, 3:31):
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thank you everyone for the initial review comments. It
>>>>>>>>>>>>>>>>>>>> is exciting to see so much interest in this proposal.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I am currently reviewing and responding to each
>>>>>>>>>>>>>>>>>>>> comment. The general themes of the feedback so far include:
>>>>>>>>>>>>>>>>>>>> - Including partial updates (column updates on a subset
>>>>>>>>>>>>>>>>>>>> of rows in a table).
>>>>>>>>>>>>>>>>>>>> - Adding details on how SQL engines will write the
>>>>>>>>>>>>>>>>>>>> update files.
>>>>>>>>>>>>>>>>>>>> - Adding details on split planning and row alignment
>>>>>>>>>>>>>>>>>>>> for update files.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I will think through these points and update the design
>>>>>>>>>>>>>>>>>>>> accordingly.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best
>>>>>>>>>>>>>>>>>>>> Anurag
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Xianjin,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Happy to learn from your experience in supporting
>>>>>>>>>>>>>>>>>>>>> backfill use-cases. Please feel free to review the
>>>>>>>>>>>>>>>>>>>>> proposal and add your
>>>>>>>>>>>>>>>>>>>>> comments. I will wait for a couple of days more to ensure
>>>>>>>>>>>>>>>>>>>>> everyone has a
>>>>>>>>>>>>>>>>>>>>> chance to review the proposal.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ~ Anurag
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Anurag and Peter,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It’s great to see that partial column updates have
>>>>>>>>>>>>>>>>>>>>>> gained so much interest in the community. I internally
>>>>>>>>>>>>>>>>>>>>>> built a BackfillColumns action to efficiently backfill
>>>>>>>>>>>>>>>>>>>>>> columns (by writing only the partial columns and copying
>>>>>>>>>>>>>>>>>>>>>> the binary data of the other columns into a new
>>>>>>>>>>>>>>>>>>>>>> DataFile). The speedup can be 10x for wide tables, but
>>>>>>>>>>>>>>>>>>>>>> the write amplification is still there. I would be happy
>>>>>>>>>>>>>>>>>>>>>> to collaborate on the work and eliminate the write
>>>>>>>>>>>>>>>>>>>>>> amplification.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>>>>>>>>>>>>>>>>>>>>> > Hi Anurag,
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > It’s great to see how much interest there is in the
>>>>>>>>>>>>>>>>>>>>>> community around this
>>>>>>>>>>>>>>>>>>>>>> > potential new feature. Gábor and I have actually
>>>>>>>>>>>>>>>>>>>>>> submitted an Iceberg
>>>>>>>>>>>>>>>>>>>>>> > Summit talk proposal on this topic, and we would be
>>>>>>>>>>>>>>>>>>>>>> very happy to
>>>>>>>>>>>>>>>>>>>>>> > collaborate on the work. I was mainly waiting for
>>>>>>>>>>>>>>>>>>>>>> the File Format API to be
>>>>>>>>>>>>>>>>>>>>>> > finalized, as I believe this feature should build
>>>>>>>>>>>>>>>>>>>>>> on top of it.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > For reference, our related work includes:
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > - *Dev list thread:*
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>>>>>>>>>>>>>>>>>>>>> > - *Proposal document:*
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>>>>>>>>>>>>>>>>>>>>> > (not shared widely yet)
>>>>>>>>>>>>>>>>>>>>>> > - *Performance testing PR for readers and
>>>>>>>>>>>>>>>>>>>>>> writers:*
>>>>>>>>>>>>>>>>>>>>>> > https://github.com/apache/iceberg/pull/13306
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > During earlier discussions about possible metadata
>>>>>>>>>>>>>>>>>>>>>> changes, another option
>>>>>>>>>>>>>>>>>>>>>> > came up that hasn’t been documented yet: separating
>>>>>>>>>>>>>>>>>>>>>> planner metadata from
>>>>>>>>>>>>>>>>>>>>>> > reader metadata. Since the planner does not need to
>>>>>>>>>>>>>>>>>>>>>> know about the actual
>>>>>>>>>>>>>>>>>>>>>> > files, we could store the file composition in a
>>>>>>>>>>>>>>>>>>>>>> separate file (potentially
>>>>>>>>>>>>>>>>>>>>>> > a Puffin file). This file could hold the
>>>>>>>>>>>>>>>>>>>>>> column_files metadata, while the
>>>>>>>>>>>>>>>>>>>>>> > manifest would reference the Puffin file and blob
>>>>>>>>>>>>>>>>>>>>>> position instead of the
>>>>>>>>>>>>>>>>>>>>>> > data filename.
>>>>>>>>>>>>>>>>>>>>>> > This approach has the advantage of keeping the
>>>>>>>>>>>>>>>>>>>>>> existing metadata largely
>>>>>>>>>>>>>>>>>>>>>> > intact, and it could also give us a natural place
>>>>>>>>>>>>>>>>>>>>>> later to add file-level
>>>>>>>>>>>>>>>>>>>>>> > indexes or Bloom filters for use during reads or
>>>>>>>>>>>>>>>>>>>>>> secondary filtering. The
>>>>>>>>>>>>>>>>>>>>>> > downsides are the additional files and the
>>>>>>>>>>>>>>>>>>>>>> increased complexity of
>>>>>>>>>>>>>>>>>>>>>> > identifying files that are no longer referenced by
>>>>>>>>>>>>>>>>>>>>>> the table, so this may
>>>>>>>>>>>>>>>>>>>>>> > not be an ideal solution.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > I do have some concerns about the MoR metadata
>>>>>>>>>>>>>>>>>>>>>> proposal described in the
>>>>>>>>>>>>>>>>>>>>>> > document. At first glance, it seems to complicate
>>>>>>>>>>>>>>>>>>>>>> distributed planning, as
>>>>>>>>>>>>>>>>>>>>>> > all entries for a given file would need to be
>>>>>>>>>>>>>>>>>>>>>> collected and merged to
>>>>>>>>>>>>>>>>>>>>>> > provide the information required by both the
>>>>>>>>>>>>>>>>>>>>>> planner and the reader.
>>>>>>>>>>>>>>>>>>>>>> > Additionally, when a new column is added or
>>>>>>>>>>>>>>>>>>>>>> updated, we would still need to
>>>>>>>>>>>>>>>>>>>>>> > add a new metadata entry for every existing data
>>>>>>>>>>>>>>>>>>>>>> file. If we immediately
>>>>>>>>>>>>>>>>>>>>>> > write out the merged metadata, the total number of
>>>>>>>>>>>>>>>>>>>>>> entries remains the
>>>>>>>>>>>>>>>>>>>>>> > same. The main benefit is avoiding rewriting
>>>>>>>>>>>>>>>>>>>>>> statistics, which can be
>>>>>>>>>>>>>>>>>>>>>> > significant, but this comes at the cost of
>>>>>>>>>>>>>>>>>>>>>> increased planning complexity.
>>>>>>>>>>>>>>>>>>>>>> > If we choose to store the merged statistics in the
>>>>>>>>>>>>>>>>>>>>>> > column_files entry, I
>>>>>>>>>>>>>>>>>>>>>> > don’t see much benefit in excluding the rest of the
>>>>>>>>>>>>>>>>>>>>>> metadata, especially
>>>>>>>>>>>>>>>>>>>>>> > since including it would simplify the planning
>>>>>>>>>>>>>>>>>>>>>> process.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > As Anton already pointed out, we should also
>>>>>>>>>>>>>>>>>>>>>> discuss how this change would
>>>>>>>>>>>>>>>>>>>>>> > affect split handling, particularly how to avoid
>>>>>>>>>>>>>>>>>>>>>> double reads when row
>>>>>>>>>>>>>>>>>>>>>> > groups are not aligned between the original data
>>>>>>>>>>>>>>>>>>>>>> files and the new column
>>>>>>>>>>>>>>>>>>>>>> > files.
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Finally, I’d like to see some discussion around the
>>>>>>>>>>>>>>>>>>>>>> Java API implications.
>>>>>>>>>>>>>>>>>>>>>> > In particular, what API changes are required, and
>>>>>>>>>>>>>>>>>>>>>> how SQL engines would
>>>>>>>>>>>>>>>>>>>>>> > perform updates. Since the new column files must
>>>>>>>>>>>>>>>>>>>>>> have the same number of
>>>>>>>>>>>>>>>>>>>>>> > rows as the original data files, with a strict
>>>>>>>>>>>>>>>>>>>>>> one-to-one relationship, SQL
>>>>>>>>>>>>>>>>>>>>>> > engines would need access to the source filename,
>>>>>>>>>>>>>>>>>>>>>> position, and deletion
>>>>>>>>>>>>>>>>>>>>>> > status in the DataFrame in order to generate the
>>>>>>>>>>>>>>>>>>>>>> new files. This is more
>>>>>>>>>>>>>>>>>>>>>> > involved than a simple update and deserves some
>>>>>>>>>>>>>>>>>>>>>> explicit consideration.
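To make the point above concrete, here is a rough sketch of how an engine could turn query output carrying `_file`, `_pos`, and `_deleted` into per-file update columns with strict one-to-one row alignment. The row representation and the helper function are assumptions for illustration, not engine code:

```python
# Sketch: producing aligned column-update values from rows that carry the
# _file, _pos, _deleted metadata columns discussed in this thread.
from collections import defaultdict

# Rows as (_file, _pos, _deleted, new_value) produced by the update query.
updated = [
    ("data-001.parquet", 0, False, 100),
    ("data-001.parquet", 2, False, 300),
    ("data-002.parquet", 1, True,  None),  # deleted row: position kept, no value
]

def build_update_files(rows, row_counts):
    """Group by source file and emit one slot per original row position.

    row_counts maps each base file to its row count, so every position in
    the base file gets exactly one entry (None acts as the dummy value).
    """
    by_file = defaultdict(dict)
    for file, pos, deleted, value in rows:
        by_file[file][pos] = None if deleted else value
    return {f: [vals.get(p) for p in range(row_counts[f])]
            for f, vals in by_file.items()}

files = build_update_files(updated, {"data-001.parquet": 3,
                                     "data-002.parquet": 2})
```

The output keeps one value per base-file row, which is exactly the one-to-one relationship the new column files need; a real writer would also have to respect row-group boundaries.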
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > Looking forward to your thoughts.
>>>>>>>>>>>>>>>>>>>>>> > Best regards,
>>>>>>>>>>>>>>>>>>>>>> > Peter
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <
>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>> > wrote:
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > > Thanks Anton and others, for providing some
>>>>>>>>>>>>>>>>>>>>>> initial feedback. I will
>>>>>>>>>>>>>>>>>>>>>> > > address all your comments soon.
>>>>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>>>>> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi
>>>>>>>>>>>>>>>>>>>>>> <[email protected]>
>>>>>>>>>>>>>>>>>>>>>> > > wrote:
>>>>>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>>>>>> > >> I had a chance to see the proposal before it
>>>>>>>>>>>>>>>>>>>>>> landed and I think it is a
>>>>>>>>>>>>>>>>>>>>>> > >> cool idea and both presented approaches would
>>>>>>>>>>>>>>>>>>>>>> likely work. I am looking
>>>>>>>>>>>>>>>>>>>>>> > >> forward to discussing the tradeoffs and would
>>>>>>>>>>>>>>>>>>>>>> encourage everyone to
>>>>>>>>>>>>>>>>>>>>>> > >> push/polish each approach to see what issues can
>>>>>>>>>>>>>>>>>>>>>> be mitigated and what are
>>>>>>>>>>>>>>>>>>>>>> > >> fundamental.
>>>>>>>>>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>>>>>>>>>> > >> [1] Iceberg-native approach: better visibility
>>>>>>>>>>>>>>>>>>>>>> into column files from the
>>>>>>>>>>>>>>>>>>>>>> > >> metadata, potentially better concurrency for
>>>>>>>>>>>>>>>>>>>>>> non-overlapping column
>>>>>>>>>>>>>>>>>>>>>> > >> updates, no dep on Parquet.
>>>>>>>>>>>>>>>>>>>>>> > >> [2] Parquet-native approach: almost no changes
>>>>>>>>>>>>>>>>>>>>>> to the table format
>>>>>>>>>>>>>>>>>>>>>> > >> metadata beyond tracking of base files.
>>>>>>>>>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>>>>>>>>>> > >> I think [1] sounds a bit better on paper but I
>>>>>>>>>>>>>>>>>>>>>> am worried about the
>>>>>>>>>>>>>>>>>>>>>> > >> complexity in writers and readers (especially
>>>>>>>>>>>>>>>>>>>>>> around keeping row groups
>>>>>>>>>>>>>>>>>>>>>> > >> aligned and split planning). It would be great
>>>>>>>>>>>>>>>>>>>>>> to cover this in detail in
>>>>>>>>>>>>>>>>>>>>>> > >> the proposal.
>>>>>>>>>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>>>>>>>>>> > >> On Mon, Jan 26, 2026 at 09:00 Anurag Mantripragada <
>>>>>>>>>>>>>>>>>>>>>> > >> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>>>>>>>>>> > >>> Hi all,
>>>>>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>> > >>> "Wide tables" with thousands of columns present
>>>>>>>>>>>>>>>>>>>>>> significant challenges
>>>>>>>>>>>>>>>>>>>>>> > >>> for AI/ML workloads, particularly when only a
>>>>>>>>>>>>>>>>>>>>>> subset of columns needs to be
>>>>>>>>>>>>>>>>>>>>>> > >>> added or updated. Current Copy-on-Write (COW)
>>>>>>>>>>>>>>>>>>>>>> and Merge-on-Read (MOR)
>>>>>>>>>>>>>>>>>>>>>> > >>> operations in Iceberg apply at the row level,
>>>>>>>>>>>>>>>>>>>>>> which leads to substantial
>>>>>>>>>>>>>>>>>>>>>> > >>> write amplification in scenarios such as:
>>>>>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>> > >>> - Feature Backfilling & Column Updates:
>>>>>>>>>>>>>>>>>>>>>> Adding new feature columns
>>>>>>>>>>>>>>>>>>>>>> > >>> (e.g., model embeddings) to petabyte-scale
>>>>>>>>>>>>>>>>>>>>>> tables.
>>>>>>>>>>>>>>>>>>>>>> > >>> - Model Score Updates: Refreshing prediction
>>>>>>>>>>>>>>>>>>>>>> scores after retraining.
>>>>>>>>>>>>>>>>>>>>>> > >>> - Embedding Refresh: Updating vector
>>>>>>>>>>>>>>>>>>>>>> embeddings, which currently
>>>>>>>>>>>>>>>>>>>>>> > >>> triggers a rewrite of the entire row.
>>>>>>>>>>>>>>>>>>>>>> > >>> - Incremental Feature Computation: Daily
>>>>>>>>>>>>>>>>>>>>>> updates to a small fraction
>>>>>>>>>>>>>>>>>>>>>> > >>> of features in wide tables.
>>>>>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>> > >>> With the Iceberg V4 proposal introducing
>>>>>>>>>>>>>>>>>>>>>> single-file commits and column
>>>>>>>>>>>>>>>>>>>>>> > >>> stats improvements, this is an ideal time to
>>>>>>>>>>>>>>>>>>>>>> address column-level updates
>>>>>>>>>>>>>>>>>>>>>> > >>> to better support these use cases.
>>>>>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>> > >>> I have drafted a proposal that explores both
>>>>>>>>>>>>>>>>>>>>>> table-format enhancements
>>>>>>>>>>>>>>>>>>>>>> > >>> and file-format (Parquet) changes to enable
>>>>>>>>>>>>>>>>>>>>>> more efficient updates.
>>>>>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>> > >>> Proposal Details:
>>>>>>>>>>>>>>>>>>>>>> > >>> - GitHub Issue: #15146 <
>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/issues/15146>
>>>>>>>>>>>>>>>>>>>>>> > >>> - Design Document: Efficient Column Updates in
>>>>>>>>>>>>>>>>>>>>>> Iceberg
>>>>>>>>>>>>>>>>>>>>>> > >>> <
>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>> > >>> Next Steps:
>>>>>>>>>>>>>>>>>>>>>> > >>> I plan to create POCs to benchmark the
>>>>>>>>>>>>>>>>>>>>>> approaches described in the
>>>>>>>>>>>>>>>>>>>>>> > >>> document.
>>>>>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>> > >>> Please review the proposal and share your
>>>>>>>>>>>>>>>>>>>>>> feedback.
>>>>>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>> > >>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> > >>> Anurag
>>>>>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>