Re: [Discuss] Efficient column updates in Iceberg

2026-03-10 Thread Péter Váry
Hi all,


*Short summary:* Surfacing changed rows ultimately requires a merged
*_last_updated_sequence_number* (LUSN) column for both partial and full
updates, which significantly reduces the benefits of partial updates and
adds complexity. Given this, moving forward with full column updates seems
reasonable.


*Details:* For partial updates, we can assume that update files contain only
changed rows, but if we rely on that, it needs to be explicitly guaranteed.
For full updates, the engine can compute changed rows directly since
original values are available.

The requirement to identify updated rows could be satisfied using the
*_last_updated_sequence_number* column. Every update must correctly advance
LUSN for updated rows, even when consecutive updates touch disjoint row
sets. As a result, each update must carry forward a merged LUSN view,
regardless of whether the update is partial or full.

With partial updates, this leaves us with three options:

   1. Store LUSN in a separate file (extra file per write),
   2. Rewrite a larger cell matrix covering all rows changed since the base
   file (more read and write I/O, but no extra file),
   3. Resolve LUSN at read time by merging multiple files (added read
   complexity and file access).

Option (2) seems the most reasonable, but it further reduces the gains of
partial column updates and increases writer complexity.
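
To make that concrete, here is a minimal sketch (plain Python, illustrative
names only, not an Iceberg API) of the merged LUSN column a writer following
option (2) would have to carry forward on every update:

    def merged_lusn(base_lusn, prior_updates, new_positions, new_seq):
        """base_lusn: LUSN per row position in the base file.
        prior_updates: {row_position: seq} from earlier update files.
        new_positions: row positions touched by the current write.
        new_seq: sequence number assigned to the current write."""
        lusn = list(base_lusn)
        for pos, seq in prior_updates.items():
            lusn[pos] = max(lusn[pos], seq)  # fold in earlier updates
        for pos in new_positions:
            lusn[pos] = new_seq              # rows changed by this write
        return lusn

    # Base file written at seq 1, an earlier update at seq 3 touched rows
    # {0, 2}, and the current write at seq 5 touches only row 1:
    merged_lusn([1, 1, 1, 1], {0: 3, 2: 3}, {1}, 5)  # -> [3, 5, 3, 1]

Even when the current write touches a disjoint row set, the whole merged
column has to be produced.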

Overall, once LUSN handling is factored in, the advantages of partial
updates shrink considerably, while the complexity grows.

Best regards,
Peter


On Fri, Mar 6, 2026 at 2:12 Steven Wu wrote:

> I watched the recording and had the same question regarding the sparse
> representation that Peter mentioned.
>
>
> DVs seem like a good fit for updating columns for a small percentage of
> rows in a file. Full column update is good for updating a high percentage
> of rows. If we are going with this high-level guideline, what's the need
> for the sparse representation? The discussion brought up one reason for the
> sparse representation: handling deleted rows (via DV). E.g., a data file
> has 100 rows and 10 rows have been deleted via DV. How should we generate
> the column file: 90 or 100 rows? The former would require sparse
> representation for tracking row positions in the column file. The latter
> would require solving these two problems.
>
>
> 1) What column values should be filled for deleted rows? Null or default
> values seem good candidates. In the end, those column values don't really
> matter. Null may mess up the null_value_count stats. However, it is
> possible for the writer to keep the null_value_count correct in this case.
> Besides, we already have the imprecise column stats (min, max, null_count)
> after applying a DV to a data file. Extra data files can be scanned and
> rows will be ignored via residual filter. There is no impact on query
> correctness.
>
> 2) How to detect deleted rows? I remember Russell briefly mentioned one
> idea in the first meeting using the `pos` metadata column (ordinal position
> within a file). If the records are processed in the same order, we can use
> the gap to detect deleted rows. E.g., the last processed row position is
> 10, and the current row position is 13. Then we know rows 11 and 12 have
> been deleted and null values can be persisted into the column file for the
> deleted rows.
>
> On Thu, Mar 5, 2026 at 5:57 AM Péter Váry 
> wrote:
>
>> Hi everyone,
>>
>> I had to drop off after the first half hour, but I watched the recording
>> afterward and discussed the topic in depth with Gábor.
>>
>> *TL;DR*
>>
>>1. My intuition is that the volume of updated data is usually small
>>compared to the original file. Even for a table with, say, 1000 columns,
>>updating only a few columns typically produces relatively little data, 
>> even
>>if those columns are rewritten in full. As a result, overall cost is often
>>dominated more by file access and seek overhead than by the actual amount
>>of data read or written. This suggests we should favor a simpler solution
>>and support only full column updates.
>>2. Predicate pushdown does not work well with partial updates,
>>whereas with some effort it can be made to work with full column updates.
>>3. If we do want to support partial updates, I agree that sparse
>>update files make sense. However, if we decide not to support partial
>>updates, I think we should revisit the decision to use a custom encoding
>>for update files. In that case, update files will typically contain very
>>few deleted rows, which invalidates several assumptions behind sparse
>>encodings. In this scenario, we could relatively cheaply add the `_file`,
>>`_pos`, and `_deleted` columns to the read query, use that information to
>>write out the results, and delegate the encoding to Parquet. Parquet
>>already provides efficient encodings for columns that are not extremely
>>sparse, and it would be difficult to outperform that with a custom solution.

Re: [Discuss] Efficient column updates in Iceberg

2026-03-05 Thread Steven Wu
I watched the recording and had the same question regarding the sparse
representation that Peter mentioned.


DVs seem like a good fit for updating columns for a small percentage of rows
in a file. Full column update is good for updating a high percentage of
rows. If we are going with this high-level guideline, what's the need for
the sparse representation? The discussion brought up one reason for the
sparse representation: handling deleted rows (via DV). E.g., a data file
has 100 rows and 10 rows have been deleted via DV. How should we generate
the column file: 90 or 100 rows? The former would require sparse
representation for tracking row positions in the column file. The latter
would require solving these two problems.


1) What column values should be filled for deleted rows? Null or default
values seem good candidates. In the end, those column values don't really
matter. Null may mess up the null_value_count stats. However, it is
possible for the writer to keep the null_value_count correct in this case.
Besides, we already have the imprecise column stats (min, max, null_count)
after applying a DV to a data file. Extra data files can be scanned and
rows will be ignored via residual filter. There is no impact on query
correctness.
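
As a tiny worked example (illustrative numbers only): if 10 deleted rows are
padded with nulls and the updated column genuinely contains 4 nulls, the
writer can still record the accurate stat by subtracting the padding:

    nulls_written = 14          # nulls physically present in the column file
    padded_deleted_rows = 10    # rows deleted via DV, written as null padding
    null_value_count = nulls_written - padded_deleted_rows  # 4 genuine nulls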

2) How to detect deleted rows? I remember Russell briefly mentioned one
idea in the first meeting using the `pos` metadata column (ordinal position
within a file). If the records are processed in the same order, we can use
the gap to detect deleted rows. E.g., the last processed row position is
10, and the current row position is 13. Then we know rows 11 and 12 have
been deleted and null values can be persisted into the column file for the
deleted rows.
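
A minimal sketch of this gap-based detection (plain Python, hypothetical
helper, not an Iceberg API): while streaming the live rows in base-file order
together with their positions, any jump in position marks rows deleted via
the DV, which the writer pads with nulls:

    def pad_deleted_rows(rows_with_pos, total_rows):
        """rows_with_pos: (pos, value) pairs for live rows, ordered by pos.
        Returns total_rows values with None at positions deleted via DV."""
        out = []
        next_expected = 0
        for pos, value in rows_with_pos:
            out.extend([None] * (pos - next_expected))  # gap = deleted rows
            out.append(value)
            next_expected = pos + 1
        out.extend([None] * (total_rows - next_expected))  # trailing deletes
        return out

    # Last processed position 10, next position 13 -> rows 11 and 12 were
    # deleted and are persisted as nulls:
    pad_deleted_rows([(10, "a"), (13, "b")], 14)
    # -> [None] * 10 + ["a", None, None, "b"]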

On Thu, Mar 5, 2026 at 5:57 AM Péter Váry 
wrote:

> Hi everyone,
>
> I had to drop off after the first half hour, but I watched the recording
> afterward and discussed the topic in depth with Gábor.
>
> *TL;DR*
>
>1. My intuition is that the volume of updated data is usually small
>compared to the original file. Even for a table with, say, 1000 columns,
>updating only a few columns typically produces relatively little data, even
>if those columns are rewritten in full. As a result, overall cost is often
>dominated more by file access and seek overhead than by the actual amount
>of data read or written. This suggests we should favor a simpler solution
>and support only full column updates.
>2. Predicate pushdown does not work well with partial updates, whereas
>with some effort it can be made to work with full column updates.
>3. If we do want to support partial updates, I agree that sparse
>update files make sense. However, if we decide not to support partial
>updates, I think we should revisit the decision to use a custom encoding
>for update files. In that case, update files will typically contain very
>few deleted rows, which invalidates several assumptions behind sparse
>encodings. In this scenario, we could relatively cheaply add the `_file`,
>`_pos`, and `_deleted` columns to the read query, use that information to
>write out the results, and delegate the encoding to Parquet. Parquet
>already provides efficient encodings for columns that are not extremely
>sparse, and it would be difficult to outperform that with a custom 
> solution.
>
> *In detail*
>
> What we gain compared to full column updates is mostly on the write path
>
>- We don’t need to read unchanged column values.
>- We may not need to touch the original data file at all.
>- We don’t need to write unchanged values (although we still need to
>create a new file).
>
>
> What we lose is mostly on the read path
>
>- We need to read the column from the original file.
>- We also need to read the update file (even if it’s small, it’s still
>an additional file access).
>- Predicate pushdown does not work on the updated column; filters must
>be applied manually. Predicate pushdown can only be applied to the update
>file itself.
>
>
> Edge cases
>
>- Partial updates shine when updates do not require reading from the
>original table at all.
>- Full updates are best when reads only need to touch the newly
>written data and can completely ignore the original file.
>
>
> Typical case comparison
> In practice, both approaches look quite similar in terms of file access:
>
>- Reads: original file + new data file in both cases.
>- Writes: read the original file (and any existing update file) and
>write new data in both cases.
>
> The main differences are:
>
>- With partial column updates, we read and write less data during
>updates (only the changed cells).
>- With full column updates,
>   - Reads are cheaper because data is already merged into a single
>   file and we don’t need to read old column data.
>   - Predicate pushdown can work, although we still need to combine
>   with columns from the base file

Re: [Discuss] Efficient column updates in Iceberg

2026-03-05 Thread Péter Váry
Hi everyone,

I had to drop off after the first half hour, but I watched the recording
afterward and discussed the topic in depth with Gábor.

*TL;DR*

   1. My intuition is that the volume of updated data is usually small
   compared to the original file. Even for a table with, say, 1000 columns,
   updating only a few columns typically produces relatively little data, even
   if those columns are rewritten in full. As a result, overall cost is often
   dominated more by file access and seek overhead than by the actual amount
   of data read or written. This suggests we should favor a simpler solution
   and support only full column updates.
   2. Predicate pushdown does not work well with partial updates, whereas
   with some effort it can be made to work with full column updates.
   3. If we do want to support partial updates, I agree that sparse update
   files make sense. However, if we decide not to support partial updates, I
   think we should revisit the decision to use a custom encoding for update
   files. In that case, update files will typically contain very few deleted
   rows, which invalidates several assumptions behind sparse encodings. In
   this scenario, we could relatively cheaply add the `_file`, `_pos`, and
   `_deleted` columns to the read query, use that information to write out the
   results, and delegate the encoding to Parquet. Parquet already provides
   efficient encodings for columns that are not extremely sparse, and it would
   be difficult to outperform that with a custom solution.
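
As a rough illustration of point 3 (PySpark, assuming an existing `spark`
session and that the Iceberg Spark integration exposes these metadata columns
to the query; the table name `db.tbl` and column `score` are placeholders):

    # Read the updated column together with the metadata columns that tell us
    # which base file and row position each value belongs to.
    df = (spark.read.table("db.tbl")
          .selectExpr("_file", "_pos", "_deleted", "score"))

The resulting frame carries everything a writer needs to group values by base
file and let Parquet handle the encoding of the files it writes back out.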

*In detail*

What we gain compared to full column updates is mostly on the write path

   - We don’t need to read unchanged column values.
   - We may not need to touch the original data file at all.
   - We don’t need to write unchanged values (although we still need to
   create a new file).


What we lose is mostly on the read path

   - We need to read the column from the original file.
   - We also need to read the update file (even if it’s small, it’s still
   an additional file access).
   - Predicate pushdown does not work on the updated column; filters must
   be applied manually. Predicate pushdown can only be applied to the update
   file itself.


Edge cases

   - Partial updates shine when updates do not require reading from the
   original table at all.
   - Full updates are best when reads only need to touch the newly written
   data and can completely ignore the original file.


Typical case comparison
In practice, both approaches look quite similar in terms of file access:

   - Reads: original file + new data file in both cases.
   - Writes: read the original file (and any existing update file) and
   write new data in both cases.

The main differences are:

   - With partial column updates, we read and write less data during
   updates (only the changed cells).
   - With full column updates,
  - Reads are cheaper because data is already merged into a single file
  and we don’t need to read old column data.
  - Predicate pushdown can work, although we still need to combine with
  columns from the base file.


Overall, the key difference is the amount of column data read and written,
not full file sizes. At that point, file access patterns and seek overhead
tend to dominate the cost rather than raw I/O volume.

On Thu, Mar 5, 2026 at 3:35 Anurag Mantripragada wrote:

> Hi everyone!
>
> Thanks for joining the sync today. Sorry, Google cut us off while Gabor
> was explaining his POC work. We can discuss that in the next meeting. Here
> is the recording .
>
> *Meeting notes:*
>
> *Partial updates *
>
>- We could potentially support partial updates if the writer could
>merge all the existing updates COW style into a new column file. We could
>potentially explore this, but the general consensus was to favor a single
>mechanism for updates, whether partial or not. This requires some more
>thought and we can iterate over it.
>- This remains an open question until we consider all the synchronous
>writing cases.
>
> *Column File Row Alignment*
>
>- We generally agreed on using sparse Parquet files to store updates. Each
>update file contains only the modified values and their corresponding row
>positions from the base file.
>- *Rationale:* This avoids the stats-corruption risk of full, padded
>files (which would require filling non-updated rows with arbitrary values)
>and the Parquet limitation against top-level nulls.
>- *Read Path:* Readers will materialize the sparse updates into a full
>buffer with nulls, then efficiently merge by position.
>
> Single Update File Per Column
>
>- To simplify reads, each base file can have only one active update
>file per column.
>- Subsequent updates must rewrite the existing update file,
>synchronously applying all prior changes.
>- This avoids the complexity of merging multiple update files during
>the read path.

Re: [Discuss] Efficient column updates in Iceberg

2026-03-04 Thread Anurag Mantripragada
Hi everyone!

Thanks for joining the sync today. Sorry, Google cut us off while Gabor was
explaining his POC work. We can discuss that in the next meeting. Here is
the recording .

*Meeting notes:*

*Partial updates*

   - We could potentially support partial updates if the writer merged all
   the existing updates COW-style into a new column file. We could explore
   this, but the general consensus was to favor a single mechanism for
   updates, whether partial or not. This requires more thought, and we can
   iterate on it.
   - This remains an open question until we consider all the synchronous
   writing cases.

*Column File Row Alignment*

   - We generally agreed on using sparse Parquet files to store updates. Each
   update file contains only the modified values and their corresponding row
   positions from the base file.
   - *Rationale:* This avoids the stats-corruption risk of full, padded
   files (which would require filling non-updated rows with arbitrary values)
   and the Parquet limitation against top-level nulls.
   - *Read Path:* Readers will materialize the sparse updates into a full
   buffer with nulls, then efficiently merge by position.
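
As a small illustration of that merge (plain Python, illustrative names only,
assuming the update file carries (row position, new value) pairs for one
column; this is equivalent to materializing a null-padded buffer and
coalescing it with the base column):

    def merge_sparse_update(base_column, positions, values):
        """Overlay sparse updates onto the base column by row position."""
        merged = list(base_column)
        for pos, val in zip(positions, values):
            merged[pos] = val
        return merged

    # Rows 2 and 5 of a six-row base file were updated:
    merge_sparse_update([10, 20, 30, 40, 50, 60], [2, 5], [31, 61])
    # -> [10, 20, 31, 40, 50, 61]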

Single Update File Per Column

   - To simplify reads, each base file can have only one active update file
   per column.
   - Subsequent updates must rewrite the existing update file,
   synchronously applying all prior changes.
   - This avoids the complexity of merging multiple update files during the
   read path.
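
A toy sketch of that rewrite rule (plain Python, illustrative names only):
the writer folds every prior sparse change into the replacement update file,
so the newest value wins per row position:

    def rewrite_update_file(prior, new):
        """prior/new: {row_position: value} for one column; newest wins."""
        merged = dict(prior)
        merged.update(new)
        return dict(sorted(merged.items()))

    # An earlier update touched rows 2 and 5; the new write touches 5 and 7:
    rewrite_update_file({2: 31, 5: 61}, {5: 62, 7: 81})
    # -> {2: 31, 5: 62, 7: 81}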

*Other open questions*

   - *Change Detection:* We need to think more about how change detection
   would work with the synchronous update case. The V4 spec is undergoing
   revisions to support other use cases, and we should follow that work to see
   how this design aligns with it.

*Next Steps*

   - *Anurag:* Update the design doc with the sparse file format, the
   single-update-file rule and add details about how this would work in
   various scenarios.
   - *Anurag:* Review the V4 CDC metadata proposal to ensure alignment with
   the column update design.
   - *Gábor:* Continue developing the POC, focusing on the synchronous
   rewrite logic and reader implementation. *Anurag* will work on the Spark
   plumbing needed to materialize only the changed rows and the planner
   changes in Spark 4.x
   - *All:* Schedule a follow-up meeting to review the updated design doc.

Thanks,
Anurag


On Wed, Mar 4, 2026 at 1:47 PM Anton Okolnychyi 
wrote:

> Gabor, I know Anurag also expressed interest in extending Spark DML to
> accommodate column updates. I am happy to work with both of you to get the
> Spark piece designed and implemented. It is not something we would be able
> to handle in Iceberg via extensions.
>
> Regarding partial updates, I agree we will have to iterate on open
> questions before making a call on whether to support this functionality.
> Can you elaborate on the last use case you mentioned? Why would we have to
> combine {1} with {2, 3}? Will it be possible to produce a column file with
> only affected columns in each write?
>
> On Wed, Mar 4, 2026 at 12:13 Gábor Kaszab wrote:
>
>> Hey All,
>>
>> Apparently, the meeting dropped all of us after exactly one hour :) At
>> the end I just wanted to mention that during my attempt to implement a PoC
>> I found a couple of missing building blocks (collecting the updated field
>> IDs when committing after a Spark write; tweaking UPDATE's plan e.g.
>> adding/removing columns compared to CoW) and also found some interesting
>> technical details/questions (e.g. how to align rows when reading a split
>> based on base file's split_offsets) that we could discuss next time. I'll
>> collect all of these and share.
>>
>> In the meantime, I gave another thought to the *partial updates* idea
>> Anton mentioned where we can basically have the same metadata and read path
>> as for the full column update approach, and we'd push the responsibility to
>> the writers to always merge existing updates with new ones. I think in
>> theory, this seems a reasonable design and seems not that complicated to
>> implement when the new update aligns with the field IDs of some of the
>> existing updates. For instance, partially updating rows by field ID1 and
>> then updating different rows also for the same field ID seems
>> straightforward to merge these into a new file and refer that file from the
>> metadata.
>> However, I'm not sure how trivial it is when we update overlapping but
>> not entirely the same set of fields. E.g first partially updating by fields
>> {1, 2} then by {2, 3}. I don't think we want to merge these into 1 and have
>> a single update for {1, 2, 3} as that would have a snowball effect of
>> merging more and more cols together by time. But I don't think we want to
>> split them either, or require a separate partial update for each field
>> (wouldn't be suitable for column families either later on).
>>
>> Cheers,
>> Gabor
>>
>> Micah

Re: [Discuss] Efficient column updates in Iceberg

2026-03-04 Thread Anton Okolnychyi
Gabor, I know Anurag also expressed interest in extending Spark DML to
accommodate column updates. I am happy to work with both of you to get the
Spark piece designed and implemented. It is not something we would be able
to handle in Iceberg via extensions.

Regarding partial updates, I agree we will have to iterate on open
questions before making a call on whether to support this functionality.
Can you elaborate on the last use case you mentioned? Why would we have to
combine {1} with {2, 3}? Will it be possible to produce a column file with
only affected columns in each write?

On Wed, Mar 4, 2026 at 12:13 Gábor Kaszab wrote:

> Hey All,
>
> Apparently, the meeting dropped all of us after exactly one hour :) At the
> end I just wanted to mention that during my attempt to implement a PoC I
> found a couple of missing building blocks (collecting the updated field IDs
> when committing after a Spark write; tweaking UPDATE's plan e.g.
> adding/removing columns compared to CoW) and also found some interesting
> technical details/questions (e.g. how to align rows when reading a split
> based on base file's split_offsets) that we could discuss next time. I'll
> collect all of these and share.
>
> In the meantime, I gave another thought to the *partial updates* idea
> Anton mentioned where we can basically have the same metadata and read path
> as for the full column update approach, and we'd push the responsibility to
> the writers to always merge existing updates with new ones. I think in
> theory, this seems a reasonable design and seems not that complicated to
> implement when the new update aligns with the field IDs of some of the
> existing updates. For instance, partially updating rows by field ID1 and
> then updating different rows also for the same field ID seems
> straightforward to merge these into a new file and refer that file from the
> metadata.
> However, I'm not sure how trivial it is when we update overlapping but not
> entirely the same set of fields. E.g first partially updating by fields {1,
> 2} then by {2, 3}. I don't think we want to merge these into 1 and have a
> single update for {1, 2, 3} as that would have a snowball effect of merging
> more and more cols together by time. But I don't think we want to split
> them either, or require a separate partial update for each field (wouldn't
> be suitable for column families either later on).
>
> Cheers,
> Gabor
>
> On Wed, Mar 4, 2026 at 0:32 Micah Kornfield wrote:
>
>> If this is correct, it aligns well with the current proposal and
>>> shouldn't introduce any additional complexity. I will add it to the
>>> discussion points for tomorrow's community sync.
>>
>>
>> Yes, this example aligns with what I was thinking (nit: "range" probably
>> wouldn't be a string but I assume this was just for illustrative purposes)
>>
>> On the other hand, in the column family use case, splitting columns is a
>>> strict requirement for performance. I haven’t considered how this would
>>> work, but perhaps we could introduce a table property for column families
>>> to make this explicit, and compaction jobs would have to respect
>>
>>
>> Yeah, I don't want to get into the exact mechanics for column families. I
>> was just calling out that compaction to the base file is not desirable in
>> all cases, so shouldn't be assumed as a solution for small files.
>>
>> Thanks,
>> Micah
>>
>>
>>
>> On Tue, Mar 3, 2026 at 3:11 PM Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi Micah,
>>>
>>> Could you expand on the complexity you think this introduces (or more
 specifically "significant" part)?
>>>
>>> I may have misunderstood your approach regarding packing row ranges. To
>>> clarify, is the following what you had in mind?
>>>
>>> Initially, we have base_file_1.parquet (rows 1-1000) and
>>> base_file_2.parquet (rows 1001-2000). If we update the "score" column
>>> across both files and pack those updates into a single larger file,
>>> packed_col_A.parquet, would the metadata structure look like this?
>>>
>>>   {"data_file_path": "base_file_1.parquet","column_updates": [  
>>> {"field_id": 12,"update_file_path": "packed_col_A.parquet", 
>>>"row_range": "0-1000"  }]  },  {"data_file_path": 
>>> "base_file_2.parquet","column_updates": [ {"field_id": 12,  
>>>   "update_file_path": "packed_col_A.parquet","row_range": 
>>> "1001-2000"}]  }
>>>
>>>
>>> If this is correct, it aligns well with the current proposal and
>>> shouldn't introduce any additional complexity. I will add it to the
>>> discussion points for tomorrow's community sync.
>>>
>>>
>>> This seems at odds with supporting column families in the future?
>>>
>>> In my opinion, there’s a distinction between the use cases of column
>>> updates and column families. Column updates are designed for fast writes
>>> while maintaining reasonable read performance. Compaction is desirable to
>>> reduce

Re: [Discuss] Efficient column updates in Iceberg

2026-03-04 Thread Gábor Kaszab
Hey All,

Apparently, the meeting dropped all of us after exactly one hour :) At the
end I just wanted to mention that during my attempt to implement a PoC I
found a couple of missing building blocks (collecting the updated field IDs
when committing after a Spark write; tweaking UPDATE's plan e.g.
adding/removing columns compared to CoW) and also found some interesting
technical details/questions (e.g. how to align rows when reading a split
based on base file's split_offsets) that we could discuss next time. I'll
collect all of these and share.

In the meantime, I gave another thought to the *partial updates* idea Anton
mentioned where we can basically have the same metadata and read path as
for the full column update approach, and we'd push the responsibility to
the writers to always merge existing updates with new ones. I think in
theory, this seems a reasonable design and seems not that complicated to
implement when the new update aligns with the field IDs of some of the
existing updates. For instance, if we partially update rows for field ID 1
and then update different rows for the same field ID, it seems
straightforward to merge these into a new file and reference that file from
the metadata.
However, I'm not sure how trivial it is when we update overlapping but not
entirely the same sets of fields, e.g., first partially updating by fields
{1, 2} and then by {2, 3}. I don't think we want to merge these into one and
have a single update for {1, 2, 3}, as that would have a snowball effect of
merging more and more columns together over time. But I don't think we want
to split them either, or require a separate partial update for each field
(that wouldn't be suitable for column families later on either).

Cheers,
Gabor

On Wed, Mar 4, 2026 at 0:32 Micah Kornfield wrote:

> If this is correct, it aligns well with the current proposal and shouldn't
>> introduce any additional complexity. I will add it to the discussion points
>> for tomorrow's community sync.
>
>
> Yes, this example aligns with what I was thinking (nit: "range" probably
> wouldn't be a string but I assume this was just for illustrative purposes)
>
> On the other hand, in the column family use case, splitting columns is a
>> strict requirement for performance. I haven’t considered how this would
>> work, but perhaps we could introduce a table property for column families
>> to make this explicit, and compaction jobs would have to respect
>
>
> Yeah, I don't want to get into the exact mechanics for column families. I
> was just calling out that compaction to the base file is not desirable in
> all cases, so shouldn't be assumed as a solution for small files.
>
> Thanks,
> Micah
>
>
>
> On Tue, Mar 3, 2026 at 3:11 PM Anurag Mantripragada <
> [email protected]> wrote:
>
>> Hi Micah,
>>
>> Could you expand on the complexity you think this introduces (or more
>>> specifically "significant" part)?
>>
>> I may have misunderstood your approach regarding packing row ranges. To
>> clarify, is the following what you had in mind?
>>
>> Initially, we have base_file_1.parquet (rows 1-1000) and
>> base_file_2.parquet (rows 1001-2000). If we update the "score" column
>> across both files and pack those updates into a single larger file,
>> packed_col_A.parquet, would the metadata structure look like this?
>>
>>   {"data_file_path": "base_file_1.parquet","column_updates": [  
>> {"field_id": 12,"update_file_path": "packed_col_A.parquet",  
>>   "row_range": "0-1000"  }]  },  {"data_file_path": 
>> "base_file_2.parquet","column_updates": [ {"field_id": 12,   
>>  "update_file_path": "packed_col_A.parquet","row_range": 
>> "1001-2000"}]  }
>>
>>
>> If this is correct, it aligns well with the current proposal and
>> shouldn't introduce any additional complexity. I will add it to the
>> discussion points for tomorrow's community sync.
>>
>>
>> This seems at odds with supporting column families in the future?
>>
>> In my opinion, there’s a distinction between the use cases of column
>> updates and column families. Column updates are designed for fast writes
>> while maintaining reasonable read performance. Compaction is desirable to
>> reduce the complexity of the read side, if any. On the other hand, in the
>> column family use case, splitting columns is a strict requirement for
>> performance. I haven’t considered how this would work, but perhaps we could
>> introduce a table property for column families to make this explicit, and
>> compaction jobs would have to respect it.
>>
>> ~Anurag
>>
>> On Tue, Mar 3, 2026 at 12:02 PM Micah Kornfield 
>> wrote:
>>
>>> Hi Anurag,
>>>
 *Compaction and small files*: If I understand the row ranges idea
 correctly, packing multiple updates into larger column files would require
 matching ranges to base files based on predicates, which adds significant
 planning complexity. Regular compaction, which rewrites column files into
 t

Re: [Discuss] Efficient column updates in Iceberg

2026-03-03 Thread Micah Kornfield
>
> If this is correct, it aligns well with the current proposal and shouldn't
> introduce any additional complexity. I will add it to the discussion points
> for tomorrow's community sync.


Yes, this example aligns with what I was thinking (nit: "range" probably
wouldn't be a string but I assume this was just for illustrative purposes)

On the other hand, in the column family use case, splitting columns is a
> strict requirement for performance. I haven’t considered how this would
> work, but perhaps we could introduce a table property for column families
> to make this explicit, and compaction jobs would have to respect


Yeah, I don't want to get into the exact mechanics for column families. I
was just calling out that compaction to the base file is not desirable in
all cases, so shouldn't be assumed as a solution for small files.

Thanks,
Micah



On Tue, Mar 3, 2026 at 3:11 PM Anurag Mantripragada <
[email protected]> wrote:

> Hi Micah,
>
> Could you expand on the complexity you think this introduces (or more
>> specifically "significant" part)?
>
> I may have misunderstood your approach regarding packing row ranges. To
> clarify, is the following what you had in mind?
>
> Initially, we have base_file_1.parquet (rows 1-1000) and
> base_file_2.parquet (rows 1001-2000). If we update the "score" column
> across both files and pack those updates into a single larger file,
> packed_col_A.parquet, would the metadata structure look like this?
>
>   {"data_file_path": "base_file_1.parquet","column_updates": [  { 
>"field_id": 12,"update_file_path": "packed_col_A.parquet", 
>"row_range": "0-1000"  }]  },  {"data_file_path": 
> "base_file_2.parquet","column_updates": [ {"field_id": 12,
> "update_file_path": "packed_col_A.parquet","row_range": 
> "1001-2000"}]  }
>
>
> If this is correct, it aligns well with the current proposal and shouldn't
> introduce any additional complexity. I will add it to the discussion points
> for tomorrow's community sync.
>
>
> This seems at odds with supporting column families in the future?
>
> In my opinion, there’s a distinction between the use cases of column
> updates and column families. Column updates are designed for fast writes
> while maintaining reasonable read performance. Compaction is desirable to
> reduce the complexity of the read side, if any. On the other hand, in the
> column family use case, splitting columns is a strict requirement for
> performance. I haven’t considered how this would work, but perhaps we could
> introduce a table property for column families to make this explicit, and
> compaction jobs would have to respect it.
>
> ~Anurag
>
> On Tue, Mar 3, 2026 at 12:02 PM Micah Kornfield 
> wrote:
>
>> Hi Anurag,
>>
>>> *Compaction and small files*: If I understand the row ranges idea
>>> correctly, packing multiple updates into larger column files would require
>>> matching ranges to base files based on predicates, which adds significant
>>> planning complexity. Regular compaction, which rewrites column files into
>>> the base file seems more practical.
>>
>>
>> Could you expand on the complexity you think this introduces (or more
>> specifically "significant" part)? In this case the predicate should be
>> pretty simple (i.e. read rows between X and Y only) and can be done
>> efficiently via row group statistics.  Smart writers could even partition
>> rows for a specific base file into their own row group/pages to make the
>> filter trivial.
>>
>> Regular compaction, which rewrites column files into the base file seems
>>> more practical.
>>
>>
>> This seems at odds with supporting column families in the future?
>>
>> Thanks,
>> Micah
>>
>>
>> On Tue, Mar 3, 2026 at 11:43 AM Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> Sorry for the delayed response. I was on vacation and catching up.
>>> Thanks for the continued discussion on this topic.
>>>
>>> *Partial updates*: I agree that MoR-style row-level updates offer
>>> limited benefits beyond reducing the writing of irrelevant columns. For use
>>> cases like updating a subset of users, existing deletion vectors and the
>>> new V4 manifest delete vectors should perform well. Gabor’s suggestion for
>>> file-level partial updates is a reasonable alternative, even with some
>>> write amplification.
>>>
>>> *Compaction and small files*: If I understand the row ranges idea
>>> correctly, packing multiple updates into larger column files would require
>>> matching ranges to base files based on predicates, which adds significant
>>> planning complexity. Regular compaction, which rewrites column files into
>>> the base file seems more practical.
>>>
>>> *Column families*: While splitting columns into families is useful, the
>>> current design is more generic and already supports packing families into
>>> column files. Deciding how to group these columns (manually or via an
>

Re: [Discuss] Efficient column updates in Iceberg

2026-03-03 Thread Anurag Mantripragada
Hi Micah,

Could you expand on the complexity you think this introduces (or more
> specifically "significant" part)?

I may have misunderstood your approach regarding packing row ranges. To
clarify, is the following what you had in mind?

Initially, we have base_file_1.parquet (rows 1-1000) and
base_file_2.parquet (rows 1001-2000). If we update the "score" column
across both files and pack those updates into a single larger file,
packed_col_A.parquet, would the metadata structure look like this?

  {"data_file_path": "base_file_1.parquet","column_updates": [
 {"field_id": 12,"update_file_path":
"packed_col_A.parquet","row_range": "0-1000"  }]  },
{"data_file_path": "base_file_2.parquet","column_updates": [
  {"field_id": 12,"update_file_path":
"packed_col_A.parquet","row_range": "1001-2000"}]  }


If this is correct, it aligns well with the current proposal and shouldn't
introduce any additional complexity. I will add it to the discussion points
for tomorrow's community sync.


This seems at odds with supporting column families in the future?

In my opinion, there’s a distinction between the use cases of column
updates and column families. Column updates are designed for fast writes
while maintaining reasonable read performance. Compaction is desirable to
reduce the complexity of the read side, if any. On the other hand, in the
column family use case, splitting columns is a strict requirement for
performance. I haven’t considered how this would work, but perhaps we could
introduce a table property for column families to make this explicit, and
compaction jobs would have to respect it.

~Anurag

On Tue, Mar 3, 2026 at 12:02 PM Micah Kornfield 
wrote:

> Hi Anurag,
>
>> *Compaction and small files*: If I understand the row ranges idea
>> correctly, packing multiple updates into larger column files would require
>> matching ranges to base files based on predicates, which adds significant
>> planning complexity. Regular compaction, which rewrites column files into
>> the base file seems more practical.
>
>
> Could you expand on the complexity you think this introduces (or more
> specifically "significant" part)? In this case the predicate should be
> pretty simple (i.e. read rows between X and Y only) and can be done
> efficiently via row group statistics.  Smart writers could even partition
> rows for a specific base file into their own row group/pages to make the
> filter trivial.
>
> Regular compaction, which rewrites column files into the base file seems
>> more practical.
>
>
> This seems at odds with supporting column families in the future?
>
> Thanks,
> Micah
>
>
> On Tue, Mar 3, 2026 at 11:43 AM Anurag Mantripragada <
> [email protected]> wrote:
>
>> Hi all,
>>
>> Sorry for the delayed response. I was on vacation and catching up. Thanks
>> for the continued discussion on this topic.
>>
>> *Partial updates*: I agree that MoR-style row-level updates offer
>> limited benefits beyond reducing the writing of irrelevant columns. For use
>> cases like updating a subset of users, existing deletion vectors and the
>> new V4 manifest delete vectors should perform well. Gabor’s suggestion for
>> file-level partial updates is a reasonable alternative, even with some
>> write amplification.
>>
>> *Compaction and small files*: If I understand the row ranges idea
>> correctly, packing multiple updates into larger column files would require
>> matching ranges to base files based on predicates, which adds significant
>> planning complexity. Regular compaction, which rewrites column files into
>> the base file seems more practical.
>>
>> *Column families*: While splitting columns into families is useful, the
>> current design is more generic and already supports packing families into
>> column files. Deciding how to group these columns (manually or via an
>> engine) can be addressed in separate follow-up work.
>>
>> *Next steps:*
>>
>>- Gabor and I are developing a POC for metadata changes, focusing on
>>reading and writing column files using Spark for integration. We will 
>> share
>>more details soon.
>>- I will update the doc in preparation for tomorrow's sync.
>>
>>
>> As a reminder we have a sync on column updates upcoming
>>
>> Efficient column updates sync
>> Wednesday, March 4 · 9:00 – 10:00am
>> Time zone: America/Los_Angeles
>> Google Meet joining info
>> Video call link: https://meet.google.com/naf-tvvn-qup
>>
>> ~ Anurag
>>
>> On Wed, Feb 25, 2026 at 1:32 PM Gábor Kaszab 
>> wrote:
>>
>>> Hey All,
>>>
>>> Nice to see the activity on this thread. Thanks to everyone who chimed
>>> in!
>>>
>>> Micah, I also feel that 1) (full column updates) and 2) (partial but
>>> file-level column updates) could be a good middle ground between perf
>>> improvement and keeping the code complexity low. In fact I had the chance
>>> to experiment in this area and the metadata + API part would be as simple
>>> as in th

Re: [Discuss] Efficient column updates in Iceberg

2026-03-03 Thread Micah Kornfield
Hi Anurag,

> *Compaction and small files*: If I understand the row ranges idea
> correctly, packing multiple updates into larger column files would require
> matching ranges to base files based on predicates, which adds significant
> planning complexity. Regular compaction, which rewrites column files into
> the base file seems more practical.


Could you expand on the complexity you think this introduces (or, more
specifically, the "significant" part)? In this case the predicate should be
pretty simple (i.e., read rows between X and Y only) and can be evaluated
efficiently via row group statistics. Smart writers could even partition the
rows for a specific base file into their own row groups/pages to make the
filter trivial.
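
For illustration, a reader could prune a packed column file down to one base
file's rows with something like the following (pyarrow; the file name, the
range values, and the assumption that the base-file row position is stored in
the first column are all hypothetical):

    import pyarrow.parquet as pq

    def read_pos_range(path, lo, hi):
        """Read only the row groups whose position stats overlap [lo, hi]."""
        pf = pq.ParquetFile(path)
        groups = []
        for i in range(pf.num_row_groups):
            stats = pf.metadata.row_group(i).column(0).statistics  # pos column
            if stats is None or (stats.min <= hi and stats.max >= lo):
                groups.append(pf.read_row_group(i))
        return groups

    # e.g. only the rows packed for base_file_2.parquet (positions 1001-2000):
    # read_pos_range("packed_col_A.parquet", 1001, 2000)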

Regular compaction, which rewrites column files into the base file seems
> more practical.


This seems at odds with supporting column families in the future?

Thanks,
Micah


On Tue, Mar 3, 2026 at 11:43 AM Anurag Mantripragada <
[email protected]> wrote:

> Hi all,
>
> Sorry for the delayed response. I was on vacation and catching up. Thanks
> for the continued discussion on this topic.
>
> *Partial updates*: I agree that MoR-style row-level updates offer limited
> benefits beyond reducing the writing of irrelevant columns. For use cases
> like updating a subset of users, existing deletion vectors and the new V4
> manifest delete vectors should perform well. Gabor’s suggestion for
> file-level partial updates is a reasonable alternative, even with some
> write amplification.
>
> *Compaction and small files*: If I understand the row ranges idea
> correctly, packing multiple updates into larger column files would require
> matching ranges to base files based on predicates, which adds significant
> planning complexity. Regular compaction, which rewrites column files into
> the base file seems more practical.
>
> *Column families*: While splitting columns into families is useful, the
> current design is more generic and already supports packing families into
> column files. Deciding how to group these columns (manually or via an
> engine) can be addressed in separate follow-up work.
>
> *Next steps:*
>
>- Gabor and I are developing a POC for metadata changes, focusing on
>reading and writing column files using Spark for integration. We will share
>more details soon.
>- I will update the doc in preparation for tomorrow's sync.
>
>
> As a reminder we have a sync on column updates upcoming
>
> Efficient column updates sync
> Wednesday, March 4 · 9:00 – 10:00am
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/naf-tvvn-qup
>
> ~ Anurag
>
> On Wed, Feb 25, 2026 at 1:32 PM Gábor Kaszab 
> wrote:
>
>> Hey All,
>>
>> Nice to see the activity on this thread. Thanks to everyone who chimed in!
>>
>> Micah, I also feel that 1) (full column updates) and 2) (partial but
>> file-level column updates) could be a good middle ground between perf
>> improvement and keeping the code complexity low. In fact I had the chance
>> to experiment in this area and the metadata + API part would be as simple
>> as in this PoC . Just a
>> side note for 3), from the SQL aspect I'm a bit hesitant how
>> straightforward it is for the users to write predicates that align with
>> file boundaries, though.
>> For deciding on partial column updates, we probably can't get away
>> without doing some measurements of how it compares to existing MoR. I have
>> it on my roadmap, so I'll share it once I have something.
>>
>> Wrapping multiple update files into one is an interesting idea. Let's
>> bring this up on the next sync! Additionally, full column updates could add
>> a huge overhead on the metadata files being created too (delete everything
>> + write everything with updates), unless we decide to do some manifest
>> rewrites/optimizations under the hood during the commit.
>>
>> Peter, column families as a schema-like table metadata level information
>> would definitely be useful. It seems like a natural follow-up of the column
>> update work, but we have to keep in mind to choose a design that won't
>> prevent us from implementing a more general column families concept
>> (probably for inserts too).
>>
>> Best Regards,
>> Gabor
>>
>> On Sat, Feb 21, 2026 at 17:53 Micah Kornfield wrote:
>>
>>> 1) and 3) are what I was thinking of as use-cases.  I agree unless there
>>> is a strong motivating use-case for MoR style column updates we should try
>>> to avoid this complexity and use the existing row based MoR.
>>>
>>> One other idea I was trying to think through is the "small file problem"
>>> we would likely encounter for single column additions/updates for fixed
>>> width data.  Would it make sense to add a record-range into the metadata
>>> for column families, so that we can pack column updates across files into
>>> reasonably sized files (similar to what we do for DVs today in puffin
>>> files)?
>

Re: [Discuss] Efficient column updates in Iceberg

2026-03-03 Thread Anurag Mantripragada
Hi all,

Sorry for the delayed response. I was on vacation and catching up. Thanks
for the continued discussion on this topic.

*Partial updates*: I agree that MoR-style row-level updates offer limited
benefits beyond reducing the writing of irrelevant columns. For use cases
like updating a subset of users, existing deletion vectors and the new V4
manifest delete vectors should perform well. Gabor’s suggestion for
file-level partial updates is a reasonable alternative, even with some
write amplification.

*Compaction and small files*: If I understand the row ranges idea
correctly, packing multiple updates into larger column files would require
matching ranges to base files based on predicates, which adds significant
planning complexity. Regular compaction, which rewrites column files into
the base file, seems more practical.

*Column families*: While splitting columns into families is useful, the
current design is more generic and already supports packing families into
column files. Deciding how to group these columns (manually or via an
engine) can be addressed in separate follow-up work.

*Next steps:*

   - Gabor and I are developing a POC for metadata changes, focusing on
   reading and writing column files using Spark for integration. We will share
   more details soon.
   - I will update the doc in preparation for tomorrow's sync.


As a reminder, we have an upcoming sync on column updates:

Efficient column updates sync
Wednesday, March 4 · 9:00 – 10:00am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/naf-tvvn-qup

~ Anurag

On Wed, Feb 25, 2026 at 1:32 PM Gábor Kaszab  wrote:

> Hey All,
>
> Nice to see the activity on this thread. Thanks to everyone who chimed in!
>
> Micah, I also feel that 1) (full column updates) and 2) (partial but
> file-level column updates) could be a good middle ground between perf
> improvement and keeping the code complexity low. In fact I had the chance
> to experiment in this area and the metadata + API part would be as simple
> as in this PoC . Just a
> side note for 3), from the SQL aspect I'm a bit hesitant how
> straightforward it is for the users to write predicates that align with
> file boundaries, though.
> For deciding on partial column updates, we probably can't get away without
> doing some measurements of how it compares to existing MoR. I have it on my
> roadmap, so I'll share it once I have something.
>
> Wrapping multiple update files into one is an interesting idea. Let's
> bring this up on the next sync! Additionally, full column updates could add
> a huge overhead on the metadata files being created too (delete everything
> + write everything with updates), unless we decide to do some manifest
> rewrites/optimizations under the hood during the commit.
>
> Peter, column families as a schema-like table metadata level information
> would definitely be useful. It seems like a natural follow-up of the column
> update work, but we have to keep in mind to choose a design that won't
> prevent us from implementing a more general column families concept
> (probably for inserts too).
>
> Best Regards,
> Gabor
>
> On Sat, Feb 21, 2026 at 17:53 Micah Kornfield wrote:
>
>> 1) and 3) are what I was thinking of as use-cases.  I agree unless there
>> is a strong motivating use-case for MoR style column updates we should try
>> to avoid this complexity and use the existing row based MoR.
>>
>> One other idea I was trying to think through is the "small file problem"
>> we would likely encounter for single column additions/updates for fixed
>> width data.  Would it make sense to add a record-range into the metadata
>> for column families, so that we can pack column updates across files into
>> reasonably sized files (similar to what we do for DVs today in puffin
>> files)?
>>
>> Thanks,
>> Micah
>>
>> On Mon, Feb 16, 2026 at 7:23 AM Gábor Kaszab 
>> wrote:
>>
>>> Hey All,
>>>
>>> Thanks Anurag for the summary!
>>>
>>> I regret we don't have a recording for the sync, but I had the
>>> impression that, even though there was a lengthy discussion about the
>>> implementation requirements for partial updates, there wasn't a strong
>>> consensus around the need and there were no strong use cases to justify
>>> partial updates either. Let me sum up where I see we are at now:
>>>
>>> *Scope of the updates*
>>>
>>> *1) Full column updates*
>>> There is a consensus and common understanding that this use case makes
>>> sense. If this was the only supported use-case, the implementation would be
>>> relatively simple. We could guarantee there is no overlap in column updates
>>> by deduplicating the field IDs in the column update metadata. E.g. Let's
>>> say we have a column update on columns {1,2} and we write another column
>>> update for {2,3}: we can change the metadata for the first one to only
>>> cover {1} and not {1,2}. With this the write and the read/stitching process
>

Re: [Discuss] Efficient column updates in Iceberg

2026-02-25 Thread Gábor Kaszab
Hey All,

Nice to see the activity on this thread. Thanks to everyone who chimed in!

Micah, I also feel that 1) (full column updates) and 2) (partial but
file-level column updates) could be a good middle ground between perf
improvement and keeping the code complexity low. In fact I had the chance
to experiment in this area and the metadata + API part would be as simple
as in this PoC . Just a side note for 3): from the SQL aspect, I'm a bit
unsure how straightforward it is for users to write predicates that align
with file boundaries, though.
For deciding on partial column updates, we probably can't get away without
doing some measurements of how it compares to existing MoR. I have it on my
roadmap, so I'll share it once I have something.

Wrapping multiple update files into one is an interesting idea. Let's bring
this up on the next sync! Additionally, full column updates could add a
huge overhead on the metadata files being created too (delete everything +
write everything with updates), unless we decide to do some manifest
rewrites/optimizations under the hood during the commit.

Peter, column families as schema-like information at the table metadata level
would definitely be useful. It seems like a natural follow-up to the column
update work, but we have to keep in mind to choose a design that won't
prevent us from implementing a more general column families concept
(probably for inserts too).

Best Regards,
Gabor

On Sat, Feb 21, 2026 at 17:53 Micah Kornfield wrote:

> 1) and 3) are what I was thinking of as use-cases.  I agree unless there
> is a strong motivating use-case for MoR style column updates we should try
> to avoid this complexity and use the existing row based MoR.
>
> One other idea I was trying to think through is the "small file problem"
> we would likely encounter for single column additions/updates for fixed
> width data.  Would it make sense to add a record-range into the metadata
> for column families, so that we can pack column updates across files into
> reasonably sized files (similar to what we do for DVs today in puffin
> files)?
>
> Thanks,
> Micah
>
> On Mon, Feb 16, 2026 at 7:23 AM Gábor Kaszab 
> wrote:
>
>> Hey All,
>>
>> Thanks Anurag for the summary!
>>
>> I regret we don't have a recording for the sync, but I had the impression
>> that, even though there was a lengthy discussion about the implementation
>> requirements for partial updates, there wasn't a strong consensus around
>> the need and there were no strong use cases to justify partial updates
>> either. Let me sum up where I see we are at now:
>>
>> *Scope of the updates*
>>
>> *1) Full column updates*
>> There is a consensus and common understanding that this use case makes
>> sense. If this was the only supported use-case, the implementation would be
>> relatively simple. We could guarantee there is no overlap in column updates
>> by deduplicating the field IDs in the column update metadata. E.g. Let's
>> say we have a column update on columns {1,2} and we write another column
>> update for {2,3}: we can change the metadata for the first one to only
>> cover {1} and not {1,2}. With this the write and the read/stitching process
>> is also straightforward (if we decide not to support equality deletes
>> together with column updates).
>>
>> Both row matching approaches could work here:
>> - row number matching update files, where we fill the deleted rows
>> with an arbitrary value (preferably null)
>> - sparse update files with some auxiliary column written into the
>> column update file, like row position in base file
>>
>> *2) Partial column updates (row-level)*
>> I see 2 use cases mentioned for this: bug-fixing a subset of rows,
>> updating features for active users
>> My initial impression here is that whether to use column updates or not
>> heavily depends on the selectivity of the partial update queries. I'm sure
>> there is a percentage of the affected rows where if we go below it's simply
>> better to use the traditional row level updates (cow/mor). I'm not entirely
>> convinced that covering these scenarios is worth the extra complexity here:
>> - We can't deduplicate the column updates by field IDs on the
>> metadata-side
>> - We have two options for writers:
>>  - Merge the existing column update files themselves when writing
>> a new one with an overlap of field Ids. No need to sort out the different
>> column updates files and merge them on the read side, but there is overhead
>> on write side
>> - Don't bother merging existing column updates when writing a new
>> one. This makes overhead on the read side.
>>
>> Handling of sparse update files is a must here, with the chance for
>> optimisation if all the rows are covered with the update file, as Micah
>> suggested.
>>
>> To sum up, I think to justify this approach we need to have strong
>> use-cases and measurements to verify that the extra complexity results in
>> convincingly better results compared to existing CoW/MoR approaches.

Re: [Discuss] Efficient column updates in Iceberg

2026-02-21 Thread Micah Kornfield
1) and 3) are what I was thinking of as use-cases. I agree that unless there
is a strong motivating use-case for MoR-style column updates, we should try
to avoid this complexity and use the existing row-based MoR.

One other idea I was trying to think through is the "small file problem" we
would likely encounter for single column additions/updates for fixed width
data.  Would it make sense to add a record-range into the metadata for
column families, so that we can pack column updates across files into
reasonably sized files (similar to what we do for DVs today in puffin
files)?

Thanks,
Micah

On Mon, Feb 16, 2026 at 7:23 AM Gábor Kaszab  wrote:

> Hey All,
>
> Thanks Anurag for the summary!
>
> I regret we don't have a recording for the sync, but I had the impression
> that, even though there was a lengthy discussion about the implementation
> requirements for partial updates, there wasn't a strong consensus around
> the need and there were no strong use cases to justify partial updates
> either. Let me sum up where I see we are at now:
>
> *Scope of the updates*
>
> *1) Full column updates*
> There is a consensus and common understanding that this use case makes
> sense. If this was the only supported use-case, the implementation would be
> relatively simple. We could guarantee there is no overlap in column updates
> by deduplicating the field IDs in the column update metadata. E.g. Let's
> say we have a column update on columns {1,2} and we write another column
> update for {2,3}: we can change the metadata for the first one to only
> cover {1} and not {1,2}. With this the write and the read/stitching process
> is also straightforward (if we decide not to support equality deletes
> together with column updates).
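>
> A minimal sketch of that deduplication (plain Python, illustrative names
> only): every field ID ends up mapped to exactly one update file, with the
> newest write winning:
>
>     def apply_column_update(existing, new_fields, new_file):
>         """existing: list of (field_id_set, file_path); newest wins."""
>         trimmed = [(fields - new_fields, path) for fields, path in existing]
>         trimmed = [(fields, path) for fields, path in trimmed if fields]
>         trimmed.append((set(new_fields), new_file))
>         return trimmed
>
>     # An update covering {1, 2} followed by one covering {2, 3}:
>     state = apply_column_update([], {1, 2}, "update_a.parquet")
>     state = apply_column_update(state, {2, 3}, "update_b.parquet")
>     # state == [({1}, "update_a.parquet"), ({2, 3}, "update_b.parquet")]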
>
> Both row matching approaches could work here:
> - row number matching update files, where we fill the deleted rows
> with an arbitrary value (preferably null)
> - sparse update files with some auxiliary column written into the
> column update file, like row position in base file
>
> *2) Partial column updates (row-level)*
> I see 2 use cases mentioned for this: bug-fixing a subset of rows,
> updating features for active users
> My initial impression here is that whether to use column updates or not
> heavily depends on the selectivity of the partial update queries. I'm sure
> there is a percentage of the affected rows where if we go below it's simply
> better to use the traditional row level updates (cow/mor). I'm not entirely
> convinced that covering these scenarios is worth the extra complexity here:
> - We can't deduplicate the column updates by field IDs on the
> metadata-side
> - We have two options for writers:
>  - Merge the existing column update files themselves when writing
> a new one with an overlap of field IDs. No need to sort out the different
> column update files and merge them on the read side, but there is overhead
> on the write side.
> - Don't bother merging existing column updates when writing a new
> one. This adds overhead on the read side.
>
> Handling of sparse update files is a must here, with the chance for
> optimisation if all the rows are covered with the update file, as Micah
> suggested.
>
> To sum up, I think to justify this approach we need to have strong
> use-cases and measurements to verify that the extra complexity results in
> convincingly better outcomes compared to existing CoW/MoR approaches.
>
> *3) Partial column updates (file-level)*
> This option wasn't brought up during our conversation but might be worth
> considering. This is basically a middle ground between the above two
> approaches. Partial updates are allowed as long as they affect entire data
> files, and it's allowed to only cover a subset of the files. One use-case
> would be to do column updates per partition for instance.
>
> With this approach the metadata representation could be as simple as in
> 1), where we can deduplicate the update files by field IDs. Also there is
> no write and read overhead on top of 1) apart from the verification step to
> ensure that the WHERE filter on the update is doing the split on file
> boundaries.
> Also similarly to 1), sparse update files weren't a must here, we could
> consider row-matching update files too.
>
> *Row alignment*
> Sparse update files are required for row-level partial updates, but if we
> decide to go with any of the other options we could also evaluate the "row
> count matching" approach too. Even though it requires filling the missing
> rows with arbitrary values (null seems a good candidate) it would result in
> less write overhead (no need to write row position) and read overhead (no
> need to join rows by row position), which could be worth the inconvenience
> of having 'invalid' but inaccessible values in the files. The num nulls
> stats being off is a good argument against this, but I think we could have
> a way of fixing this too by keeping track of how many rows were deleted
> (and subtracting this value from the num nulls counter returned by the writer).

Re: [Discuss] Efficient column updates in Iceberg

2026-02-20 Thread Frank Bertsch
I'm in favor of the column families. I have dealt with tables of tens of
thousands of columns, and we had a very good understanding of which fields
should have been grouped together. Allowing engineers to choose these
families could be a benefit for these use cases.

For example, sometimes we want to pull many different sources together into
one table. Each source will have their own schema elements, and some shared
elements. Usually users will be querying the shared elements, or a specific
source and the fields from that source. This would be a great use case for
families.

Supporting those use cases and efficient column additions or updates would
be extremely helpful!

On Fri, Feb 20, 2026, 1:25 PM Steven Wu  wrote:

> When and how to compact column files should probably be an implementation
> decision.
>
> Peter's suggestion of column family concept is an interesting idea. We
> haven't had such a concept before. I am wondering if we should omit it from
> the spec and let engines decide how to group columns. If we are talking
> about wide tables with thousands of columns, manually grouping via column
> families seems difficult for users.
>
> On Fri, Feb 20, 2026 at 1:28 AM Péter Váry 
> wrote:
>
>> In some scenarios, keeping files vertically split can be
>> advantageous, especially for tables with many columns that have very
>> different characteristics. For example, a table might contain numerous
>> boolean or int/long feature columns alongside large binary blobs, text
>> fields, or even image data. Storing these groups of columns in separate
>> Parquet files can improve both encoding efficiency and query performance.
>> We could introduce a sort‑order-like mechanism that defines the desired
>> column layout for the table, and let compaction jobs enforce the
>> appropriate column‑family structure when files are compacted.
>>
>> Engines would remain free to merge column files or perform full
>> copy‑on‑write rewrites when wide updates occur. However, I would avoid
>> adding extra complexity by trying to support this directly in the commit or
>> write paths, especially since the value of compaction varies significantly
>> across different datasets and use cases.
>>
>> Shawn Chang  ezt írta (időpont: 2026. febr. 17.,
>> K, 2:44):
>>
>>> Hi all,
>>>
>>> Just got a chance to follow up on the discussion here. Making column
>>> files additive to the existing base files seems reasonable to me, but I
>>> think it also implies that compaction is a must, similar to how we manage
>>> delete files today. An important difference is that updates usually occur
>>> much more frequently than deletes.
>>>
>>> This may be a separate concern, but have we considered whether
>>> compaction should be more closely tied to writes? For example, triggering a
>>> rewrite once we have X number of column files, rather than relying solely
>>> on an independent compaction job. There can be minor compactions to just
>>> collapse one file set (base file + column files) so we don't block writers
>>> too much.
>>>
>>> Best,
>>> Shawn
>>>
>>> On Mon, Feb 16, 2026 at 7:23 AM Gábor Kaszab 
>>> wrote:
>>>
 Hey All,

 Thanks Anurag for the summary!

 I regret we don't have a recording for the sync, but I had the
 impression that, even though there was a lengthy discussion about the
 implementation requirements for partial updates, there wasn't a strong
 consensus around the need and there were no strong use cases to justify
 partial updates either. Let me sum up where I see we are at now:

 *Scope of the updates*

 *1) Full column updates*
 There is a consensus and common understanding that this use case makes
 sense. If this was the only supported use-case, the implementation would be
 relatively simple. We could guarantee there is no overlap in column updates
 by deduplicating the field IDs in the column update metadata. E.g. Let's
 say we have a column update on columns {1,2} and we write another column
 update for {2,3}: we can change the metadata for the first one to only
 cover {1} and not {1,2}. With this the write and the read/stitching process
 is also straightforward (if we decide not to support equality deletes
 together with column updates).

 Both row matching approaches could work here:
 - row number matching update files, where we fill the deleted rows
 with an arbitrary value (preferably null)
 - sparse update files with some auxiliary column written into the
 column update file, like row position in base file

 *2) Partial column updates (row-level)*
 I see 2 use cases mentioned for this: bug-fixing a subset of rows,
 updating features for active users
 My initial impression here is that whether to use column updates or not
 heavily depends on the selectivity of the partial update queries. I'm sure
 there is a percentage of the affected rows where if we go below it's simply
 better to use the traditional row level updates (cow/mor).

Re: [Discuss] Efficient column updates in Iceberg

2026-02-20 Thread Steven Wu
When and how to compact column files should probably be an implementation
decision.

Peter's suggestion of column family concept is an interesting idea. We
haven't had such a concept before. I am wondering if we should omit it from
the spec and let engines decide how to group columns. If we are talking
about wide tables with thousands of columns, manually grouping via column
families seems difficult for users.

On Fri, Feb 20, 2026 at 1:28 AM Péter Váry 
wrote:

> In some scenarios, keeping files vertically split can be
> advantageous, especially for tables with many columns that have very
> different characteristics. For example, a table might contain numerous
> boolean or int/long feature columns alongside large binary blobs, text
> fields, or even image data. Storing these groups of columns in separate
> Parquet files can improve both encoding efficiency and query performance.
> We could introduce a sort‑order-like mechanism that defines the desired
> column layout for the table, and let compaction jobs enforce the
> appropriate column‑family structure when files are compacted.
>
> Engines would remain free to merge column files or perform full
> copy‑on‑write rewrites when wide updates occur. However, I would avoid
> adding extra complexity by trying to support this directly in the commit or
> write paths, especially since the value of compaction varies significantly
> across different datasets and use cases.
>
> Shawn Chang  ezt írta (időpont: 2026. febr. 17.,
> K, 2:44):
>
>> Hi all,
>>
>> Just got a chance to follow up on the discussion here. Making column
>> files additive to the existing base files seems reasonable to me, but I
>> think it also implies that compaction is a must, similar to how we manage
>> delete files today. An important difference is that updates usually occur
>> much more frequently than deletes.
>>
>> This may be a separate concern, but have we considered whether compaction
>> should be more closely tied to writes? For example, triggering a rewrite
>> once we have X number of column files, rather than relying solely on an
>> independent compaction job. There can be minor compactions to just collapse
>> one file set (base file + column files) so we don't block writers too much.
>>
>> Best,
>> Shawn
>>
>> On Mon, Feb 16, 2026 at 7:23 AM Gábor Kaszab 
>> wrote:
>>
>>> Hey All,
>>>
>>> Thanks Anurag for the summary!
>>>
>>> I regret we don't have a recording for the sync, but I had the
>>> impression that, even though there was a lengthy discussion about the
>>> implementation requirements for partial updates, there wasn't a strong
>>> consensus around the need and there were no strong use cases to justify
>>> partial updates either. Let me sum up where I see we are at now:
>>>
>>> *Scope of the updates*
>>>
>>> *1) Full column updates*
>>> There is a consensus and common understanding that this use case makes
>>> sense. If this was the only supported use-case, the implementation would be
>>> relatively simple. We could guarantee there is no overlap in column updates
>>> by deduplicating the field IDs in the column update metadata. E.g. Let's
>>> say we have a column update on columns {1,2} and we write another column
>>> update for {2,3}: we can change the metadata for the first one to only
>>> cover {1} and not {1,2}. With this the write and the read/stitching process
>>> is also straightforward (if we decide not to support equality deletes
>>> together with column updates).
>>>
>>> Both row matching approaches could work here:
>>> - row number matching update files, where we fill the deleted rows
>>> with an arbitrary value (preferably null)
>>> - sparse update files with some auxiliary column written into the
>>> column update file, like row position in base file
>>>
>>> *2) Partial column updates (row-level)*
>>> I see 2 use cases mentioned for this: bug-fixing a subset of rows,
>>> updating features for active users
>>> My initial impression here is that whether to use column updates or not
>>> heavily depends on the selectivity of the partial update queries. I'm sure
>>> there is a percentage of the affected rows where if we go below it's simply
>>> better to use the traditional row level updates (cow/mor). I'm not entirely
>>> convinced that covering these scenarios is worth the extra complexity here:
>>> - We can't deduplicate the column updates by field IDs on the
>>> metadata-side
>>> - We have two options for writers:
>>>  - Merge the existing column update files themselves when
>>> writing a new one with an overlap of field IDs. No need to sort out the
>>> different column update files and merge them on the read side, but there
>>> is overhead on the write side.
>>> - Don't bother merging existing column updates when writing a
>>> new one. This adds overhead on the read side.
>>>
>>> Handling of sparse update files is a must here, with the chance for
>>> optimisation if all the rows are covered with the update file, as Micah
>>> suggested.

Re: [Discuss] Efficient column updates in Iceberg

2026-02-20 Thread Péter Váry
In some scenarios, keeping files vertically split can be
advantageous, especially for tables with many columns that have very
different characteristics. For example, a table might contain numerous
boolean or int/long feature columns alongside large binary blobs, text
fields, or even image data. Storing these groups of columns in separate
Parquet files can improve both encoding efficiency and query performance.
We could introduce a sort‑order-like mechanism that defines the desired
column layout for the table, and let compaction jobs enforce the
appropriate column‑family structure when files are compacted.
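
A rough sketch of what such a layout declaration could look like (all names
below are hypothetical; nothing like this exists in the spec today):

  // Hypothetical sketch of a sort-order-like "column layout" that compaction
  // jobs could enforce: each family lists the field IDs stored together.
  record ColumnFamily(String name, java.util.List<Integer> fieldIds) {}

  record ColumnLayout(int layoutId, java.util.List<ColumnFamily> families) {}

  class LayoutExample {
    // Example layout for a feature table mixing small and large columns.
    static ColumnLayout featureTableLayout() {
      return new ColumnLayout(1, java.util.List.of(
          new ColumnFamily("ids_and_flags", java.util.List.of(1, 2, 3)),
          new ColumnFamily("text_features", java.util.List.of(4, 5)),
          new ColumnFamily("blobs", java.util.List.of(6))));
    }
  }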

Engines would remain free to merge column files or perform full
copy‑on‑write rewrites when wide updates occur. However, I would avoid
adding extra complexity by trying to support this directly in the commit or
write paths, especially since the value of compaction varies significantly
across different datasets and use cases.

Shawn Chang  ezt írta (időpont: 2026. febr. 17., K,
2:44):

> Hi all,
>
> Just got a chance to follow up on the discussion here. Making column files
> additive to the existing base files seems reasonable to me, but I think it
> also implies that compaction is a must, similar to how we manage delete
> files today. An important difference is that updates usually occur much
> more frequently than deletes.
>
> This may be a separate concern, but have we considered whether compaction
> should be more closely tied to writes? For example, triggering a rewrite
> once we have X number of column files, rather than relying solely on an
> independent compaction job. There can be minor compactions to just collapse
> one file set (base file + column files) so we don't block writers too much.
>
> Best,
> Shawn
>
> On Mon, Feb 16, 2026 at 7:23 AM Gábor Kaszab 
> wrote:
>
>> Hey All,
>>
>> Thanks Anurag for the summary!
>>
>> I regret we don't have a recording for the sync, but I had the impression
>> that, even though there was a lengthy discussion about the implementation
>> requirements for partial updates, there wasn't a strong consensus around
>> the need and there were no strong use cases to justify partial updates
>> either. Let me sum up where I see we are at now:
>>
>> *Scope of the updates*
>>
>> *1) Full column updates*
>> There is a consensus and common understanding that this use case makes
>> sense. If this was the only supported use-case, the implementation would be
>> relatively simple. We could guarantee there is no overlap in column updates
>> by deduplicating the field IDs in the column update metadata. E.g. Let's
>> say we have a column update on columns {1,2} and we write another column
>> update for {2,3}: we can change the metadata for the first one to only
>> cover {1} and not {1,2}. With this the write and the read/stitching process
>> is also straightforward (if we decide not to support equality deletes
>> together with column updates).
>>
>> Both row matching approaches could work here:
>> - row number matching update files, where we fill the deleted rows
>> with an arbitrary value (preferably null)
>> - sparse update files with some auxiliary column written into the
>> column update file, like row position in base file
>>
>> *2) Partial column updates (row-level)*
>> I see 2 use cases mentioned for this: bug-fixing a subset of rows,
>> updating features for active users
>> My initial impression here is that whether to use column updates or not
>> heavily depends on the selectivity of the partial update queries. I'm sure
>> there is a percentage of the affected rows where if we go below it's simply
>> better to use the traditional row level updates (cow/mor). I'm not entirely
>> convinced that covering these scenarios is worth the extra complexity here:
>> - We can't deduplicate the column updates by field IDs on the
>> metadata-side
>> - We have two options for writers:
>>  - Merge the existing column update files themselves when writing
>> a new one with an overlap of field IDs. No need to sort out the different
>> column update files and merge them on the read side, but there is overhead
>> on the write side.
>> - Don't bother merging existing column updates when writing a new
>> one. This adds overhead on the read side.
>>
>> Handling of sparse update files is a must here, with the chance for
>> optimisation if all the rows are covered with the update file, as Micah
>> suggested.
>>
>> To sum up, I think to justify this approach we need to have strong
>> use-cases and measurements to verify that the extra complexity results in
>> convincingly better outcomes compared to existing CoW/MoR approaches.
>>
>> *3) Partial column updates (file-level)*
>> This option wasn't brought up during our conversation but might be worth
>> considering. This is basically a middle ground between the above two
>> approaches. Partial updates are allowed as long as they affect entire data
>> files, and it's allowed to only cover a subset of the files. One use-case
>> would be to do column updates per partition for instance.

Re: [Discuss] Efficient column updates in Iceberg

2026-02-16 Thread Shawn Chang
Hi all,

Just got a chance to follow up on the discussion here. Making column files
additive to the existing base files seems reasonable to me, but I think it
also implies that compaction is a must, similar to how we manage delete
files today. An important difference is that updates usually occur much
more frequently than deletes.

This may be a separate concern, but have we considered whether compaction
should be more closely tied to writes? For example, triggering a rewrite
once we have X number of column files, rather than relying solely on an
independent compaction job. There can be minor compactions to just collapse
one file set (base file + column files) so we don't block writers too much.
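
As a rough illustration (the property name below is invented, not an existing
Iceberg property), such a write-side trigger could be as simple as:

  // Hypothetical check: schedule a minor compaction of one file set
  // (base file + stacked column files) once a threshold is crossed.
  class ColumnFileCompactionTrigger {
    static final String MAX_STACKED = "write.column-update.max-stacked-files"; // made up

    static boolean shouldCompact(int stackedColumnFiles,
                                 java.util.Map<String, String> tableProps) {
      int max = Integer.parseInt(tableProps.getOrDefault(MAX_STACKED, "5"));
      return stackedColumnFiles >= max;
    }
  }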

Best,
Shawn

On Mon, Feb 16, 2026 at 7:23 AM Gábor Kaszab  wrote:

> Hey All,
>
> Thanks Anurag for the summary!
>
> I regret we don't have a recording for the sync, but I had the impression
> that, even though there was a lengthy discussion about the implementation
> requirements for partial updates, there wasn't a strong consensus around
> the need and there were no strong use cases to justify partial updates
> either. Let me sum up where I see we are at now:
>
> *Scope of the updates*
>
> *1) Full column updates*
> There is a consensus and common understanding that this use case makes
> sense. If this was the only supported use-case, the implementation would be
> relatively simple. We could guarantee there is no overlap in column updates
> by deduplicating the field IDs in the column update metadata. E.g. Let's
> say we have a column update on columns {1,2} and we write another column
> update for {2,3}: we can change the metadata for the first one to only
> cover {1} and not {1,2}. With this the write and the read/stitching process
> is also straightforward (if we decide not to support equality deletes
> together with column updates).
>
> Both row matching approaches could work here:
> - row number matching update files, where we fill the deleted rows
> with an arbitrary value (preferably null)
> - sparse update files with some auxiliary column written into the
> column update file, like row position in base file
>
> *2) Partial column updates (row-level)*
> I see 2 use cases mentioned for this: bug-fixing a subset of rows,
> updating features for active users
> My initial impression here is that whether to use column updates or not
> heavily depends on the selectivity of the partial update queries. I'm sure
> there is a percentage of the affected rows where if we go below it's simply
> better to use the traditional row level updates (cow/mor). I'm not entirely
> convinced that covering these scenarios is worth the extra complexity here:
> - We can't deduplicate the column updates by field IDs on the
> metadata-side
> - We have two options for writers:
>  - Merge the existing column update files themselves when writing
> a new one with an overlap of field IDs. No need to sort out the different
> column update files and merge them on the read side, but there is overhead
> on the write side.
> - Don't bother merging existing column updates when writing a new
> one. This adds overhead on the read side.
>
> Handling of sparse update files is a must here, with the chance for
> optimisation if all the rows are covered with the update file, as Micah
> suggested.
>
> To sum up, I think to justify this approach we need to have strong
> use-cases and measurements to verify that the extra complexity results in
> convincingly better outcomes compared to existing CoW/MoR approaches.
>
> *3) Partial column updates (file-level)*
> This option wasn't brought up during our conversation but might be worth
> considering. This is basically a middle ground between the above two
> approaches. Partial updates are allowed as long as they affect entire data
> files, and it's allowed to only cover a subset of the files. One use-case
> would be to do column updates per partition for instance.
>
> With this approach the metadata representation could be as simple as in
> 1), where we can deduplicate the update files by field IDs. Also there is
> no write and read overhead on top of 1) apart from the verification step to
> ensure that the WHERE filter on the update is doing the split on file
> boundaries.
> Also similarly to 1), sparse update files weren't a must here, we could
> consider row-matching update files too.
>
> *Row alignment*
> Sparse update files are required for row-level partial updates, but if we
> decide to go with any of the other options we could also evaluate the "row
> count matching" approach too. Even though it requires filling the missing
> rows with arbitrary values (null seems a good candidate) it would result in
> less write overhead (no need to write row position) and read overhead (no
> need to join rows by row position), which could be worth the inconvenience
> of having 'invalid' but inaccessible values in the files. The num nulls
> stats being off is a good argument against this, but

Re: [Discuss] Efficient column updates in Iceberg

2026-02-16 Thread Gábor Kaszab
Hey All,

Thanks Anurag for the summary!

I regret we don't have a recording for the sync, but I had the impression
that, even though there was a lengthy discussion about the implementation
requirements for partial updates, there wasn't a strong consensus around
the need and there were no strong use cases to justify partial updates
either. Let me sum up where I see we are at now:

*Scope of the updates*

*1) Full column updates*
There is a consensus and common understanding that this use case makes
sense. If this was the only supported use-case, the implementation would be
relatively simple. We could guarantee there is no overlap in column updates
by deduplicating the field IDs in the column update metadata. E.g. Let's
say we have a column update on columns {1,2} and we write another column
update for {2,3}: we can change the metadata for the first one to only
cover {1} and not {1,2}. With this the write and the read/stitching process
is also straightforward (if we decide not to support equality deletes
together with column updates).
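
As a sketch of that dedup step (the types below are made up, just to
illustrate the bookkeeping), the metadata could keep a single owning column
update file per field ID:

  class ColumnUpdateMetadata {
    // fieldId -> path of the column update file that currently owns it.
    // After committing an update file for {2,3}, field ID 2 is re-owned by
    // the new file, so the older {1,2} entry effectively only covers {1}.
    static java.util.Map<Integer, String> dedupe(
        java.util.Map<Integer, String> ownerByFieldId,
        java.util.Set<Integer> newFieldIds,
        String newFilePath) {
      var result = new java.util.LinkedHashMap<>(ownerByFieldId);
      for (int fieldId : newFieldIds) {
        result.put(fieldId, newFilePath); // the newer file takes over this field ID
      }
      return result;
    }
  }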

Both row matching approaches could work here:
- row number matching update files, where we fill the deleted rows with
an arbitrary value (preferably null)
- sparse update files with some auxiliary column written into the
column update file, like row position in base file

*2) Partial column updates (row-level)*
I see 2 use cases mentioned for this: bug-fixing a subset of rows, updating
features for active users
My initial impression here is that whether to use column updates or not
heavily depends on the selectivity of the partial update queries. I'm sure
there is a percentage of the affected rows where if we go below it's simply
better to use the traditional row level updates (cow/mor). I'm not entirely
convinced that covering these scenarios is worth the extra complexity here:
- We can't deduplicate the column updates by field IDs on the
metadata-side
- We have two options for writers:
 - Merge the existing column update files themselves when writing a
new one with an overlap of field IDs. No need to sort out the different
column update files and merge them on the read side, but there is overhead
on the write side.
- Don't bother merging existing column updates when writing a new
one. This adds overhead on the read side (see the sketch below).
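
A rough sketch of that read-side resolution (made-up types, only to show the
shape of the merge): for each row position and field ID, take the value from
the covering update file with the highest data sequence number, otherwise fall
back to the base file.

  interface ColumnUpdateFile {
    long sequenceNumber();
    // empty if this file does not cover the given row for the given field
    java.util.Optional<Object> valueAt(int fieldId, long rowPos);
  }

  class ReadSideResolver {
    static Object resolve(int fieldId, long rowPos, Object baseValue,
                          java.util.List<ColumnUpdateFile> updates) {
      Object result = baseValue;
      long bestSeq = Long.MIN_VALUE;
      for (ColumnUpdateFile f : updates) {
        java.util.Optional<Object> v = f.valueAt(fieldId, rowPos);
        if (v.isPresent() && f.sequenceNumber() > bestSeq) {
          bestSeq = f.sequenceNumber(); // newest covering update wins
          result = v.get();
        }
      }
      return result;
    }
  }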

Handling of sparse update files is a must here, with the chance for
optimisation if all the rows are covered with the update file, as Micah
suggested.

To sum up, I think to justify this approach we need to have strong
use-cases and measurements to verify that the extra complexity results in
convincingly better outcomes compared to existing CoW/MoR approaches.

*3) Partial column updates (file-level)*
This option wasn't brought up during our conversation but might be worth
considering. This is basically a middle ground between the above two
approaches. Partial updates are allowed as long as they affect entire data
files, and it's allowed to only cover a subset of the files. One use-case
would be to do column updates per partition for instance.

With this approach the metadata representation could be as simple as in 1),
where we can deduplicate the update files by field IDs. Also there is no
write and read overhead on top of 1) apart from the verification step to
ensure that the WHERE filter on the update is doing the split on file
boundaries.
Also similarly to 1), sparse update files weren't a must here, we could
consider row-matching update files too.

*Row alignment*
Sparse update files are required for row-level partial updates, but if we
decide to go with any of the other options we could also evaluate the "row
count matching" approach too. Even though it requires filling the missing
rows with arbitrary values (null seems a good candidate) it would result in
less write overhead (no need to write row position) and read overhead (no
need to join rows by row position), which could be worth the inconvenience
of having 'invalid' but inaccessible values in the files. The num nulls
stats being off is a good argument against this, but I think we could have
a way of fixing this too by keeping track of how many rows were deleted
(and subtracting this value from the num nulls counter returned by the writer).
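
As a tiny sketch of that correction (made-up names, assuming the writer can
count how many padded rows it emitted):

  // If deleted/missing rows are padded with null in a row-count-matching
  // update file, subtract the padding from the writer's reported null count
  // so null_value_count only reflects real nulls.
  class NullCountCorrection {
    static long correctedNullCount(long writerNullCount, long paddedNullRows) {
      // e.g. the writer reports 15 nulls, 10 of which are padding for deleted
      // rows -> the column logically has 5 real nulls
      return writerNullCount - paddedNullRows;
    }
  }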


*Next steps*
I'm actively working on a very basic PoC implementation where we would be
able to test the different approaches comparing pros and cons so that we
can make a decision on the above questions. I'll sync with Anurag on this
and will let you know once we have something.

Best Regards,
Gabor


Micah Kornfield  ezt írta (időpont: 2026. febr. 14.,
Szo, 2:20):

> Given that, the sparse representation with alignment at read time (using
>> dummy/null values) seems to provide the benefits of both efficient
>> vectorized reads and stitching as well as support for partial column
>> updates. Would you agree?
>
>
> Thinking more about it, I think the sparse approach is actually a superset
> of the full approach, so it is not a concern.  If writer

Re: [Discuss] Efficient column updates in Iceberg

2026-02-13 Thread Micah Kornfield
>
> Given that, the sparse representation with alignment at read time (using
> dummy/null values) seems to provide the benefits of both efficient
> vectorized reads and stitching as well as support for partial column
> updates. Would you agree?


Thinking more about it, I think the sparse approach is actually a superset
of the full approach, so it is not a concern.  If writers want they can write out
the fully populated columns with position indexes from 1 to N, and readers
can take an optimized path if they detect the number of rows in the update
is equal to the number of base rows.
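
In other words, the reader-side choice could be as simple as the sketch below
(names are made up for illustration):

  // If a sparse update file happens to cover every base row (positions 1..N),
  // it degenerates to a dense column and can be zipped positionally with the
  // base file; otherwise the reader joins on the stored row positions.
  class UpdateReadPath {
    enum Path { DENSE_ZIP, POSITIONAL_JOIN }

    static Path choose(long baseRowCount, long updateRowCount) {
      return updateRowCount == baseRowCount ? Path.DENSE_ZIP : Path.POSITIONAL_JOIN;
    }
  }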

I still think there is a question on what writers should do (i.e. when do
they decide to duplicate data instead of trying to give sparse updates) but
that is an implementation question and not necessarily something that needs
to block spec work.

Cheers,
Micah

On Fri, Feb 13, 2026 at 11:29 AM Anurag Mantripragada <
[email protected]> wrote:

> Hi Micah,
>
> This seems like a classic MoR vs CoW trade-off.  But it seems like maybe
>> both sparse and full should be available (I understand this adds
>> complexity). For adding a new column or completely updating a new column,
>> the performance would be better to prefill the data
>
>
> Our internal use cases are very similar to what you describe. We primarily
> deal with full column updates. However, the feedback on the proposal from
> the wider community indicated that partial updates (e.g., bug-fixing a
> subset of rows, updating features for active users) are also a very common
> and critical use case.
>
> Is there evidence to say that partial column updates are more common in
>> practice than full rewrites?
>
>
> Personally, I don't have hard data on which use case is more common in the
> wild, only that both appear to be important. I also agree that a good long
> term solution should support both strategies. Given that, the sparse
> representation with alignment at read time (using dummy/null values) seems
> to provide the benefits of both efficient vectorized reads and stitching as
> well as support for partial column updates. Would you agree?
>
> ~ Anurag
>
> On Fri, Feb 13, 2026 at 9:33 AM Micah Kornfield 
> wrote:
>
>> Hi Anurag,
>>
>>> Data Representation: Sparse column files are preferred for compact
>>> representation and are better suited for partial column updates. We can
>>> optimize sparse representation for vectorized reads by filling in null
>>> or default values at read time for missing positions from the base file,
>>> which avoids joins during reads.
>>
>>
>> This seems like a classic MoR vs CoW trade-off.  But it seems like maybe
>> both sparse and full should be available (I understand this adds
>> complexity).  For adding a new column or completely updating a new column,
>> the performance would be better to prefill the data (otherwise one ends up
>> duplicating the work that is already happening under the hood in parquet).
>>
>> Is there evidence to say that partial column updates are more common in
>> practice than full rewrites?
>>
>> Thanks,
>> Micah
>>
>>
>> On Thu, Feb 12, 2026 at 3:32 AM Eduard Tudenhöfner <
>> [email protected]> wrote:
>>
>>> Hey Anurag,
>>>
>>> I wasn't able to make it to the sync but was hoping to watch the
>>> recording afterwards.
>>> I'm curious what the reasons were for discarding the Parquet-native
>>> approach. Could you share a summary from what was discussed in the sync
>>> please on that topic?
>>>
>>> On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
>>> [email protected]> wrote:
>>>
 Hi all,

 Thank you for attending today's sync. Please find the meeting notes
 below. I apologize that we were unable to record the session due to
 attendees not having record access.

 Key updates and discussion points:

 *Decisions:*

- Table Format vs. Parquet: There is a general consensus that
column update support should reside in the table format. Consequently, 
 we
have discarded the Parquet-native approach.
- Metadata Representation: To maintain clean metadata and avoid
complex resolution logic for readers, the goal is to keep only one 
 metadata
file per column. However, achieving this is challenging if we support
partial updates, as multiple column files may exist for the same column
(See open questions).
- Data Representation: Sparse column files are preferred for
compact representation and are better suited for partial column 
 updates. We
can optimize sparse representation for vectorized reads by filling in 
 null
or default values at read time for missing positions from the base file,
which avoids joins during reads.


 *Open Questions: *

- We are still determining what restrictions are necessary when
supporting partial updates. For instance, we need to decide whether to 
 add
a new column and s

Re: [Discuss] Efficient column updates in Iceberg

2026-02-13 Thread Anurag Mantripragada
Hi Micah,

This seems like a classic MoR vs CoW trade-off.  But it seems like maybe
> both sparse and full should be available (I understand this adds
> complexity). For adding a new column or completely updating a new column,
> the performance would be better to prefill the data


Our internal use cases are very similar to what you describe. We primarily
deal with full column updates. However, the feedback on the proposal from
the wider community indicated that partial updates (e.g., bug-fixing a
subset of rows, updating features for active users) are also a very common
and critical use case.

Is there evidence to say that partial column updates are more common in
> practice than full rewrites?


Personally, I don't have hard data on which use case is more common in the
wild, only that both appear to be important. I also agree that a good long
term solution should support both strategies. Given that, the sparse
representation with alignment at read time (using dummy/null values) seems
to provide the benefits of both efficient vectorized reads and stitching as
well as support for partial column updates. Would you agree?

~ Anurag

On Fri, Feb 13, 2026 at 9:33 AM Micah Kornfield 
wrote:

> Hi Anurag,
>
>> Data Representation: Sparse column files are preferred for compact
>> representation and are better suited for partial column updates. We can
>> optimize sparse representation for vectorized reads by filling in null
>> or default values at read time for missing positions from the base file,
>> which avoids joins during reads.
>
>
> This seems like a classic MoR vs CoW trade-off.  But it seems like maybe
> both sparse and full should be available (I understand this adds
> complexity).  For adding a new column or completely updating a new column,
> the performance would be better to prefill the data (otherwise one ends up
> duplicating the work that is already happening under the hood in parquet).
>
> Is there evidence to say that partial column updates are more common in
> practice than full rewrites?
>
> Thanks,
> Micah
>
>
> On Thu, Feb 12, 2026 at 3:32 AM Eduard Tudenhöfner <
> [email protected]> wrote:
>
>> Hey Anurag,
>>
>> I wasn't able to make it to the sync but was hoping to watch the
>> recording afterwards.
>> I'm curious what the reasons were for discarding the Parquet-native
>> approach. Could you share a summary from what was discussed in the sync
>> please on that topic?
>>
>> On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> Thank you for attending today's sync. Please find the meeting notes
>>> below. I apologize that we were unable to record the session due to
>>> attendees not having record access.
>>>
>>> Key updates and discussion points:
>>>
>>> *Decisions:*
>>>
>>>- Table Format vs. Parquet: There is a general consensus that column
>>>update support should reside in the table format. Consequently, we have
>>>discarded the Parquet-native approach.
>>>- Metadata Representation: To maintain clean metadata and avoid
>>>complex resolution logic for readers, the goal is to keep only one 
>>> metadata
>>>file per column. However, achieving this is challenging if we support
>>>partial updates, as multiple column files may exist for the same column
>>>(See open questions).
>>>- Data Representation: Sparse column files are preferred for compact
>>>representation and are better suited for partial column updates. We can
>>>optimize sparse representation for vectorized reads by filling in null or
>>>default values at read time for missing positions from the base file, 
>>> which
>>>avoids joins during reads.
>>>
>>>
>>> *Open Questions: *
>>>
>>>- We are still determining what restrictions are necessary when
>>>supporting partial updates. For instance, we need to decide whether to 
>>> add
>>>a new column and subsequently allow partial updates on it. This would
>>>involve managing both a base column file and subsequent update files.
>>>- We need a better understanding of the use cases for partial
>>>updates.
>>>- We need to further discuss the handling of equality deletes.
>>>
>>> If I missed anything, or if others took notes, please share them here.
>>> Thanks!
>>>
>>> I will go ahead and update the doc with what we have discussed so we can
>>> continue next time from where we left off.
>>>
>>> ~ Anurag
>>>
>>> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
>>> [email protected]> wrote:
>>>
 Hi all,

 This design
 
 will be discussed tomorrow in a dedicated sync.

 Efficient column updates sync
 Tuesday, February 10 · 9:00 – 10:00am
 Time zone: America/Los_Angeles
 Google Meet joining info
 Video call link: https://meet.google.com/xsd-exug-tcd

 ~ Anurag

 On Fri, Feb 6, 2026 a

Re: [Discuss] Efficient column updates in Iceberg

2026-02-13 Thread Anurag Mantripragada
Hi Eduard.

I'm curious what the reasons were for discarding the Parquet-native
> approach. Could you share a summary from what was discussed in the sync
> please on that topic?


My apologies, I should have been more elaborate in the meeting notes. While
we didn't discuss the Parquet approach extensively during the sync, the
consensus to focus on the table format approach formed from feedback on the
doc and agreed on in the sync. The decision to discard the Parquet-native
approach came down to these main arguments:

   - The major advantage of handling column updates in the Iceberg layer is
   that our manifests will always have a complete, self-contained view of all
   files, including the merged column stats from both base and update files.
   This is critical for efficient file pruning. In the Parquet approach,
   Iceberg would only store a reference to the latest logical file, requiring
   a more complex planning phase to discover the full set of stats needed for
   pruning.
   - The Parquet approach makes it difficult to track file lineage. Because
   the relationship between a base file and its update file is hidden inside a
   Parquet footer, it becomes very tricky to determine which physical files
   belong to the table. This complicates operations like removing orphan
   files, especially with stacked updates to the same column, and would likely
   require complex naming conventions to manage.
   - The Parquet-native approach would require changes to the Parquet
   format and its readers. This would involve collaborating with the Parquet
   community to align on a common goal and tying this feature to their release
   cycle.
   - It would be unfortunate to have such a useful feature tied only to
   Parquet. By building the column update logic into the table format itself,
   we create a design that can extend to other formats like ORC in the future.

Hope this helps.

~ Anurag

On Thu, Feb 12, 2026 at 3:32 AM Eduard Tudenhöfner 
wrote:

> Hey Anurag,
>
> I wasn't able to make it to the sync but was hoping to watch the recording
> afterwards.
> I'm curious what the reasons were for discarding the Parquet-native
> approach. Could you share a summary from what was discussed in the sync
> please on that topic?
>
> On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
> [email protected]> wrote:
>
>> Hi all,
>>
>> Thank you for attending today's sync. Please find the meeting notes
>> below. I apologize that we were unable to record the session due to
>> attendees not having record access.
>>
>> Key updates and discussion points:
>>
>> *Decisions:*
>>
>>- Table Format vs. Parquet: There is a general consensus that column
>>update support should reside in the table format. Consequently, we have
>>discarded the Parquet-native approach.
>>- Metadata Representation: To maintain clean metadata and avoid
>>complex resolution logic for readers, the goal is to keep only one 
>> metadata
>>file per column. However, achieving this is challenging if we support
>>partial updates, as multiple column files may exist for the same column
>>(See open questions).
>>- Data Representation: Sparse column files are preferred for compact
>>representation and are better suited for partial column updates. We can
>>optimize sparse representation for vectorized reads by filling in null or
>>default values at read time for missing positions from the base file, 
>> which
>>avoids joins during reads.
>>
>>
>> *Open Questions: *
>>
>>- We are still determining what restrictions are necessary when
>>supporting partial updates. For instance, we need to decide whether to add
>>a new column and subsequently allow partial updates on it. This would
>>involve managing both a base column file and subsequent update files.
>>- We need a better understanding of the use cases for partial updates.
>>- We need to further discuss the handling of equality deletes.
>>
>> If I missed anything, or if others took notes, please share them here.
>> Thanks!
>>
>> I will go ahead and update the doc with what we have discussed so we can
>> continue next time from where we left off.
>>
>> ~ Anurag
>>
>> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> This design
>>> 
>>> will be discussed tomorrow in a dedicated sync.
>>>
>>> Efficient column updates sync
>>> Tuesday, February 10 · 9:00 – 10:00am
>>> Time zone: America/Los_Angeles
>>> Google Meet joining info
>>> Video call link: https://meet.google.com/xsd-exug-tcd
>>>
>>> ~ Anurag
>>>
>>> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
>>> [email protected]> wrote:
>>>
 Hi Gabor,

 Thanks for the detailed example.

 I agree with Steven that Option 2 seems reasonable. I will add a
 section to the design doc regarding equality delete handling, and we can
 discuss this further during our meeting on Tuesday.

Re: [Discuss] Efficient column updates in Iceberg

2026-02-13 Thread Micah Kornfield
Hi Anurag,

> Data Representation: Sparse column files are preferred for compact
> representation and are better suited for partial column updates. We can
> optimize sparse representation for vectorized reads by filling in null or
> default values at read time for missing positions from the base file, which
> avoids joins during reads.


This seems like a classic MoR vs CoW trade-off.  But it seems like maybe
both sparse and full should be available (I understand this adds
complexity).  For adding a new column or completely updating a new column,
the performance would be better to prefill the data (otherwise one ends up
duplicating the work that is already happening under the hood in parquet).

Is there evidence to say that partial column updates are more common in
practice than full rewrites?

Thanks,
Micah


On Thu, Feb 12, 2026 at 3:32 AM Eduard Tudenhöfner 
wrote:

> Hey Anurag,
>
> I wasn't able to make it to the sync but was hoping to watch the recording
> afterwards.
> I'm curious what the reasons were for discarding the Parquet-native
> approach. Could you share a summary from what was discussed in the sync
> please on that topic?
>
> On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
> [email protected]> wrote:
>
>> Hi all,
>>
>> Thank you for attending today's sync. Please find the meeting notes
>> below. I apologize that we were unable to record the session due to
>> attendees not having record access.
>>
>> Key updates and discussion points:
>>
>> *Decisions:*
>>
>>- Table Format vs. Parquet: There is a general consensus that column
>>update support should reside in the table format. Consequently, we have
>>discarded the Parquet-native approach.
>>- Metadata Representation: To maintain clean metadata and avoid
>>complex resolution logic for readers, the goal is to keep only one 
>> metadata
>>file per column. However, achieving this is challenging if we support
>>partial updates, as multiple column files may exist for the same column
>>(See open questions).
>>- Data Representation: Sparse column files are preferred for compact
>>representation and are better suited for partial column updates. We can
>>optimize sparse representation for vectorized reads by filling in null or
>>default values at read time for missing positions from the base file, 
>> which
>>avoids joins during reads.
>>
>>
>> *Open Questions: *
>>
>>- We are still determining what restrictions are necessary when
>>supporting partial updates. For instance, we need to decide whether to add
>>a new column and subsequently allow partial updates on it. This would
>>involve managing both a base column file and subsequent update files.
>>- We need a better understanding of the use cases for partial updates.
>>- We need to further discuss the handling of equality deletes.
>>
>> If I missed anything, or if others took notes, please share them here.
>> Thanks!
>>
>> I will go ahead and update the doc with what we have discussed so we can
>> continue next time from where we left off.
>>
>> ~ Anurag
>>
>> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> This design
>>> 
>>> will be discussed tomorrow in a dedicated sync.
>>>
>>> Efficient column updates sync
>>> Tuesday, February 10 · 9:00 – 10:00am
>>> Time zone: America/Los_Angeles
>>> Google Meet joining info
>>> Video call link: https://meet.google.com/xsd-exug-tcd
>>>
>>> ~ Anurag
>>>
>>> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
>>> [email protected]> wrote:
>>>
 Hi Gabor,

 Thanks for the detailed example.

 I agree with Steven that Option 2 seems reasonable. I will add a
 section to the design doc regarding equality delete handling, and we can
 discuss this further during our meeting on Tuesday.

 ~Anurag

 On Fri, Feb 6, 2026 at 7:08 AM Steven Wu  wrote:

> > 1) When deleting with eq-deletes: If there is a column update on
> the equality-field ID we use for the delete, reject deletion
> > 2) When adding a column update on a column that is part of the
> equality field IDs in some delete, we reject the column update
>
> Gabor, this is a good scenario. The 2nd option makes sense to me,
> since equality ids are like primary key fields. If we have the 2nd rule
> enforced, the first option is not applicable anymore.
>
> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab 
> wrote:
>
>> Hey,
>>
>> Thank you for the proposal, Anurag! I made a pass recently and I
>> think there is some interference between column updates and equality
>> deletes. Let me describe below:
>>
>> Steps:
>>
>> CREATE TABLE tbl (int a, int b);
>>
>> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file

Re: [Discuss] Efficient column updates in Iceberg

2026-02-12 Thread Eduard Tudenhöfner
Hey Anurag,

I wasn't able to make it to the sync but was hoping to watch the recording
afterwards.
I'm curious what the reasons were for discarding the Parquet-native
approach. Could you share a summary from what was discussed in the sync
please on that topic?

On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <
[email protected]> wrote:

> Hi all,
>
> Thank you for attending today's sync. Please find the meeting notes below.
> I apologize that we were unable to record the session due to attendees not
> having record access.
>
> Key updates and discussion points:
>
> *Decisions:*
>
>- Table Format vs. Parquet: There is a general consensus that column
>update support should reside in the table format. Consequently, we have
>discarded the Parquet-native approach.
>- Metadata Representation: To maintain clean metadata and avoid
>complex resolution logic for readers, the goal is to keep only one metadata
>file per column. However, achieving this is challenging if we support
>partial updates, as multiple column files may exist for the same column
>(See open questions).
>- Data Representation: Sparse column files are preferred for compact
>representation and are better suited for partial column updates. We can
>optimize sparse representation for vectorized reads by filling in null or
>default values at read time for missing positions from the base file, which
>avoids joins during reads.
>
>
> *Open Questions: *
>
>- We are still determining what restrictions are necessary when
>supporting partial updates. For instance, we need to decide whether to add
>a new column and subsequently allow partial updates on it. This would
>involve managing both a base column file and subsequent update files.
>- We need a better understanding of the use cases for partial updates.
>- We need to further discuss the handling of equality deletes.
>
> If I missed anything, or if others took notes, please share them here.
> Thanks!
>
> I will go ahead and update the doc with what we have discussed so we can
> continue next time from where we left off.
>
> ~ Anurag
>
> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
> [email protected]> wrote:
>
>> Hi all,
>>
>> This design
>> 
>> will be discussed tomorrow in a dedicated sync.
>>
>> Efficient column updates sync
>> Tuesday, February 10 · 9:00 – 10:00am
>> Time zone: America/Los_Angeles
>> Google Meet joining info
>> Video call link: https://meet.google.com/xsd-exug-tcd
>>
>> ~ Anurag
>>
>> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi Gabor,
>>>
>>> Thanks for the detailed example.
>>>
>>> I agree with Steven that Option 2 seems reasonable. I will add a section
>>> to the design doc regarding equality delete handling, and we can discuss
>>> this further during our meeting on Tuesday.
>>>
>>> ~Anurag
>>>
>>> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu  wrote:
>>>
 > 1) When deleting with eq-deletes: If there is a column update on the
 equality-field ID we use for the delete, reject deletion
 > 2) When adding a column update on a column that is part of the
 equality field IDs in some delete, we reject the column update

 Gabor, this is a good scenario. The 2nd option makes sense to me, since
 equality ids are like primary key fields. If we have the 2nd rule enforced,
 the first option is not applicable anymore.

 On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab 
 wrote:

> Hey,
>
> Thank you for the proposal, Anurag! I made a pass recently and I think
> there is some interference between column updates and equality deletes. 
> Let
> me describe below:
>
> Steps:
>
> CREATE TABLE tbl (int a, int b);
>
> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file
>
> DELETE FROM tbl WHERE b=11;   -- creates an equality
> delete file
>
> UPDATE tbl SET b=11;   -- writes
> column update
>
>
>
> SELECT * FROM tbl;
>
> Expected result:
>
> (2, 11)
>
>
>
> Data and metadata created after the above steps:
>
> Base file
>
> (1, 11), (2, 22),
>
> seqnum=1
>
> EQ-delete
>
> b=11
>
> seqnum=2
>
> Column update
>
> Field ids: [field_id_for_col_b]
>
> seqnum=3
>
> Data file content: (dummy_value),(11)
>
>
>
> Read steps:
>
>1. Stitch base file with column updates in reader:
>
> Rows: (1,dummy_value), (2,11) (Note, dummy value can be either null,
> or 11, see the proposal for more details)
>
> Seqnum for base file=1
>
> Seqnum for column update=3
>
>2. Apply eq-delete b=11, seqnum=3 on the stitch

Re: [Discuss] Efficient column updates in Iceberg

2026-02-10 Thread Anurag Mantripragada
Hi all,

Thank you for attending today's sync. Please find the meeting notes below.
I apologize that we were unable to record the session due to attendees not
having record access.

Key updates and discussion points:

*Decisions:*

   - Table Format vs. Parquet: There is a general consensus that column
   update support should reside in the table format. Consequently, we have
   discarded the Parquet-native approach.
   - Metadata Representation: To maintain clean metadata and avoid complex
   resolution logic for readers, the goal is to keep only one metadata file
   per column. However, achieving this is challenging if we support partial
   updates, as multiple column files may exist for the same column (See open
   questions).
   - Data Representation: Sparse column files are preferred for compact
   representation and are better suited for partial column updates. We can
   optimize sparse representation for vectorized reads by filling in null or
   default values at read time for missing positions from the base file, which
   avoids joins during reads.


*Open Questions: *

   - We are still determining what restrictions are necessary when
   supporting partial updates. For instance, we need to decide whether to add
   a new column and subsequently allow partial updates on it. This would
   involve managing both a base column file and subsequent update files.
   - We need a better understanding of the use cases for partial updates.
   - We need to further discuss the handling of equality deletes.

If I missed anything, or if others took notes, please share them here.
Thanks!

I will go ahead and update the doc with what we have discussed so we can
continue next time from where we left off.

~ Anurag

On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <
[email protected]> wrote:

> Hi all,
>
> This design
> 
> will be discussed tomorrow in a dedicated sync.
>
> Efficient column updates sync
> Tuesday, February 10 · 9:00 – 10:00am
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/xsd-exug-tcd
>
> ~ Anurag
>
> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
> [email protected]> wrote:
>
>> Hi Gabor,
>>
>> Thanks for the detailed example.
>>
>> I agree with Steven that Option 2 seems reasonable. I will add a section
>> to the design doc regarding equality delete handling, and we can discuss
>> this further during our meeting on Tuesday.
>>
>> ~Anurag
>>
>> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu  wrote:
>>
>>> > 1) When deleting with eq-deletes: If there is a column update on the
>>> equality-field ID we use for the delete, reject deletion
>>> > 2) When adding a column update on a column that is part of the
>>> equality field IDs in some delete, we reject the column update
>>>
>>> Gabor, this is a good scenario. The 2nd option makes sense to me, since
>>> equality ids are like primary key fields. If we have the 2nd rule enforced,
>>> the first option is not applicable anymore.
>>>
>>> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab 
>>> wrote:
>>>
 Hey,

 Thank you for the proposal, Anurag! I made a pass recently and I think
 there is some interference between column updates and equality deletes. Let
 me describe below:

 Steps:

 CREATE TABLE tbl (int a, int b);

 INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file

 DELETE FROM tbl WHERE b=11;   -- creates an equality delete
 file

 UPDATE tbl SET b=11;   -- writes
 column update



 SELECT * FROM tbl;

 Expected result:

 (2, 11)



 Data and metadata created after the above steps:

 Base file

 (1, 11), (2, 22),

 seqnum=1

 EQ-delete

 b=11

 seqnum=2

 Column update

 Field ids: [field_id_for_col_b]

 seqnum=3

 Data file content: (dummy_value),(11)



 Read steps:

1. Stitch base file with column updates in reader:

 Rows: (1,dummy_value), (2,11) (Note, dummy value can be either null,
 or 11, see the proposal for more details)

 Seqnum for base file=1

 Seqnum for column update=3

2. Apply eq-delete b=11, seqnum=3 on the stitched result
3. Query result depends on which seqnum we carry forward to compare
with the eq-delete's seqnum, but it's not correct in any of the cases
   1. Use seqnum from base file: we get either an empty result if
   'dummy_value' is 11 or we get (1, null) otherwise
   2. Use seqnum from last update file: don't delete any rows,
   result set is (1, dummy_value),(2,11)



 Problem:

 EQ-delete should be applied midway through applying the column updates to the
 base file based on sequence number, during the stitching process. If I'm not
 mistaken, this is not feasible with the way readers work.

Re: [Discuss] Efficient column updates in Iceberg

2026-02-09 Thread Anurag Mantripragada
Hi all,

This design

will be discussed tomorrow in a dedicated sync.

Efficient column updates sync
Tuesday, February 10 · 9:00 – 10:00am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/xsd-exug-tcd

~ Anurag

On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <
[email protected]> wrote:

> Hi Gabor,
>
> Thanks for the detailed example.
>
> I agree with Steven that Option 2 seems reasonable. I will add a section
> to the design doc regarding equality delete handling, and we can discuss
> this further during our meeting on Tuesday.
>
> ~Anurag
>
> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu  wrote:
>
>> > 1) When deleting with eq-deletes: If there is a column update on the
>> equality-field ID we use for the delete, reject deletion
>> > 2) When adding a column update on a column that is part of the
>> equality field IDs in some delete, we reject the column update
>>
>> Gabor, this is a good scenario. The 2nd option makes sense to me, since
>> equality ids are like primary key fields. If we have the 2nd rule enforced,
>> the first option is not applicable anymore.
>>
>> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab 
>> wrote:
>>
>>> Hey,
>>>
>>> Thank you for the proposal, Anurag! I made a pass recently and I think
>>> there is some interference between column updates and equality deletes. Let
>>> me describe below:
>>>
>>> Steps:
>>>
>>> CREATE TABLE tbl (int a, int b);
>>>
>>> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file
>>>
>>> DELETE FROM tbl WHERE b=11;   -- creates an equality delete
>>> file
>>>
>>> UPDATE tbl SET b=11;   -- writes column
>>> update
>>>
>>>
>>>
>>> SELECT * FROM tbl;
>>>
>>> Expected result:
>>>
>>> (2, 11)
>>>
>>>
>>>
>>> Data and metadata created after the above steps:
>>>
>>> Base file
>>>
>>> (1, 11), (2, 22),
>>>
>>> seqnum=1
>>>
>>> EQ-delete
>>>
>>> b=11
>>>
>>> seqnum=2
>>>
>>> Column update
>>>
>>> Field ids: [field_id_for_col_b]
>>>
>>> seqnum=3
>>>
>>> Data file content: (dummy_value),(11)
>>>
>>>
>>>
>>> Read steps:
>>>
>>>1. Stitch base file with column updates in reader:
>>>
>>> Rows: (1,dummy_value), (2,11) (Note, dummy value can be either null, or
>>> 11, see the proposal for more details)
>>>
>>> Seqnum for base file=1
>>>
>>> Seqnum for column update=3
>>>
>>>    2. Apply eq-delete b=11, seqnum=2 on the stitched result
>>>3. Query result depends on which seqnum we carry forward to compare
>>>with the eq-delete's seqnum, but it's not correct in any of the cases
>>>   1. Use seqnum from base file: we get either an empty result if
>>>   'dummy_value' is 11 or we get (1, null) otherwise
>>>   2. Use seqnum from last update file: don't delete any rows,
>>>   result set is (1, dummy_value),(2,11)
>>>
>>>
>>>
>>> Problem:
>>>
>>> EQ-delete should be applied midway through applying the column updates to the
>>> base file based on sequence number, during the stitching process. If I'm
>>> not mistaken, this is not feasible with the way readers work.
>>>
>>>
>>> Proposal:
>>>
>>> Don't allow equality deletes together with column updates.
>>>
>>>   1) When deleting with eq-deletes: If there is a column update on the
>>> equality-field ID we use for the delete, reject deletion
>>>
>>>   2) When adding a column update on a column that is part of the
>>> equality field IDs in some delete, we reject the column update
>>>
 Alternatively, column updates could be controlled by an (immutable) table
 property, and eq-deletes could be rejected if the property indicates column
 updates are turned on for the table.
>>>
>>>
>>> Let me know what you think!
>>>
>>> Best Regards,
>>>
>>> Gabor
>>>
 Anurag Mantripragada  wrote (on Wed, Jan 28, 2026, 3:31):
>>>
 Thank you everyone for the initial review comments. It is exciting to
 see so much interest in this proposal.

 I am currently reviewing and responding to each comment. The general
 themes of the feedback so far include:
 - Including partial updates (column updates on a subset of rows in a
 table).
 - Adding details on how SQL engines will write the update files.
 - Adding details on split planning and row alignment for update files.

 I will think through these points and update the design accordingly.

 Best
 Anurag

 On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
 [email protected]> wrote:

> Hi Xianjin,
>
> Happy to learn from your experience in supporting backfill use-cases.
> Please feel free to review the proposal and add your comments. I will wait
> for a couple of days more to ensure everyone has a chance to review the
> proposal.
>
> ~ Anurag
>
> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye  wrote:
>
>> Hi Anurag and Peter,
>>
>> It’s g

Re: [Discuss] Efficient column updates in Iceberg

2026-02-06 Thread Anurag Mantripragada
Hi Gabor,

Thanks for the detailed example.

I agree with Steven that Option 2 seems reasonable. I will add a section to
the design doc regarding equality delete handling, and we can discuss this
further during our meeting on Tuesday.

~Anurag

On Fri, Feb 6, 2026 at 7:08 AM Steven Wu  wrote:

> > 1) When deleting with eq-deletes: If there is a column update on the
> equality-field ID we use for the delete, reject deletion
> > 2) When adding a column update on a column that is part of the equality
> field IDs in some delete, we reject the column update
>
> Gabor, this is a good scenario. The 2nd option makes sense to me, since
> equality ids are like primary key fields. If we have the 2nd rule enforced,
> the first option is not applicable anymore.
>
> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab 
> wrote:
>
>> Hey,
>>
>> Thank you for the proposal, Anurag! I made a pass recently and I think
>> there is some interference between column updates and equality deletes. Let
>> me describe below:
>>
>> Steps:
>>
>> CREATE TABLE tbl (int a, int b);
>>
>> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file
>>
>> DELETE FROM tbl WHERE b=11;   -- creates an equality delete
>> file
>>
>> UPDATE tbl SET b=11;   -- writes column
>> update
>>
>>
>>
>> SELECT * FROM tbl;
>>
>> Expected result:
>>
>> (2, 11)
>>
>>
>>
>> Data and metadata created after the above steps:
>>
>> Base file
>>
>> (1, 11), (2, 22),
>>
>> seqnum=1
>>
>> EQ-delete
>>
>> b=11
>>
>> seqnum=2
>>
>> Column update
>>
>> Field ids: [field_id_for_col_b]
>>
>> seqnum=3
>>
>> Data file content: (dummy_value),(11)
>>
>>
>>
>> Read steps:
>>
>>1. Stitch base file with column updates in reader:
>>
>> Rows: (1,dummy_value), (2,11) (Note, dummy value can be either null, or
>> 11, see the proposal for more details)
>>
>> Seqnum for base file=1
>>
>> Seqnum for column update=3
>>
>>    2. Apply eq-delete b=11, seqnum=2 on the stitched result
>>3. Query result depends on which seqnum we carry forward to compare
>>with the eq-delete's seqnum, but it's not correct in any of the cases
>>   1. Use seqnum from base file: we get either an empty result if
>>   'dummy_value' is 11 or we get (1, null) otherwise
>>   2. Use seqnum from last update file: don't delete any rows, result
>>   set is (1, dummy_value),(2,11)
>>
>>
>>
>> Problem:
>>
>> EQ-delete should be applied midway through applying the column updates to the
>> base file based on sequence number, during the stitching process. If I'm
>> not mistaken, this is not feasible with the way readers work.
>>
>>
>> Proposal:
>>
>> Don't allow equality deletes together with column updates.
>>
>>   1) When deleting with eq-deletes: If there is a column update on the
>> equality-field ID we use for the delete, reject deletion
>>
>>   2) When adding a column update on a column that is part of the equality
>> field IDs in some delete, we reject the column update
>>
>> Alternatively, column updates could be controlled by an (immutable) table
>> property, and eq-deletes could be rejected if the property indicates column
>> updates are turned on for the table.
>>
>>
>> Let me know what you think!
>>
>> Best Regards,
>>
>> Gabor
>>
>> Anurag Mantripragada  wrote (on Wed, Jan 28, 2026, 3:31):
>>
>>> Thank you everyone for the initial review comments. It is exciting to
>>> see so much interest in this proposal.
>>>
>>> I am currently reviewing and responding to each comment. The general
>>> themes of the feedback so far include:
>>> - Including partial updates (column updates on a subset of rows in a
>>> table).
>>> - Adding details on how SQL engines will write the update files.
>>> - Adding details on split planning and row alignment for update files.
>>>
>>> I will think through these points and update the design accordingly.
>>>
>>> Best
>>> Anurag
>>>
>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
>>> [email protected]> wrote:
>>>
 Hi Xianjin,

 Happy to learn from your experience in supporting backfill use-cases.
 Please feel free to review the proposal and add your comments. I will wait
 for a couple of days more to ensure everyone has a chance to review the
 proposal.

 ~ Anurag

 On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye  wrote:

> Hi Anurag and Peter,
>
> It’s great to see the partial column update has gained great interest
> in the community. I internally built a BackfillColumns action to
> efficiently backfill columns (by writing the partial columns only and
> copying
> the binary data of other columns into a new DataFile). The speedup could 
> be
> 10x for wide tables but the write amplification is still there. I would be
> happy to collaborate on the work and eliminate the write amplification.
>
> On 2026/01/27 10:12:54 Péter Váry wrote:
> > Hi Anurag,
> >
> > It’s great to see how much interest there is in t

Re: [Discuss] Efficient column updates in Iceberg

2026-02-06 Thread Steven Wu
> 1) When deleting with eq-deletes: If there is a column update on the
equality-field ID we use for the delete, reject deletion
> 2) When adding a column update on a column that is part of the equality
field IDs in some delete, we reject the column update

Gabor, this is a good scenario. The 2nd option makes sense to me, since
equality ids are like primary key fields. If we have the 2nd rule enforced,
the first option is not applicable anymore.

On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab  wrote:

> Hey,
>
> Thank you for the proposal, Anurag! I made a pass recently and I think
> there is some interference between column updates and equality deletes. Let
> me describe below:
>
> Steps:
>
> CREATE TABLE tbl (int a, int b);
>
> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file
>
> DELETE FROM tbl WHERE b=11;   -- creates an equality delete
> file
>
> UPDATE tbl SET b=11;   -- writes column
> update
>
>
>
> SELECT * FROM tbl;
>
> Expected result:
>
> (2, 11)
>
>
>
> Data and metadata created after the above steps:
>
> Base file
>
> (1, 11), (2, 22),
>
> seqnum=1
>
> EQ-delete
>
> b=11
>
> seqnum=2
>
> Column update
>
> Field ids: [field_id_for_col_b]
>
> seqnum=3
>
> Data file content: (dummy_value),(11)
>
>
>
> Read steps:
>
>1. Stitch base file with column updates in reader:
>
> Rows: (1,dummy_value), (2,11) (Note, dummy value can be either null, or
> 11, see the proposal for more details)
>
> Seqnum for base file=1
>
> Seqnum for column update=3
>
>    2. Apply eq-delete b=11, seqnum=2 on the stitched result
>3. Query result depends on which seqnum we carry forward to compare
>with the eq-delete's seqnum, but it's not correct in any of the cases
>   1. Use seqnum from base file: we get either an empty result if
>   'dummy_value' is 11 or we get (1, null) otherwise
>   2. Use seqnum from last update file: don't delete any rows, result
>   set is (1, dummy_value),(2,11)
>
>
>
> Problem:
>
> EQ-delete should be applied midway through applying the column updates to the base
> file based on sequence number, during the stitching process. If I'm not
> mistaken, this is not feasible with the way readers work.
>
>
> Proposal:
>
> Don't allow equality deletes together with column updates.
>
>   1) When deleting with eq-deletes: If there is a column update on the
> equality-field ID we use for the delete, reject deletion
>
>   2) When adding a column update on a column that is part of the equality
> field IDs in some delete, we reject the column update
>
> Alternatively, column updates could be controlled by an (immutable) table
> property, and eq-deletes could be rejected if the property indicates column
> updates are turned on for the table.
>
>
> Let me know what you think!
>
> Best Regards,
>
> Gabor
>
> Anurag Mantripragada  wrote (on Wed, Jan 28, 2026, 3:31):
>
>> Thank you everyone for the initial review comments. It is exciting to see
>> so much interest in this proposal.
>>
>> I am currently reviewing and responding to each comment. The general
>> themes of the feedback so far include:
>> - Including partial updates (column updates on a subset of rows in a
>> table).
>> - Adding details on how SQL engines will write the update files.
>> - Adding details on split planning and row alignment for update files.
>>
>> I will think through these points and update the design accordingly.
>>
>> Best
>> Anurag
>>
>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi Xianjin,
>>>
>>> Happy to learn from your experience in supporting backfill use-cases.
>>> Please feel free to review the proposal and add your comments. I will wait
>>> for a couple of days more to ensure everyone has a chance to review the
>>> proposal.
>>>
>>> ~ Anurag
>>>
>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye  wrote:
>>>
 Hi Anurag and Peter,

 It’s great to see the partial column update has gained great interest
 in the community. I internally built a BackfillColumns action to
 efficiently backfill columns (by writing the partial columns only and copying
 the binary data of other columns into a new DataFile). The speedup could be
 10x for wide tables but the write amplification is still there. I would be
 happy to collaborate on the work and eliminate the write amplification.

 On 2026/01/27 10:12:54 Péter Váry wrote:
 > Hi Anurag,
 >
 > It’s great to see how much interest there is in the community around
 this
 > potential new feature. Gábor and I have actually submitted an Iceberg
 > Summit talk proposal on this topic, and we would be very happy to
 > collaborate on the work. I was mainly waiting for the File Format API
 to be
 > finalized, as I believe this feature should build on top of it.
 >
 > For reference, our related work includes:
 >
 >- *Dev list thread:*
 >https://lists.apache.org/thread/h

Re: [Discuss] Efficient column updates in Iceberg

2026-02-06 Thread Gábor Kaszab
Hey,

Thank you for the proposal, Anurag! I made a pass recently and I think
there is some interference between column updates and equality deletes. Let
me describe below:

Steps:

CREATE TABLE tbl (int a, int b);

INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file

DELETE FROM tbl WHERE b=11;   -- creates an equality delete file

UPDATE tbl SET b=11;   -- writes column
update



SELECT * FROM tbl;

Expected result:

(2, 11)



Data and metadata created after the above steps:

Base file

(1, 11), (2, 22),

seqnum=1

EQ-delete

b=11

seqnum=2

Column update

Field ids: [field_id_for_col_b]

seqnum=3

Data file content: (dummy_value),(11)



Read steps:

   1. Stitch base file with column updates in reader:

Rows: (1,dummy_value), (2,11) (Note, dummy value can be either null, or 11,
see the proposal for more details)

Seqnum for base file=1

Seqnum for column update=3

   2. Apply eq-delete b=11, seqnum=2 on the stitched result
   3. Query result depends on which seqnum we carry forward to compare with
   the eq-delete's seqnum, but it's not correct in any of the cases
  1. Use seqnum from base file: we get either an empty result if
  'dummy_value' is 11 or we get (1, null) otherwise
  2. Use seqnum from last update file: don't delete any rows, result
  set is (1, dummy_value),(2,11)



Problem:

EQ-delete should be applied midway through applying the column updates to the base
file based on sequence number, during the stitching process. If I'm not
mistaken, this is not feasible with the way readers work.
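
To make the ordering problem concrete, here is a small Python model of the two
choices (illustrative only; the helper names are made up and this is not
Iceberg reader code):

base_rows = [(1, 11), (2, 22)]   # (a, b) rows in the base file, seqnum 1
update_b  = [None, 11]           # column-update file for b, seqnum 3 (None stands for the dummy value)

def stitch(rows, new_b):
    # Replace column b by position with the values from the column-update file.
    return [(a, nb) for (a, _), nb in zip(rows, new_b)]

def apply_eq_delete(rows, row_seq, delete_seq):
    # The eq-delete b=11 (seqnum 2) removes matching rows whose carried seqnum is older.
    return [r for r in rows if not (r[1] == 11 and row_seq < delete_seq)]

stitched = stitch(base_rows, update_b)   # [(1, None), (2, 11)]

print(apply_eq_delete(stitched, 1, 2))   # carry the base seqnum: [(1, None)] -- wrong, (2, 11) is lost
print(apply_eq_delete(stitched, 3, 2))   # carry the update seqnum: [(1, None), (2, 11)] -- also wrong

The correct answer, (2, 11), is only reachable if the delete is applied between
the base file (seqnum 1) and the column update (seqnum 3), which is exactly the
mid-stitch step described above.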


Proposal:

Don't allow equality deletes together with column updates.

  1) When deleting with eq-deletes: If there is a column update on the
equality-field ID we use for the delete, reject deletion

  2) When adding a column update on a column that is part of the equality
field IDs in some delete, we reject the column update

Alternatively, column updates could be controlled by an (immutable) table
property, and eq-deletes could be rejected if the property indicates column
updates are turned on for the table.
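
As a minimal sketch of what the two write-time checks could look like (the
field ids and function names below are hypothetical, not an existing Iceberg
API):

def validate_column_update(updated_field_ids, eq_delete_field_id_sets):
    # Rule 2: reject a column update that touches a field used as an equality-delete key.
    delete_keys = set()
    for field_ids in eq_delete_field_id_sets:
        delete_keys.update(field_ids)
    conflict = set(updated_field_ids) & delete_keys
    if conflict:
        raise ValueError(f"column update rejected, fields {sorted(conflict)} are eq-delete keys")

def validate_eq_delete(delete_field_ids, updated_field_ids):
    # Rule 1: reject an eq-delete whose key fields already have column updates.
    conflict = set(delete_field_ids) & set(updated_field_ids)
    if conflict:
        raise ValueError(f"eq-delete rejected, fields {sorted(conflict)} have column updates")

try:
    # field id 102 stands for column b in the example above
    validate_column_update(updated_field_ids={102}, eq_delete_field_id_sets=[{102}])
except ValueError as e:
    print(e)

If rule 2 is enforced on every commit, rule 1 never has anything to reject, so
only one of the two checks would need to exist.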


Let me know what you think!

Best Regards,

Gabor

Anurag Mantripragada  wrote (on Wed, Jan 28, 2026, 3:31):

> Thank you everyone for the initial review comments. It is exciting to see
> so much interest in this proposal.
>
> I am currently reviewing and responding to each comment. The general
> themes of the feedback so far include:
> - Including partial updates (column updates on a subset of rows in a
> table).
> - Adding details on how SQL engines will write the update files.
> - Adding details on split planning and row alignment for update files.
>
> I will think through these points and update the design accordingly.
>
> Best
> Anurag
>
> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
> [email protected]> wrote:
>
>> Hi Xianjin,
>>
>> Happy to learn from your experience in supporting backfill use-cases.
>> Please feel free to review the proposal and add your comments. I will wait
>> for a couple of days more to ensure everyone has a chance to review the
>> proposal.
>>
>> ~ Anurag
>>
>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye  wrote:
>>
>>> Hi Anurag and Peter,
>>>
>>> It’s great to see the partial column update has gained great interest in
>>> the community. I internally built a BackfillColumns action to efficiently
>>> backfill columns(by writing the partial columns only and copies the binary
>>> data of other columns into a new DataFile). The speedup could be 10x for
>>> wide tables but the write amplification is still there. I would be happy to
>>> collaborate on the work and eliminate the write amplification.
>>>
>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>> > Hi Anurag,
>>> >
>>> > It’s great to see how much interest there is in the community around
>>> this
>>> > potential new feature. Gábor and I have actually submitted an Iceberg
>>> > Summit talk proposal on this topic, and we would be very happy to
>>> > collaborate on the work. I was mainly waiting for the File Format API
>>> to be
>>> > finalized, as I believe this feature should build on top of it.
>>> >
>>> > For reference, our related work includes:
>>> >
>>> >- *Dev list thread:*
>>> >https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>> >- *Proposal document:*
>>> >
>>> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>> >(not shared widely yet)
>>> >- *Performance testing PR for readers and writers:*
>>> >https://github.com/apache/iceberg/pull/13306
>>> >
>>> > During earlier discussions about possible metadata changes, another
>>> option
>>> > came up that hasn’t been documented yet: separating planner metadata
>>> from
>>> > reader metadata. Since the planner does not need to know about the
>>> actual
>>> > files, we could store the file composition in a separate file
>>> (potentially
>>> > a Puffin file). This file could hold the column_files metadata, 

Re: [Discuss] Efficient column updates in Iceberg

2026-01-27 Thread Anurag Mantripragada
Thank you everyone for the initial review comments. It is exciting to see
so much interest in this proposal.

I am currently reviewing and responding to each comment. The general themes
of the feedback so far include:
- Including partial updates (column updates on a subset of rows in a table).
- Adding details on how SQL engines will write the update files.
- Adding details on split planning and row alignment for update files.

I will think through these points and update the design accordingly.

Best
Anurag

On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <
[email protected]> wrote:

> Hi Xianjin,
>
> Happy to learn from your experience in supporting backfill use-cases.
> Please feel free to review the proposal and add your comments. I will wait
> for a couple of days more to ensure everyone has a chance to review the
> proposal.
>
> ~ Anurag
>
> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye  wrote:
>
>> Hi Anurag and Peter,
>>
>> It’s great to see the partial column update has gained great interest in
>> the community. I internally built a BackfillColumns action to efficiently
>> backfill columns (by writing the partial columns only and copying the binary
>> data of other columns into a new DataFile). The speedup could be 10x for
>> wide tables but the write amplification is still there. I would be happy to
>> collaborate on the work and eliminate the write amplification.
>>
>> On 2026/01/27 10:12:54 Péter Váry wrote:
>> > Hi Anurag,
>> >
>> > It’s great to see how much interest there is in the community around
>> this
>> > potential new feature. Gábor and I have actually submitted an Iceberg
>> > Summit talk proposal on this topic, and we would be very happy to
>> > collaborate on the work. I was mainly waiting for the File Format API
>> to be
>> > finalized, as I believe this feature should build on top of it.
>> >
>> > For reference, our related work includes:
>> >
>> >- *Dev list thread:*
>> >https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>> >- *Proposal document:*
>> >
>> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>> >(not shared widely yet)
>> >- *Performance testing PR for readers and writers:*
>> >https://github.com/apache/iceberg/pull/13306
>> >
>> > During earlier discussions about possible metadata changes, another
>> option
>> > came up that hasn’t been documented yet: separating planner metadata
>> from
>> > reader metadata. Since the planner does not need to know about the
>> actual
>> > files, we could store the file composition in a separate file
>> (potentially
>> > a Puffin file). This file could hold the column_files metadata, while
>> the
>> > manifest would reference the Puffin file and blob position instead of
>> the
>> > data filename.
>> > This approach has the advantage of keeping the existing metadata largely
>> > intact, and it could also give us a natural place later to add
>> file-level
>> > indexes or Bloom filters for use during reads or secondary filtering.
>> The
>> > downsides are the additional files and the increased complexity of
>> > identifying files that are no longer referenced by the table, so this
>> may
>> > not be an ideal solution.
>> >
>> > I do have some concerns about the MoR metadata proposal described in the
>> > document. At first glance, it seems to complicate distributed planning,
>> as
>> > all entries for a given file would need to be collected and merged to
>> > provide the information required by both the planner and the reader.
>> > Additionally, when a new column is added or updated, we would still
>> need to
>> > add a new metadata entry for every existing data file. If we immediately
>> > write out the merged metadata, the total number of entries remains the
>> > same. The main benefit is avoiding rewriting statistics, which can be
>> > significant, but this comes at the cost of increased planning
>> complexity.
>> > If we choose to store the merged statistics in the column_families
>> entry, I
>> > don’t see much benefit in excluding the rest of the metadata, especially
>> > since including it would simplify the planning process.
>> >
>> > As Anton already pointed out, we should also discuss how this change
>> would
>> > affect split handling, particularly how to avoid double reads when row
>> > groups are not aligned between the original data files and the new
>> column
>> > files.
>> >
>> > Finally, I’d like to see some discussion around the Java API
>> implications.
>> > In particular, what API changes are required, and how SQL engines would
>> > perform updates. Since the new column files must have the same number of
>> > rows as the original data files, with a strict one-to-one relationship,
>> SQL
>> > engines would need access to the source filename, position, and deletion
>> > status in the DataFrame in order to generate the new files. This is more
>> > involved than a simple update and deserves some explicit consideration.
>> >
>> > Looking forward 

Re: [Discuss] Efficient column updates in Iceberg

2026-01-27 Thread Anurag Mantripragada
Hi Xianjin,

Happy to learn from your experience in supporting backfill use-cases.
Please feel free to review the proposal and add your comments. I will wait
for a couple of days more to ensure everyone has a chance to review the
proposal.

~ Anurag

On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye  wrote:

> Hi Anurag and Peter,
>
> It’s great to see the partial column update has gained great interest in
> the community. I internally built a BackfillColumns action to efficiently
> backfill columns (by writing the partial columns only and copying the binary
> data of other columns into a new DataFile). The speedup could be 10x for
> wide tables but the write amplification is still there. I would be happy to
> collaborate on the work and eliminate the write amplification.
>
> On 2026/01/27 10:12:54 Péter Váry wrote:
> > Hi Anurag,
> >
> > It’s great to see how much interest there is in the community around this
> > potential new feature. Gábor and I have actually submitted an Iceberg
> > Summit talk proposal on this topic, and we would be very happy to
> > collaborate on the work. I was mainly waiting for the File Format API to
> be
> > finalized, as I believe this feature should build on top of it.
> >
> > For reference, our related work includes:
> >
> >- *Dev list thread:*
> >https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
> >- *Proposal document:*
> >
> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
> >(not shared widely yet)
> >- *Performance testing PR for readers and writers:*
> >https://github.com/apache/iceberg/pull/13306
> >
> > During earlier discussions about possible metadata changes, another
> option
> > came up that hasn’t been documented yet: separating planner metadata from
> > reader metadata. Since the planner does not need to know about the actual
> > files, we could store the file composition in a separate file
> (potentially
> > a Puffin file). This file could hold the column_files metadata, while the
> > manifest would reference the Puffin file and blob position instead of the
> > data filename.
> > This approach has the advantage of keeping the existing metadata largely
> > intact, and it could also give us a natural place later to add file-level
> > indexes or Bloom filters for use during reads or secondary filtering. The
> > downsides are the additional files and the increased complexity of
> > identifying files that are no longer referenced by the table, so this may
> > not be an ideal solution.
> >
> > I do have some concerns about the MoR metadata proposal described in the
> > document. At first glance, it seems to complicate distributed planning,
> as
> > all entries for a given file would need to be collected and merged to
> > provide the information required by both the planner and the reader.
> > Additionally, when a new column is added or updated, we would still need
> to
> > add a new metadata entry for every existing data file. If we immediately
> > write out the merged metadata, the total number of entries remains the
> > same. The main benefit is avoiding rewriting statistics, which can be
> > significant, but this comes at the cost of increased planning complexity.
> > If we choose to store the merged statistics in the column_families
> entry, I
> > don’t see much benefit in excluding the rest of the metadata, especially
> > since including it would simplify the planning process.
> >
> > As Anton already pointed out, we should also discuss how this change
> would
> > affect split handling, particularly how to avoid double reads when row
> > groups are not aligned between the original data files and the new column
> > files.
> >
> > Finally, I’d like to see some discussion around the Java API
> implications.
> > In particular, what API changes are required, and how SQL engines would
> > perform updates. Since the new column files must have the same number of
> > rows as the original data files, with a strict one-to-one relationship,
> SQL
> > engines would need access to the source filename, position, and deletion
> > status in the DataFrame in order to generate the new files. This is more
> > involved than a simple update and deserves some explicit consideration.
> >
> > Looking forward to your thoughts.
> > Best regards,
> > Peter
> >
> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <
> [email protected]>
> > wrote:
> >
> > > Thanks Anton and others, for providing some initial feedback. I will
> > > address all your comments soon.
> > >
> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <
> [email protected]>
> > > wrote:
> > >
> > >> I had a chance to see the proposal before it landed and I think it is
> a
> > >> cool idea and both presented approaches would likely work. I am
> looking
> > >> forward to discussing the tradeoffs and would encourage everyone to
> > >> push/polish each approach to see what issues can be mitigated and
> what are
> > >> fundamental.
> > >>
> > >> [1] Ice

Re: [Discuss] Efficient column updates in Iceberg

2026-01-27 Thread Anurag Mantripragada
Hi Peter,

Thanks for reviewing the proposal.

Regarding your concerns about the MoR metadata proposal, I believe there
may be a misunderstanding of the primary approach. In the document, I
actually discarded the MoR metadata proposal (see Approach 3) due to high
planning costs. My main proposal (Approach 1) utilizes CoW metadata, which
rewrites the entry metadata for existing entries. This aligns closely with
your suggestion.

I will add more detail on the SQL execution and split planning to the doc.

~ Anurag

On Tue, Jan 27, 2026 at 2:13 AM Péter Váry 
wrote:

> Hi Anurag,
>
> It’s great to see how much interest there is in the community around this
> potential new feature. Gábor and I have actually submitted an Iceberg
> Summit talk proposal on this topic, and we would be very happy to
> collaborate on the work. I was mainly waiting for the File Format API to be
> finalized, as I believe this feature should build on top of it.
>
> For reference, our related work includes:
>
>- *Dev list thread:*
>https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>- *Proposal document:*
>
> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>(not shared widely yet)
>- *Performance testing PR for readers and writers:*
>https://github.com/apache/iceberg/pull/13306
>
> During earlier discussions about possible metadata changes, another option
> came up that hasn’t been documented yet: separating planner metadata from
> reader metadata. Since the planner does not need to know about the actual
> files, we could store the file composition in a separate file (potentially
> a Puffin file). This file could hold the column_files metadata, while the
> manifest would reference the Puffin file and blob position instead of the
> data filename.
> This approach has the advantage of keeping the existing metadata largely
> intact, and it could also give us a natural place later to add file-level
> indexes or Bloom filters for use during reads or secondary filtering. The
> downsides are the additional files and the increased complexity of
> identifying files that are no longer referenced by the table, so this may
> not be an ideal solution.
>
> I do have some concerns about the MoR metadata proposal described in the
> document. At first glance, it seems to complicate distributed planning, as
> all entries for a given file would need to be collected and merged to
> provide the information required by both the planner and the reader.
> Additionally, when a new column is added or updated, we would still need to
> add a new metadata entry for every existing data file. If we immediately
> write out the merged metadata, the total number of entries remains the
> same. The main benefit is avoiding rewriting statistics, which can be
> significant, but this comes at the cost of increased planning complexity.
> If we choose to store the merged statistics in the column_families entry, I
> don’t see much benefit in excluding the rest of the metadata, especially
> since including it would simplify the planning process.
>
> As Anton already pointed out, we should also discuss how this change would
> affect split handling, particularly how to avoid double reads when row
> groups are not aligned between the original data files and the new column
> files.
>
> Finally, I’d like to see some discussion around the Java API implications.
> In particular, what API changes are required, and how SQL engines would
> perform updates. Since the new column files must have the same number of
> rows as the original data files, with a strict one-to-one relationship, SQL
> engines would need access to the source filename, position, and deletion
> status in the DataFrame in order to generate the new files. This is more
> involved than a simple update and deserves some explicit consideration.
>
> Looking forward to your thoughts.
> Best regards,
> Peter
>
> On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada 
> wrote:
>
>> Thanks Anton and others, for providing some initial feedback. I will
>> address all your comments soon.
>>
>> On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi 
>> wrote:
>>
>>> I had a chance to see the proposal before it landed and I think it is a
>>> cool idea and both presented approaches would likely work. I am looking
>>> forward to discussing the tradeoffs and would encourage everyone to
>>> push/polish each approach to see what issues can be mitigated and what are
>>> fundamental.
>>>
>>> [1] Iceberg-native approach: better visibility into column files from
>>> the metadata, potentially better concurrency for non-overlapping column
>>> updates, no dep on Parquet.
>>> [2] Parquet-native approach: almost no changes to the table format
>>> metadata beyond tracking of base files.
>>>
>>> I think [1] sounds a bit better on paper but I am worried about the
>>> complexity in writers and readers (especially around keeping row groups
>>> aligned and split planning). It would be great to c

Re: [Discuss] Efficient column updates in Iceberg

2026-01-27 Thread Xianjin Ye
Hi Anurag and Peter,

It’s great to see the partial column update has gained great interest in the 
community. I internally built a BackfillColumns action to efficiently backfill 
columns (by writing the partial columns only and copying the binary data of other 
columns into a new DataFile). The speedup could be 10x for wide tables but the 
write amplification is still there. I would be happy to collaborate on the work 
and eliminate the write amplification. 
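
As a toy illustration of that amplification (plain Python dictionaries standing
in for Parquet files; this is not the actual BackfillColumns action):

import random

NUM_ROWS, NUM_COLS = 1_000, 1_000   # a "wide table" data file
base_file = {f"col_{i}": [random.random() for _ in range(NUM_ROWS)] for i in range(NUM_COLS)}

def backfill_with_rewrite(base, new_col, values):
    # Today: every existing column is copied into a new data file alongside the new one.
    out = dict(base)
    out[new_col] = values
    return out

def backfill_column_only(new_col, values):
    # Column-update approach: write only the new column, aligned to the base file by position.
    return {new_col: values}

scores = [random.random() for _ in range(NUM_ROWS)]
print(len(backfill_with_rewrite(base_file, "model_score", scores)), "columns rewritten")  # 1001
print(len(backfill_column_only("model_score", scores)), "columns written")                # 1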

On 2026/01/27 10:12:54 Péter Váry wrote:
> Hi Anurag,
> 
> It’s great to see how much interest there is in the community around this
> potential new feature. Gábor and I have actually submitted an Iceberg
> Summit talk proposal on this topic, and we would be very happy to
> collaborate on the work. I was mainly waiting for the File Format API to be
> finalized, as I believe this feature should build on top of it.
> 
> For reference, our related work includes:
> 
>- *Dev list thread:*
>https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>- *Proposal document:*
>
> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>(not shared widely yet)
>- *Performance testing PR for readers and writers:*
>https://github.com/apache/iceberg/pull/13306
> 
> During earlier discussions about possible metadata changes, another option
> came up that hasn’t been documented yet: separating planner metadata from
> reader metadata. Since the planner does not need to know about the actual
> files, we could store the file composition in a separate file (potentially
> a Puffin file). This file could hold the column_files metadata, while the
> manifest would reference the Puffin file and blob position instead of the
> data filename.
> This approach has the advantage of keeping the existing metadata largely
> intact, and it could also give us a natural place later to add file-level
> indexes or Bloom filters for use during reads or secondary filtering. The
> downsides are the additional files and the increased complexity of
> identifying files that are no longer referenced by the table, so this may
> not be an ideal solution.
> 
> I do have some concerns about the MoR metadata proposal described in the
> document. At first glance, it seems to complicate distributed planning, as
> all entries for a given file would need to be collected and merged to
> provide the information required by both the planner and the reader.
> Additionally, when a new column is added or updated, we would still need to
> add a new metadata entry for every existing data file. If we immediately
> write out the merged metadata, the total number of entries remains the
> same. The main benefit is avoiding rewriting statistics, which can be
> significant, but this comes at the cost of increased planning complexity.
> If we choose to store the merged statistics in the column_families entry, I
> don’t see much benefit in excluding the rest of the metadata, especially
> since including it would simplify the planning process.
> 
> As Anton already pointed out, we should also discuss how this change would
> affect split handling, particularly how to avoid double reads when row
> groups are not aligned between the original data files and the new column
> files.
> 
> Finally, I’d like to see some discussion around the Java API implications.
> In particular, what API changes are required, and how SQL engines would
> perform updates. Since the new column files must have the same number of
> rows as the original data files, with a strict one-to-one relationship, SQL
> engines would need access to the source filename, position, and deletion
> status in the DataFrame in order to generate the new files. This is more
> involved than a simple update and deserves some explicit consideration.
> 
> Looking forward to your thoughts.
> Best regards,
> Peter
> 
> On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada 
> wrote:
> 
> > Thanks Anton and others, for providing some initial feedback. I will
> > address all your comments soon.
> >
> > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi 
> > wrote:
> >
> >> I had a chance to see the proposal before it landed and I think it is a
> >> cool idea and both presented approaches would likely work. I am looking
> >> forward to discussing the tradeoffs and would encourage everyone to
> >> push/polish each approach to see what issues can be mitigated and what are
> >> fundamental.
> >>
> >> [1] Iceberg-native approach: better visibility into column files from the
> >> metadata, potentially better concurrency for non-overlapping column
> >> updates, no dep on Parquet.
> >> [2] Parquet-native approach: almost no changes to the table format
> >> metadata beyond tracking of base files.
> >>
> >> I think [1] sounds a bit better on paper but I am worried about the
> >> complexity in writers and readers (especially around keeping row groups
> >> aligned and split planning). It would be great to cover this in detail in
> >> the pro

Re: [Discuss] Efficient column updates in Iceberg

2026-01-27 Thread Péter Váry
Hi Anurag,

It’s great to see how much interest there is in the community around this
potential new feature. Gábor and I have actually submitted an Iceberg
Summit talk proposal on this topic, and we would be very happy to
collaborate on the work. I was mainly waiting for the File Format API to be
finalized, as I believe this feature should build on top of it.

For reference, our related work includes:

   - *Dev list thread:*
   https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
   - *Proposal document:*
   
https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
   (not shared widely yet)
   - *Performance testing PR for readers and writers:*
   https://github.com/apache/iceberg/pull/13306

During earlier discussions about possible metadata changes, another option
came up that hasn’t been documented yet: separating planner metadata from
reader metadata. Since the planner does not need to know about the actual
files, we could store the file composition in a separate file (potentially
a Puffin file). This file could hold the column_files metadata, while the
manifest would reference the Puffin file and blob position instead of the
data filename.
This approach has the advantage of keeping the existing metadata largely
intact, and it could also give us a natural place later to add file-level
indexes or Bloom filters for use during reads or secondary filtering. The
downsides are the additional files and the increased complexity of
identifying files that are no longer referenced by the table, so this may
not be an ideal solution.
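
As a rough sketch of the shape this could take (the field names below are
invented for illustration, not a proposed spec):

from dataclasses import dataclass

@dataclass
class ColumnFileBlob:
    # Reader-side composition of one logical data file, stored as a blob in a Puffin file.
    base_file: str
    column_files: dict        # field id -> path of the column file holding that field

@dataclass
class ManifestEntry:
    # Planner-side entry: stats and counts stay here, the file composition moves out.
    logical_file: str
    record_count: int
    puffin_file: str
    blob_offset: int          # where this entry's ColumnFileBlob starts in the Puffin file

entry = ManifestEntry("data/f1.parquet", 1_000_000, "metadata/compositions.puffin", 2048)
blob = ColumnFileBlob("data/f1.parquet", {17: "data/f1-col17-v2.parquet"})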

I do have some concerns about the MoR metadata proposal described in the
document. At first glance, it seems to complicate distributed planning, as
all entries for a given file would need to be collected and merged to
provide the information required by both the planner and the reader.
Additionally, when a new column is added or updated, we would still need to
add a new metadata entry for every existing data file. If we immediately
write out the merged metadata, the total number of entries remains the
same. The main benefit is avoiding rewriting statistics, which can be
significant, but this comes at the cost of increased planning complexity.
If we choose to store the merged statistics in the column_families entry, I
don’t see much benefit in excluding the rest of the metadata, especially
since including it would simplify the planning process.

As Anton already pointed out, we should also discuss how this change would
affect split handling, particularly how to avoid double reads when row
groups are not aligned between the original data files and the new column
files.
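
One simple strategy, sketched below, is to split only at row offsets that close
a row group in both the base file and the column file, so neither file's row
group is read by two splits (illustrative only):

def aligned_split_points(base_rg_ends, column_rg_ends):
    # Only positions that end a row group in *both* files are safe split boundaries.
    return sorted(set(base_rg_ends) & set(column_rg_ends))

base_rg_ends   = [100_000, 200_000, 300_000]   # row offsets ending each row group in the base file
column_rg_ends = [150_000, 300_000]            # the column file was written with larger row groups

print(aligned_split_points(base_rg_ends, column_rg_ends))   # [300000] -> one larger split

The trade-off is coarser splits when the files are badly misaligned, which is
one reason to keep the column writer's row groups aligned with the base file.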

Finally, I’d like to see some discussion around the Java API implications.
In particular, what API changes are required, and how SQL engines would
perform updates. Since the new column files must have the same number of
rows as the original data files, with a strict one-to-one relationship, SQL
engines would need access to the source filename, position, and deletion
status in the DataFrame in order to generate the new files. This is more
involved than a simple update and deserves some explicit consideration.
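
To illustrate the bookkeeping involved, here is a small sketch that regroups
updated values by source file and ordinal position before writing the aligned
column files (illustrative only; the padding policy for untouched and deleted
rows is an assumption):

def build_column_vectors(updated_rows, rows_per_file):
    # updated_rows: (source file, position in file, is-deleted flag, new value) per updated row.
    vectors = {f: [None] * n for f, n in rows_per_file.items()}   # None = dummy/untouched value
    for file, pos, deleted, value in updated_rows:
        if not deleted:
            vectors[file][pos] = value
    return vectors

updated_rows = [
    ("data/f1.parquet", 0, False, 0.91),
    ("data/f1.parquet", 2, False, 0.42),
    ("data/f2.parquet", 1, True,  0.10),   # already deleted; skipped (or given a dummy value)
]
print(build_column_vectors(updated_rows, {"data/f1.parquet": 3, "data/f2.parquet": 2}))
# {'data/f1.parquet': [0.91, None, 0.42], 'data/f2.parquet': [None, None]}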

Looking forward to your thoughts.
Best regards,
Peter

On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada 
wrote:

> Thanks Anton and others, for providing some initial feedback. I will
> address all your comments soon.
>
> On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi 
> wrote:
>
>> I had a chance to see the proposal before it landed and I think it is a
>> cool idea and both presented approaches would likely work. I am looking
>> forward to discussing the tradeoffs and would encourage everyone to
>> push/polish each approach to see what issues can be mitigated and what are
>> fundamental.
>>
>> [1] Iceberg-native approach: better visibility into column files from the
>> metadata, potentially better concurrency for non-overlapping column
>> updates, no dep on Parquet.
>> [2] Parquet-native approach: almost no changes to the table format
>> metadata beyond tracking of base files.
>>
>> I think [1] sounds a bit better on paper but I am worried about the
>> complexity in writers and readers (especially around keeping row groups
>> aligned and split planning). It would be great to cover this in detail in
>> the proposal.
>>
>> On Mon, Jan 26, 2026 at 09:00, Anurag Mantripragada <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> "Wide tables" with thousands of columns present significant challenges
>>> for AI/ML workloads, particularly when only a subset of columns needs to be
>>> added or updated. Current Copy-on-Write (COW) and Merge-on-Read (MOR)
>>> operations in Iceberg apply at the row level, which leads to substantial
>>> write amplification in scenarios such as:
>>>
>>>- Feature Backfilling & Column Updates: Adding new feature columns
>>>(e.g., model embeddings) to petabyte-scale tables.
>>>- Model Score Updates: Refresh prediction scores after retrai

Re: [Discuss] Efficient column updates in Iceberg

2026-01-26 Thread Anurag Mantripragada
Thanks Anton and others, for providing some initial feedback. I will
address all your comments soon.

On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi 
wrote:

> I had a chance to see the proposal before it landed and I think it is a
> cool idea and both presented approaches would likely work. I am looking
> forward to discussing the tradeoffs and would encourage everyone to
> push/polish each approach to see what issues can be mitigated and what are
> fundamental.
>
> [1] Iceberg-native approach: better visibility into column files from the
> metadata, potentially better concurrency for non-overlapping column
> updates, no dep on Parquet.
> [2] Parquet-native approach: almost no changes to the table format
> metadata beyond tracking of base files.
>
> I think [1] sounds a bit better on paper but I am worried about the
> complexity in writers and readers (especially around keeping row groups
> aligned and split planning). It would be great to cover this in detail in
> the proposal.
>
> On Mon, Jan 26, 2026 at 09:00, Anurag Mantripragada <
> [email protected]> wrote:
>
>> Hi all,
>>
>> "Wide tables" with thousands of columns present significant challenges
>> for AI/ML workloads, particularly when only a subset of columns needs to be
>> added or updated. Current Copy-on-Write (COW) and Merge-on-Read (MOR)
>> operations in Iceberg apply at the row level, which leads to substantial
>> write amplification in scenarios such as:
>>
>>- Feature Backfilling & Column Updates: Adding new feature columns
>>(e.g., model embeddings) to petabyte-scale tables.
>>- Model Score Updates: Refresh prediction scores after retraining.
>>- Embedding Refresh: Updating vector embeddings, which currently
>>triggers a rewrite of the entire row.
>>- Incremental Feature Computation: Daily updates to a small fraction
>>of features in wide tables.
>>
>> With the Iceberg V4 proposal introducing single-file commits and column
>> stats improvements, this is an ideal time to address column-level updates
>> to better support these use cases.
>>
>> I have drafted a proposal that explores both table-format enhancements
>> and file-format (Parquet) changes to enable more efficient updates.
>>
>> Proposal Details:
>> - GitHub Issue: #15146 
>> - Design Document: Efficient Column Updates in Iceberg
>> 
>>
>> Next Steps:
>> I plan to create POCs to benchmark the approaches described in the
>> document.
>>
>> Please review the proposal and share your feedback.
>>
>> Thanks,
>> Anurag
>>
>


Re: [Discuss] Efficient column updates in Iceberg

2026-01-26 Thread Anurag Mantripragada
Hi Gang,

Thanks for the pointers. I reviewed Peter's column family design and the
related dev list discussions while researching this proposal.

The current design differs in two ways:

   - It is built on top of V4 metadata structures.
   - It generalizes the column family approach, which requires pre-planning
   how columns are assigned to specific families.

The reader implementations have significant overlap, particularly regarding
row alignment and positional stitching. I have already reached out to Peter
to collaborate on this proposal.
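
For reference, the shared piece both designs rely on is essentially this
positional overlay (a minimal model, not reader code):

def stitch_batch(base_columns, column_files):
    # Overlay column files (column name -> positional values) onto the base columns.
    num_rows = len(next(iter(base_columns.values())))
    out = dict(base_columns)
    for name, values in column_files.items():
        assert len(values) == num_rows, "column files must carry one value per base row"
        out[name] = values
    return out

base = {"a": [1, 2, 3], "b": [11, 22, 33]}
updates = {"b": [11, 99, 33], "c": [0.1, 0.2, 0.3]}   # one updated column, one new column
print(stitch_batch(base, updates))
# {'a': [1, 2, 3], 'b': [11, 99, 33], 'c': [0.1, 0.2, 0.3]}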

~ Anurag

On Mon, Jan 26, 2026 at 6:08 PM Gang Wu  wrote:

> I remember that Peter has initiated a relevant discussion in [1] and
> spent some time on the design and benchmark of a similar approach
> (introducing column families).
>
> Perhaps there is an opportunity to join the effort?
>
> [1] https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>
> On Tue, Jan 27, 2026 at 3:10 AM Anton Okolnychyi 
> wrote:
> >
> > I had a chance to see the proposal before it landed and I think it is a
> cool idea and both presented approaches would likely work. I am looking
> forward to discussing the tradeoffs and would encourage everyone to
> push/polish each approach to see what issues can be mitigated and what are
> fundamental.
> >
> > [1] Iceberg-native approach: better visibility into column files from
> the metadata, potentially better concurrency for non-overlapping column
> updates, no dep on Parquet.
> > [2] Parquet-native approach: almost no changes to the table format
> metadata beyond tracking of base files.
> >
> > I think [1] sounds a bit better on paper but I am worried about the
> complexity in writers and readers (especially around keeping row groups
> aligned and split planning). It would be great to cover this in detail in
> the proposal.
> >
> > On Mon, Jan 26, 2026 at 09:00, Anurag Mantripragada <
> [email protected]> wrote:
> >>
> >> Hi all,
> >>
> >> "Wide tables" with thousands of columns present significant challenges
> for AI/ML workloads, particularly when only a subset of columns needs to be
> added or updated. Current Copy-on-Write (COW) and Merge-on-Read (MOR)
> operations in Iceberg apply at the row level, which leads to substantial
> write amplification in scenarios such as:
> >>
> >> Feature Backfilling & Column Updates: Adding new feature columns (e.g.,
> model embeddings) to petabyte-scale tables.
> >> Model Score Updates: Refresh prediction scores after retraining.
> >> Embedding Refresh: Updating vector embeddings, which currently triggers
> a rewrite of the entire row.
> >> Incremental Feature Computation: Daily updates to a small fraction of
> features in wide tables.
> >>
> >> With the Iceberg V4 proposal introducing single-file commits and column
> stats improvements, this is an ideal time to address column-level updates
> to better support these use cases.
> >>
> >> I have drafted a proposal that explores both table-format enhancements
> and file-format (Parquet) changes to enable more efficient updates.
> >>
> >> Proposal Details:
> >> - GitHub Issue: #15146
> >> - Design Document: Efficient Column Updates in Iceberg
> >>
> >> Next Steps:
> >> I plan to create POCs to benchmark the approaches described in the
> document.
> >>
> >> Please review the proposal and share your feedback.
> >>
> >> Thanks,
> >> Anurag
>


Re: [Discuss] Efficient column updates in Iceberg

2026-01-26 Thread Gang Wu
I remember that Peter has initiated a relevant discussion in [1] and
spent some time on the design and benchmark of a similar approach
(introducing column families).

Perhaps there is an opportunity to join the effort?

[1] https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9

On Tue, Jan 27, 2026 at 3:10 AM Anton Okolnychyi  wrote:
>
> I had a chance to see the proposal before it landed and I think it is a cool 
> idea and both presented approaches would likely work. I am looking forward to 
> discussing the tradeoffs and would encourage everyone to push/polish each 
> approach to see what issues can be mitigated and what are fundamental.
>
> [1] Iceberg-native approach: better visibility into column files from the 
> metadata, potentially better concurrency for non-overlapping column updates, 
> no dep on Parquet.
> [2] Parquet-native approach: almost no changes to the table format metadata 
> beyond tracking of base files.
>
> I think [1] sounds a bit better on paper but I am worried about the 
> complexity in writers and readers (especially around keeping row groups 
> aligned and split planning). It would be great to cover this in detail in the 
> proposal.
>
> On Mon, Jan 26, 2026 at 09:00, Anurag Mantripragada 
>  wrote:
>>
>> Hi all,
>>
>> "Wide tables" with thousands of columns present significant challenges for 
>> AI/ML workloads, particularly when only a subset of columns needs to be 
>> added or updated. Current Copy-on-Write (COW) and Merge-on-Read (MOR) 
>> operations in Iceberg apply at the row level, which leads to substantial 
>> write amplification in scenarios such as:
>>
>> Feature Backfilling & Column Updates: Adding new feature columns (e.g., 
>> model embeddings) to petabyte-scale tables.
>> Model Score Updates: Refresh prediction scores after retraining.
>> Embedding Refresh: Updating vector embeddings, which currently triggers a 
>> rewrite of the entire row.
>> Incremental Feature Computation: Daily updates to a small fraction of 
>> features in wide tables.
>>
>> With the Iceberg V4 proposal introducing single-file commits and column 
>> stats improvements, this is an ideal time to address column-level updates to 
>> better support these use cases.
>>
>> I have drafted a proposal that explores both table-format enhancements and 
>> file-format (Parquet) changes to enable more efficient updates.
>>
>> Proposal Details:
>> - GitHub Issue: #15146
>> - Design Document: Efficient Column Updates in Iceberg
>>
>> Next Steps:
>> I plan to create POCs to benchmark the approaches described in the document.
>>
>> Please review the proposal and share your feedback.
>>
>> Thanks,
>> Anurag


Re: [Discuss] Efficient column updates in Iceberg

2026-01-26 Thread Anton Okolnychyi
I had a chance to see the proposal before it landed and I think it is a
cool idea and both presented approaches would likely work. I am looking
forward to discussing the tradeoffs and would encourage everyone to
push/polish each approach to see what issues can be mitigated and what are
fundamental.

[1] Iceberg-native approach: better visibility into column files from the
metadata, potentially better concurrency for non-overlapping column
updates, no dep on Parquet.
[2] Parquet-native approach: almost no changes to the table format metadata
beyond tracking of base files.

I think [1] sounds a bit better on paper but I am worried about the
complexity in writers and readers (especially around keeping row groups
aligned and split planning). It would be great to cover this in detail in
the proposal.

On Mon, Jan 26, 2026 at 09:00, Anurag Mantripragada <
[email protected]> wrote:

> Hi all,
>
> "Wide tables" with thousands of columns present significant challenges for
> AI/ML workloads, particularly when only a subset of columns needs to be
> added or updated. Current Copy-on-Write (COW) and Merge-on-Read (MOR)
> operations in Iceberg apply at the row level, which leads to substantial
> write amplification in scenarios such as:
>
>- Feature Backfilling & Column Updates: Adding new feature columns
>(e.g., model embeddings) to petabyte-scale tables.
>- Model Score Updates: Refresh prediction scores after retraining.
>- Embedding Refresh: Updating vector embeddings, which currently
>triggers a rewrite of the entire row.
>- Incremental Feature Computation: Daily updates to a small fraction
>of features in wide tables.
>
> With the Iceberg V4 proposal introducing single-file commits and column
> stats improvements, this is an ideal time to address column-level updates
> to better support these use cases.
>
> I have drafted a proposal that explores both table-format enhancements and
> file-format (Parquet) changes to enable more efficient updates.
>
> Proposal Details:
> - GitHub Issue: #15146 
> - Design Document: Efficient Column Updates in Iceberg
> 
>
> Next Steps:
> I plan to create POCs to benchmark the approaches described in the
> document.
>
> Please review the proposal and share your feedback.
>
> Thanks,
> Anurag
>