> If this is correct, it aligns well with the current proposal and shouldn't
> introduce any additional complexity. I will add it to the discussion points
> for tomorrow's community sync.

Yes, this example aligns with what I was thinking (nit: "range" probably
wouldn't be a string, but I assume this was just for illustrative purposes).

> On the other hand, in the column family use case, splitting columns is a
> strict requirement for performance. I haven't considered how this would
> work, but perhaps we could introduce a table property for column families
> to make this explicit, and compaction jobs would have to respect it.

Yeah, I don't want to get into the exact mechanics for column families. I
was just calling out that compaction to the base file is not desirable in
all cases, so it shouldn't be assumed as a solution for small files.

Thanks,
Micah
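As an aside for readers skimming the quoted thread below: a minimal sketch
of how a reader could act on the row_range idea, assuming the packed update
file carries a base-row-position column whose per-row-group min/max
statistics are available. All names here are hypothetical, not part of the
proposal.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative only: choose the row groups of a packed update file that
    // can contain rows for one base file, using min/max statistics of a
    // row-position column. Not an Iceberg API.
    class RowRangePruning {
      record RowRange(long start, long end) {}  // inclusive base-row positions

      static List<Integer> rowGroupsToRead(long[] minPos, long[] maxPos, RowRange range) {
        List<Integer> selected = new ArrayList<>();
        for (int i = 0; i < minPos.length; i++) {
          // Keep a row group only if its [minPos, maxPos] overlaps the range;
          // writers that pack one base file per row group make this exact.
          if (maxPos[i] >= range.start() && minPos[i] <= range.end()) {
            selected.add(i);
          }
        }
        return selected;
      }
    }

If a writer packs each base file's rows into dedicated row groups, the
overlap test selects exactly the needed groups and the "predicate"
degenerates to a positional slice.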
On Tue, Mar 3, 2026 at 3:11 PM Anurag Mantripragada <[email protected]> wrote:

> Hi Micah,
>
>> Could you expand on the complexity you think this introduces (or, more
>> specifically, the "significant" part)?
>
> I may have misunderstood your approach regarding packing row ranges. To
> clarify, is the following what you had in mind?
>
> Initially, we have base_file_1.parquet (rows 1-1000) and
> base_file_2.parquet (rows 1001-2000). If we update the "score" column
> across both files and pack those updates into a single larger file,
> packed_col_A.parquet, would the metadata structure look like this?
>
> {
>   "data_file_path": "base_file_1.parquet",
>   "column_updates": [
>     {
>       "field_id": 12,
>       "update_file_path": "packed_col_A.parquet",
>       "row_range": "0-1000"
>     }
>   ]
> },
> {
>   "data_file_path": "base_file_2.parquet",
>   "column_updates": [
>     {
>       "field_id": 12,
>       "update_file_path": "packed_col_A.parquet",
>       "row_range": "1001-2000"
>     }
>   ]
> }
>
> If this is correct, it aligns well with the current proposal and shouldn't
> introduce any additional complexity. I will add it to the discussion points
> for tomorrow's community sync.
>
>> This seems at odds with supporting column families in the future?
>
> In my opinion, there's a distinction between the use cases of column
> updates and column families. Column updates are designed for fast writes
> while maintaining reasonable read performance. Compaction is desirable to
> reduce the complexity of the read side, if any. On the other hand, in the
> column family use case, splitting columns is a strict requirement for
> performance. I haven't considered how this would work, but perhaps we could
> introduce a table property for column families to make this explicit, and
> compaction jobs would have to respect it.
>
> ~Anurag
>
> On Tue, Mar 3, 2026 at 12:02 PM Micah Kornfield <[email protected]> wrote:
>
>> Hi Anurag,
>>
>>> *Compaction and small files*: If I understand the row ranges idea
>>> correctly, packing multiple updates into larger column files would require
>>> matching ranges to base files based on predicates, which adds significant
>>> planning complexity. Regular compaction, which rewrites column files into
>>> the base file, seems more practical.
>>
>> Could you expand on the complexity you think this introduces (or, more
>> specifically, the "significant" part)? In this case the predicate should be
>> pretty simple (i.e. read rows between X and Y only) and can be done
>> efficiently via row group statistics. Smart writers could even partition
>> rows for a specific base file into their own row group/pages to make the
>> filter trivial.
>>
>>> Regular compaction, which rewrites column files into the base file,
>>> seems more practical.
>>
>> This seems at odds with supporting column families in the future?
>>
>> Thanks,
>> Micah
>>
>> On Tue, Mar 3, 2026 at 11:43 AM Anurag Mantripragada <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> Sorry for the delayed response. I was on vacation and catching up.
>>> Thanks for the continued discussion on this topic.
>>>
>>> *Partial updates*: I agree that MoR-style row-level updates offer
>>> limited benefits beyond reducing the writing of irrelevant columns. For use
>>> cases like updating a subset of users, existing deletion vectors and the
>>> new V4 manifest delete vectors should perform well. Gabor's suggestion for
>>> file-level partial updates is a reasonable alternative, even with some
>>> write amplification.
>>>
>>> *Compaction and small files*: If I understand the row ranges idea
>>> correctly, packing multiple updates into larger column files would require
>>> matching ranges to base files based on predicates, which adds significant
>>> planning complexity. Regular compaction, which rewrites column files into
>>> the base file, seems more practical.
>>>
>>> *Column families*: While splitting columns into families is useful, the
>>> current design is more generic and already supports packing families into
>>> column files. Deciding how to group these columns (manually or via an
>>> engine) can be addressed in separate follow-up work.
>>>
>>> *Next steps:*
>>>
>>> - Gabor and I are developing a POC for metadata changes, focusing on
>>> reading and writing column files using Spark for integration. We will
>>> share more details soon.
>>> - I will update the doc in preparation for tomorrow's sync.
>>>
>>> As a reminder, we have an upcoming sync on column updates:
>>>
>>> Efficient column updates sync
>>> Wednesday, March 4 · 9:00 – 10:00am
>>> Time zone: America/Los_Angeles
>>> Google Meet joining info
>>> Video call link: https://meet.google.com/naf-tvvn-qup
>>>
>>> ~ Anurag
>>>
>>> On Wed, Feb 25, 2026 at 1:32 PM Gábor Kaszab <[email protected]> wrote:
>>>
>>>> Hey All,
>>>>
>>>> Nice to see the activity on this thread. Thanks to everyone who chimed in!
>>>>
>>>> Micah, I also feel that 1) (full column updates) and 2) (partial but
>>>> file-level column updates) could be a good middle ground between perf
>>>> improvement and keeping the code complexity low. In fact I had the chance
>>>> to experiment in this area, and the metadata + API part would be as simple
>>>> as in this PoC <https://github.com/apache/iceberg/pull/15445>. Just a
>>>> side note for 3): from the SQL aspect, I'm a bit hesitant about how
>>>> straightforward it is for users to write predicates that align with
>>>> file boundaries.
>>>> For deciding on partial column updates, we probably can't get away
>>>> without doing some measurements of how it compares to existing MoR. I have
>>>> it on my roadmap, so I'll share it once I have something.
>>>>
>>>> Wrapping multiple update files into one is an interesting idea. Let's
>>>> bring this up on the next sync! Additionally, full column updates could add
>>>> a huge overhead on the metadata files being created too (delete everything
>>>> + write everything with updates), unless we decide to do some manifest
>>>> rewrites/optimizations under the hood during the commit.
>>>>
>>>> Peter, column families as schema-like, table-metadata-level
>>>> information would definitely be useful.
>>>> It seems like a natural follow-up
>>>> of the column update work, but we have to keep in mind to choose a design
>>>> that won't prevent us from implementing a more general column families
>>>> concept (probably for inserts too).
>>>>
>>>> Best Regards,
>>>> Gabor
>>>>
>>>> On Sat, Feb 21, 2026 at 5:53 PM Micah Kornfield <[email protected]> wrote:
>>>>
>>>>> 1) and 3) are what I was thinking of as use-cases. I agree that unless
>>>>> there is a strong motivating use-case for MoR-style column updates, we
>>>>> should try to avoid this complexity and use the existing row-based MoR.
>>>>>
>>>>> One other idea I was trying to think through is the "small file
>>>>> problem" we would likely encounter for single-column additions/updates of
>>>>> fixed-width data. Would it make sense to add a record-range into the
>>>>> metadata for column families, so that we can pack column updates across
>>>>> files into reasonably sized files (similar to what we do for DVs today in
>>>>> puffin files)?
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>> On Mon, Feb 16, 2026 at 7:23 AM Gábor Kaszab <[email protected]> wrote:
>>>>>
>>>>>> Hey All,
>>>>>>
>>>>>> Thanks Anurag for the summary!
>>>>>>
>>>>>> I regret we don't have a recording for the sync, but I had the
>>>>>> impression that, even though there was a lengthy discussion about the
>>>>>> implementation requirements for partial updates, there wasn't a strong
>>>>>> consensus around the need, and there were no strong use cases to justify
>>>>>> partial updates either. Let me sum up where I see we are at now:
>>>>>>
>>>>>> *Scope of the updates*
>>>>>>
>>>>>> *1) Full column updates*
>>>>>> There is a consensus and common understanding that this use case
>>>>>> makes sense. If this was the only supported use-case, the implementation
>>>>>> would be relatively simple. We could guarantee there is no overlap in
>>>>>> column updates by deduplicating the field IDs in the column update
>>>>>> metadata. E.g. let's say we have a column update on columns {1,2} and we
>>>>>> write another column update for {2,3}: we can change the metadata for the
>>>>>> first one to only cover {1} and not {1,2}. With this, the write and the
>>>>>> read/stitching process is also straightforward (if we decide not to
>>>>>> support equality deletes together with column updates).
>>>>>>
>>>>>> Both row matching approaches could work here:
>>>>>> - row number matching update files, where we fill the deleted
>>>>>> rows with an arbitrary value (preferably null)
>>>>>> - sparse update files with some auxiliary column written into the
>>>>>> column update file, like row position in base file
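For illustration, the field-ID deduplication Gabor describes above could be
as small as this sketch (the map models "update file -> field ids it still
owns"; illustrative only, not an Iceberg API):

    import java.util.HashSet;
    import java.util.LinkedHashMap;
    import java.util.Set;

    // Illustrative only: when a new column update arrives, strip its field
    // ids from all older updates, so each field id is owned by exactly one
    // (the newest) update file.
    class ColumnUpdateDedup {
      static void addUpdate(LinkedHashMap<String, Set<Integer>> updatesByFile,
                            String newFile, Set<Integer> newFieldIds) {
        for (Set<Integer> olderIds : updatesByFile.values()) {
          olderIds.removeAll(newFieldIds);  // e.g. {1,2} shrinks to {1} when {2,3} arrives
        }
        updatesByFile.values().removeIf(Set::isEmpty);  // fully shadowed files drop out
        updatesByFile.put(newFile, new HashSet<>(newFieldIds));
      }
    }

With this invariant, a reader resolves each field id to at most one update
file, which is what keeps the read/stitching path straightforward.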
>>>>>>
>>>>>> *2) Partial column updates (row-level)*
>>>>>> I see 2 use cases mentioned for this: bug-fixing a subset of rows, and
>>>>>> updating features for active users.
>>>>>> My initial impression here is that whether to use column updates or
>>>>>> not heavily depends on the selectivity of the partial update queries. I'm
>>>>>> sure there is a percentage of affected rows below which it's simply
>>>>>> better to use the traditional row-level updates (CoW/MoR). I'm not
>>>>>> entirely convinced that covering these scenarios is worth the extra
>>>>>> complexity here:
>>>>>> - We can't deduplicate the column updates by field IDs on the
>>>>>> metadata side.
>>>>>> - We have two options for writers:
>>>>>>   - Merge the existing column update files themselves when writing a
>>>>>> new one with an overlap of field IDs. No need to sort out the
>>>>>> different column update files and merge them on the read side, but there
>>>>>> is overhead on the write side.
>>>>>>   - Don't bother merging existing column updates when writing a
>>>>>> new one. This adds overhead on the read side.
>>>>>>
>>>>>> Handling of sparse update files is a must here, with the chance for
>>>>>> optimisation if all the rows are covered by the update file, as Micah
>>>>>> suggested.
>>>>>>
>>>>>> To sum up, I think to justify this approach we need strong
>>>>>> use-cases and measurements to verify that the extra complexity yields
>>>>>> convincingly better results compared to existing CoW/MoR approaches.
>>>>>>
>>>>>> *3) Partial column updates (file-level)*
>>>>>> This option wasn't brought up during our conversation but might be
>>>>>> worth considering. This is basically a middle ground between the above two
>>>>>> approaches. Partial updates are allowed as long as they affect entire data
>>>>>> files, and it's allowed to only cover a subset of the files. One use-case
>>>>>> would be to do column updates per partition, for instance.
>>>>>>
>>>>>> With this approach the metadata representation could be as simple as
>>>>>> in 1), where we can deduplicate the update files by field IDs. Also, there
>>>>>> is no write and read overhead on top of 1), apart from the verification
>>>>>> step to ensure that the WHERE filter on the update is doing the split on
>>>>>> file boundaries.
>>>>>> Also, similarly to 1), sparse update files aren't a must here; we
>>>>>> could consider row-matching update files too.
>>>>>>
>>>>>> *Row alignment*
>>>>>> Sparse update files are required for row-level partial updates, but
>>>>>> if we decide to go with any of the other options we could also evaluate
>>>>>> the "row count matching" approach too. Even though it requires filling the
>>>>>> missing rows with arbitrary values (null seems a good candidate), it would
>>>>>> result in less write overhead (no need to write row positions) and less
>>>>>> read overhead (no need to join rows by row position), which could be worth
>>>>>> the inconvenience of having 'invalid' but inaccessible values in the files.
>>>>>> The num-nulls stats being off is a good argument against this, but I think
>>>>>> we could have a way of fixing this too by keeping track of how many rows
>>>>>> were deleted (and subtracting this value from the num-nulls counter
>>>>>> returned by the writer).
>>>>>>
>>>>>> *Next steps*
>>>>>> I'm actively working on a very basic PoC implementation where we
>>>>>> would be able to test the different approaches, comparing pros and cons, so
>>>>>> that we can make a decision on the above questions. I'll sync with Anurag
>>>>>> on this and will let you know once we have something.
>>>>>>
>>>>>> Best Regards,
>>>>>> Gabor
>>>>>>
>>>>>> On Sat, Feb 14, 2026 at 2:20 AM Micah Kornfield <[email protected]> wrote:
>>>>>>
>>>>>>>> Given that, the sparse representation with alignment at read time
>>>>>>>> (using dummy/null values) seems to provide the benefits of both efficient
>>>>>>>> vectorized reads and stitching as well as support for partial column
>>>>>>>> updates. Would you agree?
>>>>>>>
>>>>>>> Thinking more about it, I think the sparse approach is actually a
>>>>>>> superset approach, so it is not a concern. If writers want, they can
>>>>>>> write out the fully populated columns with position indexes from 1 to N,
>>>>>>> and readers can take an optimized path if they detect the number of rows
>>>>>>> in the update is equal to the number of base rows.
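A rough sketch of that read-side choice, assuming sparse update files carry
a base-row-position column; illustrative only, not an Iceberg API:

    // Illustrative only: stitch one updated column onto a base file's rows.
    class ColumnStitcher {
      static Object[] stitchColumn(int baseRowCount, long[] updatePositions,
                                   Object[] updateValues) {
        Object[] column = new Object[baseRowCount];
        if (updatePositions.length == baseRowCount) {
          // Dense fast path: the update covers every row, so positions are
          // assumed to be 0..N-1 in order and no join is needed.
          System.arraycopy(updateValues, 0, column, 0, baseRowCount);
        } else {
          // Sparse path: scatter by base row position; untouched slots stay null.
          for (int i = 0; i < updatePositions.length; i++) {
            column[(int) updatePositions[i]] = updateValues[i];
          }
        }
        return column;
      }
    }

The dense branch also covers the "row count matching" alternative discussed
above: a writer that fills deleted slots with nulls always produces an
update whose row count matches the base file.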
>>>>>>>
>>>>>>> I still think there is a question of what writers should do (i.e.,
>>>>>>> when do they decide to duplicate data instead of trying to give sparse
>>>>>>> updates), but that is an implementation question and not necessarily
>>>>>>> something that needs to block spec work.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Micah
>>>>>>>
>>>>>>> On Fri, Feb 13, 2026 at 11:29 AM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Micah,
>>>>>>>>
>>>>>>>>> This seems like a classic MoR vs CoW trade-off. But it seems like
>>>>>>>>> maybe both sparse and full should be available (I understand this adds
>>>>>>>>> complexity). For adding a new column or completely updating an existing
>>>>>>>>> column, the performance would be better to prefill the data
>>>>>>>>
>>>>>>>> Our internal use cases are very similar to what you describe. We
>>>>>>>> primarily deal with full column updates. However, the feedback on the
>>>>>>>> proposal from the wider community indicated that partial updates (e.g.,
>>>>>>>> bug-fixing a subset of rows, updating features for active users) are
>>>>>>>> also a very common and critical use case.
>>>>>>>>
>>>>>>>>> Is there evidence to say that partial column updates are more
>>>>>>>>> common in practice than full rewrites?
>>>>>>>>
>>>>>>>> Personally, I don't have hard data on which use case is more common
>>>>>>>> in the wild, only that both appear to be important. I also agree that a
>>>>>>>> good long-term solution should support both strategies. Given that, the
>>>>>>>> sparse representation with alignment at read time (using dummy/null
>>>>>>>> values) seems to provide the benefits of both efficient vectorized reads
>>>>>>>> and stitching as well as support for partial column updates. Would you
>>>>>>>> agree?
>>>>>>>>
>>>>>>>> ~ Anurag
>>>>>>>>
>>>>>>>> On Fri, Feb 13, 2026 at 9:33 AM Micah Kornfield <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Anurag,
>>>>>>>>>
>>>>>>>>>> Data Representation: Sparse column files are preferred for
>>>>>>>>>> compact representation and are better suited for partial column
>>>>>>>>>> updates. We can optimize sparse representation for vectorized reads by
>>>>>>>>>> filling in null or default values at read time for missing positions
>>>>>>>>>> from the base file, which avoids joins during reads.
>>>>>>>>>
>>>>>>>>> This seems like a classic MoR vs CoW trade-off. But it seems like
>>>>>>>>> maybe both sparse and full should be available (I understand this adds
>>>>>>>>> complexity). For adding a new column or completely updating an existing
>>>>>>>>> column, the performance would be better to prefill the data (otherwise
>>>>>>>>> one ends up duplicating the work that is already happening under the
>>>>>>>>> hood in Parquet).
>>>>>>>>>
>>>>>>>>> Is there evidence to say that partial column updates are more
>>>>>>>>> common in practice than full rewrites?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Micah
>>>>>>>>>
>>>>>>>>> On Thu, Feb 12, 2026 at 3:32 AM Eduard Tudenhöfner <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Anurag,
>>>>>>>>>>
>>>>>>>>>> I wasn't able to make it to the sync but was hoping to watch the
>>>>>>>>>> recording afterwards.
>>>>>>>>>> I'm curious what the reasons were for discarding the
>>>>>>>>>> Parquet-native approach. Could you please share a summary of what
>>>>>>>>>> was discussed in the sync on that topic?
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 10, 2026 at 8:20 PM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for attending today's sync. Please find the meeting
>>>>>>>>>>> notes below. I apologize that we were unable to record the session
>>>>>>>>>>> due to attendees not having record access.
>>>>>>>>>>>
>>>>>>>>>>> Key updates and discussion points:
>>>>>>>>>>>
>>>>>>>>>>> *Decisions:*
>>>>>>>>>>>
>>>>>>>>>>> - Table Format vs. Parquet: There is a general consensus
>>>>>>>>>>> that column update support should reside in the table format.
>>>>>>>>>>> Consequently, we have discarded the Parquet-native approach.
>>>>>>>>>>> - Metadata Representation: To maintain clean metadata and
>>>>>>>>>>> avoid complex resolution logic for readers, the goal is to keep only one
>>>>>>>>>>> metadata file per column. However, achieving this is challenging if we
>>>>>>>>>>> support partial updates, as multiple column files may exist for the same
>>>>>>>>>>> column (see open questions).
>>>>>>>>>>> - Data Representation: Sparse column files are preferred for
>>>>>>>>>>> compact representation and are better suited for partial column updates.
>>>>>>>>>>> We can optimize sparse representation for vectorized reads by filling in
>>>>>>>>>>> null or default values at read time for missing positions from the base
>>>>>>>>>>> file, which avoids joins during reads.
>>>>>>>>>>>
>>>>>>>>>>> *Open Questions:*
>>>>>>>>>>>
>>>>>>>>>>> - We are still determining what restrictions are necessary
>>>>>>>>>>> when supporting partial updates. For instance, we need to decide whether
>>>>>>>>>>> to allow adding a new column and subsequently applying partial updates to
>>>>>>>>>>> it. This would involve managing both a base column file and subsequent
>>>>>>>>>>> update files.
>>>>>>>>>>> - We need a better understanding of the use cases for partial updates.
>>>>>>>>>>> - We need to further discuss the handling of equality deletes.
>>>>>>>>>>>
>>>>>>>>>>> If I missed anything, or if others took notes, please share them
>>>>>>>>>>> here. Thanks!
>>>>>>>>>>>
>>>>>>>>>>> I will go ahead and update the doc with what we have discussed
>>>>>>>>>>> so we can continue next time from where we left off.
>>>>>>>>>>>
>>>>>>>>>>> ~ Anurag
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 9, 2026 at 11:55 AM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> This design
>>>>>>>>>>>> <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>>>>>>>>>> will be discussed tomorrow in a dedicated sync.
>>>>>>>>>>>>
>>>>>>>>>>>> Efficient column updates sync
>>>>>>>>>>>> Tuesday, February 10 · 9:00 – 10:00am
>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>> Video call link: https://meet.google.com/xsd-exug-tcd
>>>>>>>>>>>>
>>>>>>>>>>>> ~ Anurag
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Feb 6, 2026 at 8:30 AM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Gabor,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the detailed example.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree with Steven that Option 2 seems reasonable. I will add
>>>>>>>>>>>>> a section to the design doc regarding equality delete handling, and we can
>>>>>>>>>>>>> discuss this further during our meeting on Tuesday.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~Anurag
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Feb 6, 2026 at 7:08 AM Steven Wu <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> > 1) When deleting with eq-deletes: If there is a column
>>>>>>>>>>>>>> update on the equality-field ID we use for the delete, reject deletion
>>>>>>>>>>>>>> > 2) When adding a column update on a column that is part of
>>>>>>>>>>>>>> the equality field IDs in some delete, we reject the column update
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Gabor, this is a good scenario. The 2nd option makes sense to
>>>>>>>>>>>>>> me, since equality ids are like primary key fields. If we have the 2nd
>>>>>>>>>>>>>> rule enforced, the first option is not applicable anymore.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Feb 6, 2026 at 3:13 AM Gábor Kaszab <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for the proposal, Anurag! I made a pass recently
>>>>>>>>>>>>>>> and I think there is some interference between column updates and
>>>>>>>>>>>>>>> equality deletes. Let me describe it below.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Steps:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> CREATE TABLE tbl (int a, int b);
>>>>>>>>>>>>>>> INSERT INTO tbl VALUES (1, 11), (2, 22);  -- creates the base data file
>>>>>>>>>>>>>>> DELETE FROM tbl WHERE b=11;               -- creates an equality delete file
>>>>>>>>>>>>>>> UPDATE tbl SET b=11;                      -- writes column update
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> SELECT * FROM tbl;
>>>>>>>>>>>>>>> Expected result: (2, 11)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Data and metadata created after the above steps:
>>>>>>>>>>>>>>> - Base file: (1, 11), (2, 22), seqnum=1
>>>>>>>>>>>>>>> - EQ-delete: b=11, seqnum=2
>>>>>>>>>>>>>>> - Column update: field ids: [field_id_for_col_b], seqnum=3,
>>>>>>>>>>>>>>> data file content: (dummy_value), (11)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Read steps:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1. Stitch base file with column updates in the reader:
>>>>>>>>>>>>>>> rows: (1, dummy_value), (2, 11) (note: the dummy value can be
>>>>>>>>>>>>>>> either null or 11, see the proposal for more details);
>>>>>>>>>>>>>>> seqnum for base file = 1; seqnum for column update = 3
>>>>>>>>>>>>>>> 2. Apply eq-delete b=11, seqnum=2, on the stitched result
>>>>>>>>>>>>>>> 3. The query result depends on which seqnum we carry forward
>>>>>>>>>>>>>>> to compare with the eq-delete's seqnum, but it's not correct in any
>>>>>>>>>>>>>>> of the cases:
>>>>>>>>>>>>>>> 1. Use seqnum from the base file: we get either an empty
>>>>>>>>>>>>>>> result if 'dummy_value' is 11, or we get (1, null) otherwise
>>>>>>>>>>>>>>> 2. Use seqnum from the last update file: we don't delete
>>>>>>>>>>>>>>> any rows, and the result set is (1, dummy_value), (2, 11)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Problem:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The eq-delete should be applied midway through applying the
>>>>>>>>>>>>>>> column updates to the base file, based on sequence number, during the
>>>>>>>>>>>>>>> stitching process. If I'm not mistaken, this is not feasible with the
>>>>>>>>>>>>>>> way readers work.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Proposal:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Don't allow equality deletes together with column updates.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) When deleting with eq-deletes: If there is a column
>>>>>>>>>>>>>>> update on the equality-field ID we use for the delete, reject deletion
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) When adding a column update on a column that is part of
>>>>>>>>>>>>>>> the equality field IDs in some delete, we reject the column update
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Alternatively, column updates could be controlled by an
>>>>>>>>>>>>>>> (immutable) table property, and eq-deletes would be rejected if the
>>>>>>>>>>>>>>> property indicates column updates are turned on for the table.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let me know what you think!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>> Gabor
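For illustration, Gabor's two rejection rules come down to a disjointness
check at commit time. This sketch assumes the commit path can enumerate live
equality-delete field ids and live column-update field ids; the names are
invented, not Iceberg APIs:

    import java.util.Collections;
    import java.util.Set;

    // Illustrative only: commit-time checks for the two rules above.
    class ColumnUpdateConflicts {
      // Rule 1: reject an equality delete that touches an updated column.
      static void validateEqualityDelete(Set<Integer> deleteEqualityIds,
                                         Set<Integer> liveColumnUpdateIds) {
        if (!Collections.disjoint(deleteEqualityIds, liveColumnUpdateIds)) {
          throw new IllegalStateException(
              "equality delete uses a field with live column updates");
        }
      }

      // Rule 2: reject a column update on a field used by live eq-deletes.
      static void validateColumnUpdate(Set<Integer> updateFieldIds,
                                       Set<Integer> liveEqualityDeleteIds) {
        if (!Collections.disjoint(updateFieldIds, liveEqualityDeleteIds)) {
          throw new IllegalStateException(
              "column update touches a field used by live equality deletes");
        }
      }
    }

As Steven notes above, once the second check is enforced, the first one
becomes redundant in practice.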
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Jan 28, 2026 at 3:31 AM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thank you everyone for the initial review comments. It is
>>>>>>>>>>>>>>>> exciting to see so much interest in this proposal.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am currently reviewing and responding to each comment.
>>>>>>>>>>>>>>>> The general themes of the feedback so far include:
>>>>>>>>>>>>>>>> - Including partial updates (column updates on a subset of rows in a table).
>>>>>>>>>>>>>>>> - Adding details on how SQL engines will write the update files.
>>>>>>>>>>>>>>>> - Adding details on split planning and row alignment for update files.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I will think through these points and update the design accordingly.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Anurag
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jan 27, 2026 at 6:25 PM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Xianjin,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Happy to learn from your experience in supporting
>>>>>>>>>>>>>>>>> backfill use-cases. Please feel free to review the proposal and add your
>>>>>>>>>>>>>>>>> comments. I will wait for a couple more days to ensure everyone has a
>>>>>>>>>>>>>>>>> chance to review the proposal.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ~ Anurag
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jan 27, 2026 at 6:42 AM Xianjin Ye <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Anurag and Peter,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It's great to see the partial column update has gained
>>>>>>>>>>>>>>>>>> great interest in the community. I internally built a BackfillColumns
>>>>>>>>>>>>>>>>>> action to efficiently backfill columns (by writing the partial columns
>>>>>>>>>>>>>>>>>> only and copying the binary data of the other columns into a new
>>>>>>>>>>>>>>>>>> DataFile). The speedup could be 10x for wide tables, but the write
>>>>>>>>>>>>>>>>>> amplification is still there. I would be happy to collaborate on the work
>>>>>>>>>>>>>>>>>> and eliminate the write amplification.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 2026/01/27 10:12:54 Péter Váry wrote:
>>>>>>>>>>>>>>>>>> > Hi Anurag,
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > It's great to see how much interest there is in the community around this
>>>>>>>>>>>>>>>>>> > potential new feature. Gábor and I have actually submitted an Iceberg
>>>>>>>>>>>>>>>>>> > Summit talk proposal on this topic, and we would be very happy to
>>>>>>>>>>>>>>>>>> > collaborate on the work. I was mainly waiting for the File Format API to be
>>>>>>>>>>>>>>>>>> > finalized, as I believe this feature should build on top of it.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > For reference, our related work includes:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > - *Dev list thread:*
>>>>>>>>>>>>>>>>>> > https://lists.apache.org/thread/h0941sdq9jwrb6sj0pjfjjxov8tx7ov9
>>>>>>>>>>>>>>>>>> > - *Proposal document:*
>>>>>>>>>>>>>>>>>> > https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww
>>>>>>>>>>>>>>>>>> > (not shared widely yet)
>>>>>>>>>>>>>>>>>> > - *Performance testing PR for readers and writers:*
>>>>>>>>>>>>>>>>>> > https://github.com/apache/iceberg/pull/13306
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > During earlier discussions about possible metadata changes, another option
>>>>>>>>>>>>>>>>>> > came up that hasn't been documented yet: separating planner metadata from
>>>>>>>>>>>>>>>>>> > reader metadata. Since the planner does not need to know about the actual
>>>>>>>>>>>>>>>>>> > files, we could store the file composition in a separate file (potentially
>>>>>>>>>>>>>>>>>> > a Puffin file). This file could hold the column_files metadata, while the
>>>>>>>>>>>>>>>>>> > manifest would reference the Puffin file and blob position instead of the
>>>>>>>>>>>>>>>>>> > data filename.
>>>>>>>>>>>>>>>>>> > This approach has the advantage of keeping the existing metadata largely
>>>>>>>>>>>>>>>>>> > intact, and it could also give us a natural place later to add file-level
>>>>>>>>>>>>>>>>>> > indexes or Bloom filters for use during reads or secondary filtering. The
>>>>>>>>>>>>>>>>>> > downsides are the additional files and the increased complexity of
>>>>>>>>>>>>>>>>>> > identifying files that are no longer referenced by the table, so this may
>>>>>>>>>>>>>>>>>> > not be an ideal solution.
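Purely to visualize the planner/reader split Peter describes, one
hypothetical shape for the two metadata levels (invented for illustration;
nothing like this exists in the spec):

    import java.util.List;

    // Hypothetical shapes only, to illustrate the idea above.
    class PlannerReaderSplit {
      // What the manifest would carry: a pointer, not the composition itself.
      record ColumnUpdateRef(String puffinPath, long blobOffset, long blobLength) {}
      // What the Puffin blob would carry: the per-column file composition.
      record ColumnFileBlob(List<ColumnFileEntry> columnFiles) {}
      record ColumnFileEntry(int fieldId, String updateFilePath, long rowCount) {}
    }

The planner never opens the Puffin file; only readers resolve the blob.
That is what would keep the existing metadata "largely intact", at the cost
Peter notes of extra files and harder orphan-file detection.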
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > I do have some concerns about the MoR metadata proposal described in the
>>>>>>>>>>>>>>>>>> > document. At first glance, it seems to complicate distributed planning, as
>>>>>>>>>>>>>>>>>> > all entries for a given file would need to be collected and merged to
>>>>>>>>>>>>>>>>>> > provide the information required by both the planner and the reader.
>>>>>>>>>>>>>>>>>> > Additionally, when a new column is added or updated, we would still need to
>>>>>>>>>>>>>>>>>> > add a new metadata entry for every existing data file. If we immediately
>>>>>>>>>>>>>>>>>> > write out the merged metadata, the total number of entries remains the
>>>>>>>>>>>>>>>>>> > same. The main benefit is avoiding rewriting statistics, which can be
>>>>>>>>>>>>>>>>>> > significant, but this comes at the cost of increased planning complexity.
>>>>>>>>>>>>>>>>>> > If we choose to store the merged statistics in the column_families entry, I
>>>>>>>>>>>>>>>>>> > don't see much benefit in excluding the rest of the metadata, especially
>>>>>>>>>>>>>>>>>> > since including it would simplify the planning process.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > As Anton already pointed out, we should also discuss how this change would
>>>>>>>>>>>>>>>>>> > affect split handling, particularly how to avoid double reads when row
>>>>>>>>>>>>>>>>>> > groups are not aligned between the original data files and the new column
>>>>>>>>>>>>>>>>>> > files.
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Finally, I'd like to see some discussion around the Java API implications.
>>>>>>>>>>>>>>>>>> > In particular, what API changes are required, and how SQL engines would
>>>>>>>>>>>>>>>>>> > perform updates. Since the new column files must have the same number of
>>>>>>>>>>>>>>>>>> > rows as the original data files, with a strict one-to-one relationship, SQL
>>>>>>>>>>>>>>>>>> > engines would need access to the source filename, position, and deletion
>>>>>>>>>>>>>>>>>> > status in the DataFrame in order to generate the new files. This is more
>>>>>>>>>>>>>>>>>> > involved than a simple update and deserves some explicit consideration.
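On that last point: Iceberg's Spark integration already exposes reserved
metadata columns such as _file and _pos, which is roughly the information
Peter says engines would need. A hedged sketch of the read side of such an
update; the table name and the recompute_score UDF are made up, and the
actual update-file writing is omitted:

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.expr;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    class UpdateFileInputs {
      // Read base rows with their source file and position, compute the new
      // column value, and cluster the result so one update file could be
      // written per base file, in base-file row order.
      static Dataset<Row> planUpdates(SparkSession spark) {
        return spark.read()
            .format("iceberg")
            .load("db.wide_table")              // hypothetical table name
            .selectExpr("_file", "_pos", "id")  // Iceberg reserved metadata columns
            .withColumn("score", expr("recompute_score(id)"))  // hypothetical UDF
            .repartition(col("_file"))
            .sortWithinPartitions("_pos");
      }
    }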
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > Looking forward to your thoughts.
>>>>>>>>>>>>>>>>>> > Best regards,
>>>>>>>>>>>>>>>>>> > Peter
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > On Tue, Jan 27, 2026, 03:58 Anurag Mantripragada <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>>>> > > Thanks Anton and others, for providing some initial feedback. I will
>>>>>>>>>>>>>>>>>> > > address all your comments soon.
>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>> > > On Mon, Jan 26, 2026 at 11:10 AM Anton Okolnychyi <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>>>> > >> I had a chance to see the proposal before it landed, and I think it is a
>>>>>>>>>>>>>>>>>> > >> cool idea; both presented approaches would likely work. I am looking
>>>>>>>>>>>>>>>>>> > >> forward to discussing the tradeoffs and would encourage everyone to
>>>>>>>>>>>>>>>>>> > >> push/polish each approach to see what issues can be mitigated and what
>>>>>>>>>>>>>>>>>> > >> are fundamental.
>>>>>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>>>>>> > >> [1] Iceberg-native approach: better visibility into column files from the
>>>>>>>>>>>>>>>>>> > >> metadata, potentially better concurrency for non-overlapping column
>>>>>>>>>>>>>>>>>> > >> updates, no dep on Parquet.
>>>>>>>>>>>>>>>>>> > >> [2] Parquet-native approach: almost no changes to the table format
>>>>>>>>>>>>>>>>>> > >> metadata beyond tracking of base files.
>>>>>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>>>>>> > >> I think [1] sounds a bit better on paper, but I am worried about the
>>>>>>>>>>>>>>>>>> > >> complexity in writers and readers (especially around keeping row groups
>>>>>>>>>>>>>>>>>> > >> aligned and split planning). It would be great to cover this in detail in
>>>>>>>>>>>>>>>>>> > >> the proposal.
>>>>>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>>>>>> > >> On Mon, Jan 26, 2026 at 9:00 AM Anurag Mantripragada <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> > >>
>>>>>>>>>>>>>>>>>> > >>> Hi all,
>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>> > >>> "Wide tables" with thousands of columns present significant challenges
>>>>>>>>>>>>>>>>>> > >>> for AI/ML workloads, particularly when only a subset of columns needs to
>>>>>>>>>>>>>>>>>> > >>> be added or updated. Current Copy-on-Write (COW) and Merge-on-Read (MOR)
>>>>>>>>>>>>>>>>>> > >>> operations in Iceberg apply at the row level, which leads to substantial
>>>>>>>>>>>>>>>>>> > >>> write amplification in scenarios such as:
>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>> > >>> - Feature Backfilling & Column Updates: Adding new feature columns
>>>>>>>>>>>>>>>>>> > >>> (e.g., model embeddings) to petabyte-scale tables.
>>>>>>>>>>>>>>>>>> > >>> - Model Score Updates: Refreshing prediction scores after retraining.
>>>>>>>>>>>>>>>>>> > >>> - Embedding Refresh: Updating vector embeddings, which currently
>>>>>>>>>>>>>>>>>> > >>> triggers a rewrite of the entire row.
>>>>>>>>>>>>>>>>>> > >>> - Incremental Feature Computation: Daily updates to a small fraction
>>>>>>>>>>>>>>>>>> > >>> of features in wide tables.
>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>> > >>> With the Iceberg V4 proposal introducing single-file commits and column
>>>>>>>>>>>>>>>>>> > >>> stats improvements, this is an ideal time to address column-level updates
>>>>>>>>>>>>>>>>>> > >>> to better support these use cases.
>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>> > >>> I have drafted a proposal that explores both table-format enhancements
>>>>>>>>>>>>>>>>>> > >>> and file-format (Parquet) changes to enable more efficient updates.
>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>> > >>> Proposal Details:
>>>>>>>>>>>>>>>>>> > >>> - GitHub Issue: #15146 <https://github.com/apache/iceberg/issues/15146>
>>>>>>>>>>>>>>>>>> > >>> - Design Document: Efficient Column Updates in Iceberg
>>>>>>>>>>>>>>>>>> > >>> <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>> > >>> Next Steps:
>>>>>>>>>>>>>>>>>> > >>> I plan to create POCs to benchmark the approaches described in the
>>>>>>>>>>>>>>>>>> > >>> document.
>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>> > >>> Please review the proposal and share your feedback.
>>>>>>>>>>>>>>>>>> > >>>
>>>>>>>>>>>>>>>>>> > >>> Thanks,
>>>>>>>>>>>>>>>>>> > >>> Anurag
