Amogh, can you upload the video to the YouTube channel? https://www.youtube.com/playlist?list=PLkifVhhWtccxt1TE7w_HbNGhY5gpDTaX7
On Mon, Mar 30, 2026 at 8:28 AM Amogh Jahagirdar <[email protected]> wrote: > Hey a few folks reached out indicating that I didn't properly share the > last v4 metadata tree meeting recording. So sorry about that! Here's the > link > <https://drive.google.com/file/d/1LhDL0Iy8YR4RN_W3D8APOUtkSBYk61fD/view?usp=drive_link>, > do let me know if there are still issues. > > On Tue, Mar 3, 2026 at 9:17 AM Steven Wu <[email protected]> wrote: > >> My takeaway from the conversation is also that we don't need row-level >> column updates. Manifest DVs can be used for row-level updates instead. >> Basically, a file (manifest or data) can be updated via (1) delete vector + >> updated rows in a new file or (2) column file overlay. Depending on the >> percentage of modified rows, engines can choose which way to go. >> >> On Tue, Mar 3, 2026 at 6:24 AM Gábor Kaszab <[email protected]> >> wrote: >> >>> Thanks for the summary, Micah! I tried to watch the recording linked to >>> the calendar event, but apparently I don't have permission to do so. Not >>> sure about others. >>> >>> So if I'm not mistaken, one way to reduce the write cost of an UPDATE for >>> colocated DVs is to use the column updates. As I see it, there was some >>> agreement that row-level partial column updates aren't desired, and we aim >>> for at least file-level column updates. This is very useful information for >>> the other conversation >>> <https://lists.apache.org/thread/w90rqyhmh6pb0yxp0bqzgzk1y1rotyny> >>> going on for the column update proposal. We can bring this up on the column >>> update sync tomorrow, but I'm wondering if the consensus on avoiding >>> row-level column updates is something we can incorporate into the column >>> update proposal too or if it's something still up for debate. >>> >>> Best Regards, >>> Gabor >>> >>> On Wed, Feb 25, 2026 at 22:30 Micah Kornfield <[email protected]> >>> wrote: >>> >>>> Just wanted to summarize my main takeaways of Monday's sync. >>>> >>>> The approach will always colocate DVs with the data files (i.e. every >>>> data file row in a manifest has an optional DV reference). This implies >>>> that there is not a separate "Deletion manifest". Rather, in V4 all >>>> manifests are "combined" where data files and DVs are colocated. >>>> >>>> Write amplification is avoided in two ways: >>>> 1. For small updates we will need to carry through metadata >>>> statistics (and other relevant data file fields) in memory (rescanning >>>> these is likely too expensive). Once updates are available they will be >>>> written out to a new manifest (either root or leaf) and use metadata DVs to >>>> remove the old rows. >>>> 2. For larger updates we will only carry through the DV update parts >>>> in memory and use column level updates to replace existing DVs (this would >>>> require rescanning the DV columns for any updated manifest to merge with >>>> the updated DVs in memory, and then writing out the column update). The >>>> consensus on the call was that we didn't want to support partial column >>>> updates (a.k.a. merge-on-read column updates). >>>> >>>> The idea is that engines would decide which path to follow based on the >>>> number of affected files.
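To make the two paths above concrete, here is a rough Python sketch of the engine-side decision Micah describes; every name, structure, and the threshold below is an illustrative stand-in, not something from the proposal or the Iceberg SDK.

from dataclasses import dataclass

@dataclass
class Entry:
    data_file: str
    dv: frozenset  # deleted row positions; a stand-in for a real bitmap

SMALL_UPDATE_THRESHOLD = 0.05  # illustrative cutoff, not from the proposal

def plan_dv_commit(entries, updated):
    """entries: list[Entry]; updated: dict of entry position -> new DV."""
    if len(updated) <= SMALL_UPDATE_THRESHOLD * len(entries):
        # Path 1: carry the touched entries (stats and all) through in
        # memory, write them with their new DVs into a new manifest, and
        # emit a metadata DV that soft-deletes their old rows.
        new_entries = [Entry(entries[p].data_file, dv) for p, dv in updated.items()]
        metadata_dv = frozenset(updated)  # positions removed from the old manifest
        return ("new-manifest", new_entries, metadata_dv)
    # Path 2: merge the in-memory DV changes into the full DV column and
    # write it out as a column-level update; the other manifest columns
    # stay in place.
    dv_column = [updated.get(p, e.dv) for p, e in enumerate(entries)]
    return ("column-update", dv_column)

The only point of the sketch is that the cutover between the two paths is an engine-side policy choice driven by how many entries an operation touches.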
>>>> To help understand the implications of the new proposal, I put together >>>> a quick spreadsheet [1] to analyze trade-offs between separate deletion >>>> manifests and the new approach under scenarios 1 and 2. This represents the >>>> worst case scenario where file updates are uniformly distributed across a >>>> single update operation. It does not account for repeated writes (e.g. >>>> on-going compaction). My main takeaway is that keeping at most 1 >>>> affiliated DV separate might still help (akin to a merge-on-read column >>>> update), but maybe not enough, relative to other parts of the system (e.g. >>>> the churn on data files), to justify the complexity. >>>> >>>> Hope this is helpful. >>>> >>>> Micah >>>> >>>> [1] >>>> https://docs.google.com/spreadsheets/d/1klZQxV7ST2C-p9LTMmai_5rtFiyupj6jSLRPRkdI-u8/edit?gid=0#gid=0 >>>> >>>> >>>> >>>> On Thu, Feb 19, 2026 at 3:52 PM Amogh Jahagirdar <[email protected]> >>>> wrote: >>>> >>>>> Hey folks, I've set up an additional initial discussion on DVs for >>>>> Monday. This topic is fairly complex and there is also now a free calendar >>>>> slot. I think it'd be helpful for us to first make sure we're all on the >>>>> same page in terms of what the approach proposed by Anton earlier in the >>>>> thread means and the high level mechanics. I should also have more to >>>>> share >>>>> on the doc about how the entry structure and change detection could look >>>>> in this approach. Then on Thursday we can get into more details and >>>>> targeted points of discussion on this topic. >>>>> >>>>> Thanks, >>>>> Amogh Jahagirdar >>>>> >>>>> On Tue, Feb 17, 2026 at 9:27 PM Amogh Jahagirdar <[email protected]> >>>>> wrote: >>>>> >>>>>> Thanks Steven! I've set up some time next Thursday for the community >>>>>> to discuss this. We're also looking at how the content entry would look >>>>>> in a combined DV with potential column updates for DV changes, and >>>>>> how >>>>>> change detection could look in this approach. I should have more to >>>>>> share on this by the time of the community discussion next week. >>>>>> We should also consider potential root churn and memory consumption >>>>>> stemming from expected root entry inflation due to a combined data file + >>>>>> DV entry with possible column updates for certain DV workloads; though at >>>>>> least for memory consumption of stats being held after planning, that >>>>>> arguably is an implementation problem for certain integrations. >>>>>> >>>>>> Thanks, >>>>>> Amogh Jahagirdar >>>>>> >>>>>> On Fri, Feb 13, 2026 at 10:58 AM Steven Wu <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> I wrote up some analysis with back-of-the-envelope calculations >>>>>>> about the column update approach for DV colocation. It mainly concerns >>>>>>> the >>>>>>> 2nd use case: deleting a large number of rows from a small number of >>>>>>> files. >>>>>>> >>>>>>> >>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.gvdulzy486n7 >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Feb 4, 2026 at 1:02 AM Péter Váry < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> I fully agree with Anton and Steven that we need benchmarks before >>>>>>>> choosing any direction. >>>>>>>> >>>>>>>> I ran some preliminary column‑stitching benchmarks last summer: >>>>>>>> >>>>>>>> - Results are available in the doc: >>>>>>>> >>>>>>>> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww >>>>>>>> - Code is here: https://github.com/apache/iceberg/pull/13306 >>>>>>>> >>>>>>>> I’ve summarized the most relevant results at the end of this email. >>>>>>>> They show roughly a 10% slowdown on the read path with column >>>>>>>> stitching in >>>>>>>> similar scenarios when using local SSDs.
I expect that in real >>>>>>>> deployments >>>>>>>> the metadata read cost will mostly be driven by blob I/O (assuming no >>>>>>>> caching). If blob access becomes the dominant factor in read latency, >>>>>>>> multithreaded fetching should be able to absorb the overhead >>>>>>>> introduced by >>>>>>>> column stitching, resulting in latency similar to the single‑file >>>>>>>> layout >>>>>>>> (unless IO is already the bottleneck). >>>>>>>> >>>>>>>> We should definitely rerun the benchmarks once we have a clearer >>>>>>>> understanding of the intended usage patterns. >>>>>>>> Thanks, >>>>>>>> Peter >>>>>>>> >>>>>>>> >>>>>>>> The relevant(ish) results are for 100 columns, with 2 families with >>>>>>>> 50-50 columns and local read: >>>>>>>> >>>>>>>> The base is: >>>>>>>> MultiThreadedParquetBenchmark.read 100 0 >>>>>>>> false ss 20 3.739 ± 0.096 s/op >>>>>>>> >>>>>>>> The read for single threaded: >>>>>>>> MultiThreadedParquetBenchmark.read 100 2 >>>>>>>> false ss 20 4.036 ± 0.082 s/op >>>>>>>> >>>>>>>> The read for multi threaded: >>>>>>>> MultiThreadedParquetBenchmark.read 100 2 >>>>>>>> true ss 20 4.063 ± 0.080 s/op >>>>>>>> >>>>>>>> On Tue, Feb 3, 2026 at 23:27 Steven Wu <[email protected]> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> I agree with Anton in this >>>>>>>>> <https://docs.google.com/document/d/1jZy4g6UDi3hdblpkSzDnqgzgATFKFoMaHmt4nNH8M7o/edit?disco=AAAByzDx21w> >>>>>>>>> comment thread that we probably need to run benchmarks for a few >>>>>>>>> common >>>>>>>>> scenarios to guide this decision. We need to write down detailed >>>>>>>>> plans for >>>>>>>>> those scenarios and what we are measuring. Ideally, we also want to >>>>>>>>> measure >>>>>>>>> using the V4 metadata structure (like Parquet manifest file, column >>>>>>>>> stats >>>>>>>>> structs, adaptive tree). There are PoC PRs available for column stats, >>>>>>>>> Parquet manifest, and root manifest. It would probably be tricky to >>>>>>>>> piece >>>>>>>>> them together to run the benchmark considering the PoC status. We >>>>>>>>> also need >>>>>>>>> the column stitching capability on the read path to test the column >>>>>>>>> file >>>>>>>>> approach. >>>>>>>>> >>>>>>>>> On Tue, Feb 3, 2026 at 1:53 PM Anoop Johnson <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I'm in favor of co-located DV metadata with column file override >>>>>>>>>> and not doing affiliated/unaffiliated delete manifests. This is >>>>>>>>>> conceptually similar to strictly affiliated delete manifests with >>>>>>>>>> positional joins, and will halve the number of I/Os when there is no >>>>>>>>>> DV >>>>>>>>>> column override. It is simpler to implement >>>>>>>>>> and will speed up reads. >>>>>>>>>> >>>>>>>>>> Unaffiliated DV manifests are flexible for writers. They reduce >>>>>>>>>> the chance of physical conflicts when there are concurrent >>>>>>>>>> large/random >>>>>>>>>> deletes that change DVs on different files in the same manifest. But >>>>>>>>>> the >>>>>>>>>> flexibility comes at a read-time cost. If the number of unaffiliated >>>>>>>>>> DVs >>>>>>>>>> exceeds a threshold, it could cause driver OOMs or require a >>>>>>>>>> distributed join >>>>>>>>>> to pair up DVs with data files. With colocated metadata, manifest >>>>>>>>>> DVs can >>>>>>>>>> reduce the chance of conflicts up to a certain write size. >>>>>>>>>> >>>>>>>>>> I assume we will still support unaffiliated manifests for >>>>>>>>>> equality deletes, but perhaps we can restrict them to just equality >>>>>>>>>> deletes. >>>>>>>>>> >>>>>>>>>> -Anoop
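As a hedged illustration of the read-path difference Anoop describes, here is a Python sketch; the entry types below are simplified stand-ins for Iceberg's actual metadata structures, and the function names are invented:

from dataclasses import dataclass

@dataclass
class DataEntry:
    path: str
    dv: object = None  # inline DV reference in the colocated layout

@dataclass
class DeleteEntry:
    referenced_path: str
    dv: object = None

def resolve_unaffiliated(data_entries, delete_entries):
    # Unaffiliated layout: each DV carries the referenced data file path,
    # so the planner builds and probes a path-keyed map. This map is the
    # state that can grow large enough to OOM a driver or force a
    # distributed join.
    dv_by_path = {d.referenced_path: d.dv for d in delete_entries}
    return [(e, dv_by_path.get(e.path)) for e in data_entries]

def resolve_colocated(data_entries):
    # Colocated layout: the DV is an optional field of the data file entry,
    # so one manifest read yields both and no pairing step is needed.
    return [(e, e.dv) for e in data_entries]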
>>>>>>>>>> On Mon, Feb 2, 2026 at 4:27 PM Anton Okolnychyi < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> I added the approach with column files to the doc. >>>>>>>>>>> >>>>>>>>>>> To sum up, separate data and delete manifests with affinity >>>>>>>>>>> would perform somewhat on par with co-located DV metadata (a.k.a. >>>>>>>>>>> direct >>>>>>>>>>> assignment) if we add support for column files when we need to >>>>>>>>>>> replace most >>>>>>>>>>> or all DVs (use case 1). That said, the support for direct >>>>>>>>>>> assignment with >>>>>>>>>>> in-line metadata DVs can help us avoid unaffiliated delete >>>>>>>>>>> manifests when >>>>>>>>>>> we need to replace a few DVs (use case 2). >>>>>>>>>>> >>>>>>>>>>> So the key question is whether we want to allow >>>>>>>>>>> unaffiliated delete manifests with DVs... If we don't, then we >>>>>>>>>>> would likely >>>>>>>>>>> want to have co-located DV metadata and must support efficient >>>>>>>>>>> column >>>>>>>>>>> updates so as not to regress compared to V2 and V3 for large MERGE jobs >>>>>>>>>>> that >>>>>>>>>>> modify a small set of records for most files. >>>>>>>>>>> >>>>>>>>>>> On Mon, Feb 2, 2026 at 13:20 Anton Okolnychyi < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Anoop, correct, if we keep data and delete manifests separate, >>>>>>>>>>>> there is a better way to combine the entries and we should NOT >>>>>>>>>>>> rely on the >>>>>>>>>>>> referenced data file path. Reconciling by implicit position will >>>>>>>>>>>> reduce the >>>>>>>>>>>> size of the DV entry (no need to store the referenced data file >>>>>>>>>>>> path) and >>>>>>>>>>>> will improve the planning performance (no equals/hashCode on the >>>>>>>>>>>> path). >>>>>>>>>>>> >>>>>>>>>>>> Steven, I agree. Most notes in the doc pre-date discussions we >>>>>>>>>>>> had on column updates. You are right, given that we are >>>>>>>>>>>> gravitating towards >>>>>>>>>>>> a native way to handle column updates, it seems logical to use the >>>>>>>>>>>> same >>>>>>>>>>>> approach for replacing DVs, since they’re essentially column >>>>>>>>>>>> updates. Let >>>>>>>>>>>> me add one more approach to the doc based on what Anurag and Peter >>>>>>>>>>>> have so >>>>>>>>>>>> far. >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Feb 1, 2026 at 20:59 Steven Wu <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Anton, thanks for raising this. I agree this deserves another >>>>>>>>>>>>> look. I added a comment in your doc that we can potentially apply >>>>>>>>>>>>> the >>>>>>>>>>>>> column update proposal for data file updates to the manifest file >>>>>>>>>>>>> updates as >>>>>>>>>>>>> well, to colocate the data DV and data manifest files. Data DVs >>>>>>>>>>>>> can be a >>>>>>>>>>>>> separate column in the data manifest file and updated separately >>>>>>>>>>>>> in a >>>>>>>>>>>>> column file. This is the same as the coalesced positional join >>>>>>>>>>>>> that Anoop >>>>>>>>>>>>> mentioned. >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Feb 1, 2026 at 4:14 PM Anoop Johnson <[email protected]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you for raising this, Anton. I had a similar >>>>>>>>>>>>>> observation while prototyping >>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/14533> the >>>>>>>>>>>>>> adaptive metadata tree.
The overhead of doing a path-based hash >>>>>>>>>>>>>> join of a >>>>>>>>>>>>>> data manifest with the affiliated delete manifest is high: my >>>>>>>>>>>>>> estimate was >>>>>>>>>>>>>> that the join adds about 5-10% overhead. The hash table >>>>>>>>>>>>>> build/probe alone >>>>>>>>>>>>>> takes about 5 ms for manifests with 25K entries. There are >>>>>>>>>>>>>> engines that can >>>>>>>>>>>>>> do vectorized hash joins that can lower this, but the overhead >>>>>>>>>>>>>> and >>>>>>>>>>>>>> complexity of a SIMD-friendly hash join is non-trivial. >>>>>>>>>>>>>> >>>>>>>>>>>>>> An alternative to relying on the external file feature in >>>>>>>>>>>>>> Parquet is to make affiliated manifests order-preserving: i.e., >>>>>>>>>>>>>> DVs in an >>>>>>>>>>>>>> affiliated delete manifest must appear in the same position as >>>>>>>>>>>>>> the >>>>>>>>>>>>>> corresponding data file in the data manifest the delete manifest >>>>>>>>>>>>>> is >>>>>>>>>>>>>> affiliated to. If a data file does not have a DV, the DV >>>>>>>>>>>>>> manifest must >>>>>>>>>>>>>> store a NULL. This would allow us to do positional joins, which >>>>>>>>>>>>>> are much >>>>>>>>>>>>>> faster. If we wanted, we could even have multiple affiliated DV >>>>>>>>>>>>>> manifests >>>>>>>>>>>>>> for a data manifest and the reader would do a COALESCED >>>>>>>>>>>>>> positional join >>>>>>>>>>>>>> (i.e. pick the first non-null value as the DV). It puts the >>>>>>>>>>>>>> sorting >>>>>>>>>>>>>> responsibility on the writers, but it might be a reasonable >>>>>>>>>>>>>> tradeoff. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Also, the options don't necessarily have to be mutually >>>>>>>>>>>>>> exclusive. We could still allow affiliated DVs to be "folded" >>>>>>>>>>>>>> into the data >>>>>>>>>>>>>> manifest (e.g. by background optimization jobs or the writer >>>>>>>>>>>>>> itself). That >>>>>>>>>>>>>> might be the optimal choice for read-heavy tables because it >>>>>>>>>>>>>> will halve the >>>>>>>>>>>>>> number of I/Os readers have to make. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Anoop
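The COALESCE-style positional join Anoop describes is easy to picture with a small, purely illustrative Python sketch; the lists below stand in for the DV columns of order-preserving affiliated delete manifests, newest first:

def coalesce_positional(dv_columns):
    """dv_columns: one list per affiliated delete manifest, each aligned
    positionally with the data manifest, with None where a data file has
    no DV. Returns the first non-null DV per position; no hashing or
    path comparisons are involved."""
    return [next((dv for dv in row if dv is not None), None)
            for row in zip(*dv_columns)]

# Two affiliated DV manifests over a three-file data manifest:
newer = [None, {7}, None]
older = [{1, 2}, {3}, None]
assert coalesce_positional([newer, older]) == [{1, 2}, {7}, None]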
>>>>>>>>>>>>>> On Fri, Jan 30, 2026 at 6:03 PM Anton Okolnychyi < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I had a chance to catch up on some of the V4 discussions. >>>>>>>>>>>>>>> Given that we are getting rid of the manifest list and >>>>>>>>>>>>>>> switching to >>>>>>>>>>>>>>> Parquet, I wanted to re-evaluate the possibility of direct DV >>>>>>>>>>>>>>> assignment >>>>>>>>>>>>>>> that we discarded in V3 to avoid regressions. I have put >>>>>>>>>>>>>>> together my >>>>>>>>>>>>>>> thoughts in a doc [1]. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> TL;DR: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - I think the current V4 proposal that keeps data and delete >>>>>>>>>>>>>>> manifests separate but introduces affinity is a solid choice >>>>>>>>>>>>>>> for cases when >>>>>>>>>>>>>>> we need to replace DVs in many / most files. I outlined an >>>>>>>>>>>>>>> approach with >>>>>>>>>>>>>>> column-split Parquet files but it doesn't improve >>>>>>>>>>>>>>> performance and takes a >>>>>>>>>>>>>>> dependency on a portion of the Parquet spec that is not really >>>>>>>>>>>>>>> implemented. >>>>>>>>>>>>>>> - Pushing unaffiliated DVs directly into the root to replace >>>>>>>>>>>>>>> a small set of DVs is going to be fast on write but does >>>>>>>>>>>>>>> require resolving >>>>>>>>>>>>>>> where those DVs apply at read time. Using inline metadata DVs >>>>>>>>>>>>>>> with >>>>>>>>>>>>>>> column-split Parquet files is a little more promising in this >>>>>>>>>>>>>>> case as it >>>>>>>>>>>>>>> allows us to avoid unaffiliated DVs. That said, it again relies on >>>>>>>>>>>>>>> something >>>>>>>>>>>>>>> Parquet doesn't implement right now, requires changing >>>>>>>>>>>>>>> maintenance >>>>>>>>>>>>>>> operations, and yields minimal benefits. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> All in all, the V4 proposal seems like a strict improvement >>>>>>>>>>>>>>> over V3 but I insist that we reconsider usage of the referenced >>>>>>>>>>>>>>> data file >>>>>>>>>>>>>>> path when resolving DVs to data files. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1] - >>>>>>>>>>>>>>> https://docs.google.com/document/d/1jZy4g6UDi3hdblpkSzDnqgzgATFKFoMaHmt4nNH8M7o >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Anton >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, Nov 22, 2025 at 13:37 Amogh Jahagirdar < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hey all, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Here is the meeting recording >>>>>>>>>>>>>>>> <https://drive.google.com/file/d/1lG9sM-JTwqcIgk7JsAryXXCc1vMnstJs/view?usp=sharing> >>>>>>>>>>>>>>>> and generated meeting summary >>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1e50p8TXL2e3CnUwKMOvm8F4s2PeVMiKWHPxhxOW1fIM/edit?usp=sharing>. >>>>>>>>>>>>>>>> Thanks all for attending yesterday! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Nov 20, 2025 at 8:49 AM Amogh Jahagirdar < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hey folks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I was out for some time, but set up a sync for tomorrow at >>>>>>>>>>>>>>>>> 9am PST. For this discussion, I do think it would be great to >>>>>>>>>>>>>>>>> focus on the >>>>>>>>>>>>>>>>> manifest DV representation, factoring in analyses on bitmap >>>>>>>>>>>>>>>>> representation >>>>>>>>>>>>>>>>> storage footprints, and the entry structure considering how >>>>>>>>>>>>>>>>> we want to >>>>>>>>>>>>>>>>> approach change detection. If there are other topics that >>>>>>>>>>>>>>>>> people want to >>>>>>>>>>>>>>>>> highlight, please do bring those up as well! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I also recognize that this is a bit short-notice scheduling, >>>>>>>>>>>>>>>>> so please do reach out to me if this time is difficult to >>>>>>>>>>>>>>>>> work with; next >>>>>>>>>>>>>>>>> week is the Thanksgiving holiday here, and since people >>>>>>>>>>>>>>>>> would be >>>>>>>>>>>>>>>>> travelling/out I figured I'd try to schedule before then. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Oct 17, 2025 at 9:03 AM Amogh Jahagirdar < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hey folks, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sorry for the delay, here's the recording link >>>>>>>>>>>>>>>>>> <https://drive.google.com/file/d/1YOmPROXjAKYAWAcYxqAFHdADbqELVVf2/view> >>>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>>> last week's discussion. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Oct 10, 2025 at 9:44 AM Péter Váry < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Same here. >>>>>>>>>>>>>>>>>>> Please record if you can.
>>>>>>>>>>>>>>>>>>> Thanks, Peter >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, Oct 10, 2025, 17:39 Fokko Driesprong < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hey Amogh, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks for the write-up. Unfortunately, I won’t be able >>>>>>>>>>>>>>>>>>>> to attend. Will it be recorded? Thanks! >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>> Fokko >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Tue, Oct 7, 2025 at 20:36 Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hey all, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I've set up time this Friday at 9am PST for another >>>>>>>>>>>>>>>>>>>>> sync on single file commits. In terms of what would be >>>>>>>>>>>>>>>>>>>>> great to focus on >>>>>>>>>>>>>>>>>>>>> for the discussion: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1. Whether or not it makes sense to eliminate the >>>>>>>>>>>>>>>>>>>>> tuple and instead represent the tuple via lower/upper >>>>>>>>>>>>>>>>>>>>> boundaries. As a >>>>>>>>>>>>>>>>>>>>> reminder, one of the goals is to avoid tying a partition >>>>>>>>>>>>>>>>>>>>> spec to a >>>>>>>>>>>>>>>>>>>>> manifest; in the root we can have a mix of files spanning >>>>>>>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>>>>> partition specs, and even in leaf manifests avoiding this >>>>>>>>>>>>>>>>>>>>> coupling can >>>>>>>>>>>>>>>>>>>>> enable more desirable clustering of metadata. >>>>>>>>>>>>>>>>>>>>> In the vast majority of cases, we could leverage the >>>>>>>>>>>>>>>>>>>>> property that a file is effectively partitioned if the >>>>>>>>>>>>>>>>>>>>> lower/upper bounds for a >>>>>>>>>>>>>>>>>>>>> given field are equal. The nuance here is with the >>>>>>>>>>>>>>>>>>>>> particular case of >>>>>>>>>>>>>>>>>>>>> identity-partitioned string/binary columns, which can be >>>>>>>>>>>>>>>>>>>>> truncated in stats. >>>>>>>>>>>>>>>>>>>>> One approach is to require that writers must not produce >>>>>>>>>>>>>>>>>>>>> truncated stats >>>>>>>>>>>>>>>>>>>>> for identity partitioned columns. It's also important to >>>>>>>>>>>>>>>>>>>>> keep in mind that >>>>>>>>>>>>>>>>>>>>> all of this is just for the purpose of reconstructing the >>>>>>>>>>>>>>>>>>>>> partition tuple, >>>>>>>>>>>>>>>>>>>>> which is only required during equality delete matching. >>>>>>>>>>>>>>>>>>>>> Another area we >>>>>>>>>>>>>>>>>>>>> need to cover as part of this is exact bounds on >>>>>>>>>>>>>>>>>>>>> stats. There are other >>>>>>>>>>>>>>>>>>>>> options here as well such as making all new equality >>>>>>>>>>>>>>>>>>>>> deletes in V4 be >>>>>>>>>>>>>>>>>>>>> global and instead match based on bounds, or keeping the >>>>>>>>>>>>>>>>>>>>> tuple but each >>>>>>>>>>>>>>>>>>>>> tuple is effectively based on a union schema of all >>>>>>>>>>>>>>>>>>>>> partition specs. I am >>>>>>>>>>>>>>>>>>>>> adding a separate appendix section outlining the span of >>>>>>>>>>>>>>>>>>>>> options here and >>>>>>>>>>>>>>>>>>>>> the different tradeoffs. >>>>>>>>>>>>>>>>>>>>> Once we get this more to a conclusive state, I'll move >>>>>>>>>>>>>>>>>>>>> a summarized version to the main doc. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2.
Steven Wu <[email protected]> has >>>>>>>>>>>>>>>>>>>>> updated the doc with a section >>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.rrpksmp8zkb#heading=h.qau0y5xkh9mn> >>>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>> how we can do change detection from the root in a variety >>>>>>>>>>>>>>>>>>>>> of write >>>>>>>>>>>>>>>>>>>>> scenarios. I've done a review of it, and it covers the >>>>>>>>>>>>>>>>>>>>> cases I would >>>>>>>>>>>>>>>>>>>>> expect. It'd be good for folks to take a look and please >>>>>>>>>>>>>>>>>>>>> give feedback >>>>>>>>>>>>>>>>>>>>> before we discuss. Thank you Steven for adding that >>>>>>>>>>>>>>>>>>>>> section and all the >>>>>>>>>>>>>>>>>>>>> diagrams. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, Sep 18, 2025 at 3:19 PM Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hey folks, just following up from the discussion last >>>>>>>>>>>>>>>>>>>>>> Friday with a summary and some next steps: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 1.) For the various change detection cases, we >>>>>>>>>>>>>>>>>>>>>> concluded it's best just to go through those in an >>>>>>>>>>>>>>>>>>>>>> offline manner on the >>>>>>>>>>>>>>>>>>>>>> doc since it's hard to verify all that correctness in a >>>>>>>>>>>>>>>>>>>>>> large meeting >>>>>>>>>>>>>>>>>>>>>> setting. >>>>>>>>>>>>>>>>>>>>>> 2.) We mostly discussed eliminating the >>>>>>>>>>>>>>>>>>>>>> partition tuple. In the original proposal, I was mostly >>>>>>>>>>>>>>>>>>>>>> aiming for the >>>>>>>>>>>>>>>>>>>>>> ability to reconstruct the tuple from the stats for >>>>>>>>>>>>>>>>>>>>>> the purpose of >>>>>>>>>>>>>>>>>>>>>> equality delete matching (a file is partitioned if the >>>>>>>>>>>>>>>>>>>>>> lower and upper >>>>>>>>>>>>>>>>>>>>>> bounds are equal); there's some nuance in how we need to >>>>>>>>>>>>>>>>>>>>>> handle identity >>>>>>>>>>>>>>>>>>>>>> partition values since for string/binary they cannot be >>>>>>>>>>>>>>>>>>>>>> truncated. >>>>>>>>>>>>>>>>>>>>>> Another potential option is to treat all equality >>>>>>>>>>>>>>>>>>>>>> deletes as effectively >>>>>>>>>>>>>>>>>>>>>> global and narrow their application based on the stats >>>>>>>>>>>>>>>>>>>>>> values. This may >>>>>>>>>>>>>>>>>>>>>> require defining tight bounds. I'm still collecting my >>>>>>>>>>>>>>>>>>>>>> thoughts on this one. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks folks! Please also let me know if any of the >>>>>>>>>>>>>>>>>>>>>> following links are inaccessible for any reason. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Meeting recording link: >>>>>>>>>>>>>>>>>>>>>> https://drive.google.com/file/d/1gv8TrR5xzqqNxek7_sTZkpbwQx1M3dhK/view >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Meeting summary: >>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/131N0CDpzZczURxitN0HGS7dTqRxQT_YS9jMECkGGvQU
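A minimal, hypothetical sketch of the bounds-equality idea from the summary above (reconstructing an identity partition value from column stats); the function and its arguments are invented for illustration:

def identity_partition_value(lower, upper, truncated):
    """A file is effectively partitioned on a field when its lower and
    upper bounds are equal. For identity-partitioned string/binary
    columns the bounds must be exact (not truncated) for this to be
    sound, which is why writers would be required not to truncate them."""
    if truncated:
        return None  # cannot reconstruct; exact bounds are required
    return lower if lower == upper else None

# A file whose 'region' column has lower == upper carries the value:
assert identity_partition_value("eu-west", "eu-west", truncated=False) == "eu-west"
# A file spanning several values is not partitioned on the field:
assert identity_partition_value("asia", "eu-west", truncated=False) is None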
>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 8, 2025 at 3:40 PM Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Update: I moved the discussion time to this Friday >>>>>>>>>>>>>>>>>>>>>>> at 9 am PST since I found out that quite a few folks >>>>>>>>>>>>>>>>>>>>>>> involved in the >>>>>>>>>>>>>>>>>>>>>>> proposals will be out next week, and I also know some >>>>>>>>>>>>>>>>>>>>>>> folks will also be >>>>>>>>>>>>>>>>>>>>>>> out the week after that. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> Amogh J >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 8, 2025 at 8:57 AM Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hey folks, sorry for the late follow-up here, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks @Kevin Liu <[email protected]> for >>>>>>>>>>>>>>>>>>>>>>>> sharing the recording link of the previous discussion! >>>>>>>>>>>>>>>>>>>>>>>> I've set up another >>>>>>>>>>>>>>>>>>>>>>>> sync for next Tuesday 09/16 at 9am PST. This time I've >>>>>>>>>>>>>>>>>>>>>>>> set it up from my >>>>>>>>>>>>>>>>>>>>>>>> corporate email so we can get recordings and >>>>>>>>>>>>>>>>>>>>>>>> transcriptions (and I've made >>>>>>>>>>>>>>>>>>>>>>>> sure to keep the meeting invite open so we don't have >>>>>>>>>>>>>>>>>>>>>>>> to manually let >>>>>>>>>>>>>>>>>>>>>>>> people in). >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> In terms of next steps, areas which I think would >>>>>>>>>>>>>>>>>>>>>>>> be good to focus on for establishing consensus: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 1. How do we model the manifest entry structure >>>>>>>>>>>>>>>>>>>>>>>> so that changes to manifest DVs can be obtained easily >>>>>>>>>>>>>>>>>>>>>>>> from the root? There >>>>>>>>>>>>>>>>>>>>>>>> are a few options here; the most promising approach is >>>>>>>>>>>>>>>>>>>>>>>> to keep an >>>>>>>>>>>>>>>>>>>>>>>> additional DV which encodes the diff: the additional >>>>>>>>>>>>>>>>>>>>>>>> positions which have >>>>>>>>>>>>>>>>>>>>>>>> been removed from a leaf manifest. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 2. Modeling partition transforms via expressions >>>>>>>>>>>>>>>>>>>>>>>> and establishing a unified table ID space so that we >>>>>>>>>>>>>>>>>>>>>>>> can simplify how >>>>>>>>>>>>>>>>>>>>>>>> partition tuples may be represented via stats and also >>>>>>>>>>>>>>>>>>>>>>>> have a way in the >>>>>>>>>>>>>>>>>>>>>>>> future to store stats on any derived column. I have a >>>>>>>>>>>>>>>>>>>>>>>> short >>>>>>>>>>>>>>>>>>>>>>>> proposal >>>>>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1oV8dapKVzB4pZy5pKHUCj5j9i2_1p37BJSeT7hyKPpg/edit?tab=t.0> >>>>>>>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>>>>>> this that probably still needs some tightening up on >>>>>>>>>>>>>>>>>>>>>>>> the expression >>>>>>>>>>>>>>>>>>>>>>>> modeling itself (and some prototyping) but the general >>>>>>>>>>>>>>>>>>>>>>>> idea for >>>>>>>>>>>>>>>>>>>>>>>> establishing a unified table ID space is covered. All >>>>>>>>>>>>>>>>>>>>>>>> feedback welcome! >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
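A tiny, hypothetical sketch of the diff-DV idea in point 1 above, with Python sets standing in for bitmaps; nothing here is from the actual entry design:

def diff_dv(current_dv, previous_dv):
    # Positions of a leaf manifest newly soft-deleted in this snapshot;
    # keeping this alongside the full DV would let readers derive
    # file-level changes from the root without opening the leaf manifests.
    return current_dv - previous_dv

prev = {3, 9}          # rows already removed as of the parent snapshot
curr = {3, 9, 14, 15}  # rows removed as of the new snapshot
assert diff_dv(curr, prev) == {14, 15}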
>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Aug 25, 2025 at 1:34 PM Kevin Liu < >>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks Amogh. Looks like the recording for last >>>>>>>>>>>>>>>>>>>>>>>>> week's sync is available on Youtube. Here's the link, >>>>>>>>>>>>>>>>>>>>>>>>> https://www.youtube.com/watch?v=uWm-p--8oVQ >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>> Kevin Liu >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Aug 12, 2025 at 9:10 PM Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Hey folks, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Just following up on this to give the community >>>>>>>>>>>>>>>>>>>>>>>>>> an update as to where we're at and my proposed next steps. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I've been editing and merging the contents from >>>>>>>>>>>>>>>>>>>>>>>>>> our proposal into the proposal >>>>>>>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw> >>>>>>>>>>>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>>>>>>>>>>> Russell and others. For any future comments on docs, >>>>>>>>>>>>>>>>>>>>>>>>>> please comment on the >>>>>>>>>>>>>>>>>>>>>>>>>> linked proposal. I've also marked it on our doc in >>>>>>>>>>>>>>>>>>>>>>>>>> red text so it's clear >>>>>>>>>>>>>>>>>>>>>>>>>> to readers that the other proposal is the source of >>>>>>>>>>>>>>>>>>>>>>>>>> truth for comments. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> In terms of next steps, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> 1. An important design decision point is around >>>>>>>>>>>>>>>>>>>>>>>>>> inline manifest DVs, external manifest DVs, or >>>>>>>>>>>>>>>>>>>>>>>>>> enabling both. I'm working on >>>>>>>>>>>>>>>>>>>>>>>>>> measuring different approaches for representing the >>>>>>>>>>>>>>>>>>>>>>>>>> compressed DV >>>>>>>>>>>>>>>>>>>>>>>>>> representation since that will inform how many >>>>>>>>>>>>>>>>>>>>>>>>>> entries can reasonably fit >>>>>>>>>>>>>>>>>>>>>>>>>> in a small root manifest; from that we can derive >>>>>>>>>>>>>>>>>>>>>>>>>> implications on different >>>>>>>>>>>>>>>>>>>>>>>>>> write patterns and determine the right approach for >>>>>>>>>>>>>>>>>>>>>>>>>> storing these manifest >>>>>>>>>>>>>>>>>>>>>>>>>> DVs. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> 2. Another key point is around determining if/how >>>>>>>>>>>>>>>>>>>>>>>>>> we can reasonably enable V4 to represent changes in >>>>>>>>>>>>>>>>>>>>>>>>>> the root manifest so >>>>>>>>>>>>>>>>>>>>>>>>>> that readers can effectively just infer file-level >>>>>>>>>>>>>>>>>>>>>>>>>> changes from the root. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> 3. One of the aspects of the proposal is getting >>>>>>>>>>>>>>>>>>>>>>>>>> away from the partition tuple requirement in the root, >>>>>>>>>>>>>>>>>>>>>>>>>> which currently forces >>>>>>>>>>>>>>>>>>>>>>>>>> an association between a partition spec and a >>>>>>>>>>>>>>>>>>>>>>>>>> manifest. These >>>>>>>>>>>>>>>>>>>>>>>>>> aspects can be modeled as essentially column stats, >>>>>>>>>>>>>>>>>>>>>>>>>> which gives a lot of >>>>>>>>>>>>>>>>>>>>>>>>>> flexibility in the organization of the manifest. >>>>>>>>>>>>>>>>>>>>>>>>>> There are important >>>>>>>>>>>>>>>>>>>>>>>>>> details around field ID spaces here which tie into >>>>>>>>>>>>>>>>>>>>>>>>>> how the stats are >>>>>>>>>>>>>>>>>>>>>>>>>> structured. What we're proposing here is to have a >>>>>>>>>>>>>>>>>>>>>>>>>> unified expression ID >>>>>>>>>>>>>>>>>>>>>>>>>> space that could also benefit us for storing things >>>>>>>>>>>>>>>>>>>>>>>>>> like virtual columns >>>>>>>>>>>>>>>>>>>>>>>>>> down the line.
I go into this in the proposal but >>>>>>>>>>>>>>>>>>>>>>>>>> I'm working on separating >>>>>>>>>>>>>>>>>>>>>>>>>> the appropriate parts so that the original proposal >>>>>>>>>>>>>>>>>>>>>>>>>> can mostly just focus >>>>>>>>>>>>>>>>>>>>>>>>>> on the organization of the content metadata tree and >>>>>>>>>>>>>>>>>>>>>>>>>> not how we want to >>>>>>>>>>>>>>>>>>>>>>>>>> solve this particular ID space problem. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> 4. I'm planning on scheduling a recurring >>>>>>>>>>>>>>>>>>>>>>>>>> community sync starting next Tuesday at 9am PST, >>>>>>>>>>>>>>>>>>>>>>>>>> every 2 weeks. If I get >>>>>>>>>>>>>>>>>>>>>>>>>> feedback from folks that this time will never work, >>>>>>>>>>>>>>>>>>>>>>>>>> I can certainly adjust. >>>>>>>>>>>>>>>>>>>>>>>>>> For some reason, I don't have the ability to add to >>>>>>>>>>>>>>>>>>>>>>>>>> the Iceberg Dev >>>>>>>>>>>>>>>>>>>>>>>>>> calendar, so I'll figure that out and update the >>>>>>>>>>>>>>>>>>>>>>>>>> thread when the event is >>>>>>>>>>>>>>>>>>>>>>>>>> scheduled. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 22, 2025 at 11:47 AM Russell Spitzer < >>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I think this is a great way forward; starting >>>>>>>>>>>>>>>>>>>>>>>>>>> out with this much parallel development shows that >>>>>>>>>>>>>>>>>>>>>>>>>>> we have a lot of >>>>>>>>>>>>>>>>>>>>>>>>>>> consensus already :) >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 22, 2025 at 12:42 PM Amogh >>>>>>>>>>>>>>>>>>>>>>>>>>> Jahagirdar <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hey folks, just following up on this. It looks >>>>>>>>>>>>>>>>>>>>>>>>>>>> like our proposal and the proposal that Russell >>>>>>>>>>>>>>>>>>>>>>>>>>>> Spitzer <[email protected]> shared are >>>>>>>>>>>>>>>>>>>>>>>>>>>> pretty aligned. I was just chatting with Russell >>>>>>>>>>>>>>>>>>>>>>>>>>>> about this, and we think >>>>>>>>>>>>>>>>>>>>>>>>>>>> it'd be best to combine both proposals and have a >>>>>>>>>>>>>>>>>>>>>>>>>>>> singular large effort on >>>>>>>>>>>>>>>>>>>>>>>>>>>> this. I can also set up a focused community >>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion (similar to what >>>>>>>>>>>>>>>>>>>>>>>>>>>> we're doing on the other V4 proposals) on this >>>>>>>>>>>>>>>>>>>>>>>>>>>> starting sometime next week >>>>>>>>>>>>>>>>>>>>>>>>>>>> just to get things moving, if that works for >>>>>>>>>>>>>>>>>>>>>>>>>>>> people. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 9:48 PM Amogh >>>>>>>>>>>>>>>>>>>>>>>>>>>> Jahagirdar <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hey Russell, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for sharing the proposal! A few of us >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Ryan, Dan, Anoop and I) have also been working >>>>>>>>>>>>>>>>>>>>>>>>>>>>> on a proposal for an >>>>>>>>>>>>>>>>>>>>>>>>>>>>> adaptive metadata tree structure as part of >>>>>>>>>>>>>>>>>>>>>>>>>>>>> enabling more efficient one-file commits.
From a read of the summary, it's >>>>>>>>>>>>>>>>>>>>>>>>>>>>> great to see that we're >>>>>>>>>>>>>>>>>>>>>>>>>>>>> thinking along the same lines about how to tackle >>>>>>>>>>>>>>>>>>>>>>>>>>>>> this fundamental area! >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Here is our proposal: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 8:08 PM Russell >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Spitzer <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hey y'all! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We (Yi Fang, Steven Wu, and myself) wanted to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> share some >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the thoughts we had on how one-file >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> commits could work in Iceberg. This is pretty >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> much just a high-level overview of the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> concepts we think we need and how Iceberg would >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> behave. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We haven't gone very far into the actual >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementation and changes that would need to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> occur in the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SDK to make this happen. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The high level summary is: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Manifest Lists are out >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Root Manifests take their place >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A Root manifest can have data manifests, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> delete manifests, manifest delete vectors, data >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> delete vectors and data >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> files >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Manifest delete vectors allow for modifying >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a manifest without deleting it entirely >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Data files let you append without writing >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an intermediary manifest >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Having child data and delete manifests lets >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you still scale >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please take a look if you like, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm excited to see what other proposals and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas are floating around the community, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Russ
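As a rough illustration of the summary above, here is a hypothetical Python sketch of what a root manifest entry could carry; the enum values and field names are invented, not taken from the proposal:

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Content(Enum):
    DATA_MANIFEST = 1
    DELETE_MANIFEST = 2
    MANIFEST_DV = 3  # soft-deletes rows of an existing manifest
    DATA_DV = 4      # soft-deletes rows of a data file
    DATA_FILE = 5    # direct append, no intermediary manifest

@dataclass
class RootEntry:
    content: Content
    path: str
    target: Optional[str] = None  # the manifest or data file a DV applies to

# A snapshot that edits an existing manifest and appends a file directly:
snapshot = [
    RootEntry(Content.DATA_MANIFEST, "m1.parquet"),
    RootEntry(Content.MANIFEST_DV, "dv-m1.bin", target="m1.parquet"),
    RootEntry(Content.DATA_FILE, "part-00042.parquet"),
]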
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 6:29 PM John Zhuge < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very excited about the idea! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm very interested in this initiative. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Micah Kornfield and I presented >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on high-throughput ingestion for Iceberg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tables at the 2024 Iceberg Summit, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which leveraged Google infrastructure like >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Colossus for efficient appends. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This new proposal is particularly exciting >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because it offers significant advancements in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> commit latency and metadata >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> storage footprint. Furthermore, a consistent >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifest structure promises to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> simplify the design and codebase, which is a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> major benefit. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A related idea I've been exploring is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> having a loose affinity between data and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> delete manifests. While the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> current separation of data and delete >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifests in Iceberg is valuable for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> avoiding data file rewrites (and stats >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> updates) when deletes change, it >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does necessitate a join operation during >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reads. I'd be keen to discuss >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> approaches that could potentially reduce this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> read-side cost while >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> retaining the benefits of separate manifests. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Anoop >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sidhu <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am new to the Iceberg community but >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would love to participate in these >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussions to reduce the number of file >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> writes, especially for small writes/commits. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jagdeep >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 5, 2025 at 4:02 PM Anurag >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mantripragada >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We have been hitting all the metadata >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> problems you mentioned, Ryan. I’m on board >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to help however I can to improve >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this area.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~ Anurag Mantripragada >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheng <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am interested in this idea and looking >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> forward to collaboration. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Huang-Hsiang >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 2, 2025, at 10:14 AM, namratha mk < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am interested in contributing to this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> effort. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Namratha >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 1:36 PM Amogh >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jahagirdar <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for kicking this thread off Ryan, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm interested in helping out here! I've >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> been working on a proposal in this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area and it would be great to collaborate >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with different folks and exchange >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas here, since I think a lot of people >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are interested in solving this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Blue <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Like Russell’s recent note, I’m >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> starting a thread to connect those of us >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that are interested in the idea of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> changing Iceberg’s metadata in v4 so that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in most cases committing a change >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only requires writing one additional >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata file. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *Idea: One-file commits* >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The current Iceberg metadata structure >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requires writing at least one manifest and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a new manifest list to produce a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> new snapshot. The goal of this work is to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> allow more flexibility by >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> allowing the manifest list layer to store >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data and delete files. 
As a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> result, only one file write would be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> needed before committing the new >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> snapshot. In addition, this work will also >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> try to explore: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Avoiding small manifests that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> must be read in parallel and later >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compacted (metadata maintenance changes) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Extend metadata skipping to use >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aggregated column ranges that are >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compatible with geospatial data (manifest >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Using soft deletes to avoid >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rewriting existing manifests (metadata >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DVs) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you’re interested in these problems, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> please reply! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryan >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> John Zhuge >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
