Amogh, can you upload the video to the YouTube channel? https://www.youtube.com/playlist?list=PLkifVhhWtccxt1TE7w_HbNGhY5gpDTaX7
On Mon, Mar 30, 2026 at 8:28 AM Amogh Jahagirdar <[email protected]> wrote: > Hey a few folks reached out indicating that I didn't properly share the > last v4 metadata tree meeting recording. So sorry about that! Here's the > link > <https://drive.google.com/file/d/1LhDL0Iy8YR4RN_W3D8APOUtkSBYk61fD/view?usp=drive_link>, > do let me know if there are still issues. > > On Tue, Mar 3, 2026 at 9:17 AM Steven Wu <[email protected]> wrote: > >> My takeaway from the conversation is also that we don't need row-level >> column updates. Manifest DVs can be used for row-level updates instead. >> Basically, a file (manifest or data) can be updated via (1) delete vector + >> updated rows in a new file or (2) column file overlay. Depending on the >> percentage of modified rows, engines can choose which way to go. >> >> On Tue, Mar 3, 2026 at 6:24 AM Gábor Kaszab <[email protected]> >> wrote: >> >>> Thanks for the summary, Micah! I tried to watch the recording linked to >>> the calendar event, but apparently I don't have permission to do so. Not >>> sure about others. >>> >>> So if I'm not mistaken, one way to reduce the write cost of an UPDATE for >>> colocated DVs is to use the column updates. As I see it, there was some >>> agreement that row-level partial column updates aren't desired, and we aim >>> for at least file-level column updates. This is very useful information for >>> the other conversation >>> <https://lists.apache.org/thread/w90rqyhmh6pb0yxp0bqzgzk1y1rotyny> >>> going on for the column update proposal. We can bring this up on the column >>> update sync tomorrow, but I'm wondering if the consensus on avoiding >>> row-level column updates is something we can incorporate into the column >>> update proposal too or if it's something still up for debate. >>> >>> Best Regards, >>> Gabor >>> >>> On Wed, Feb 25, 2026 at 22:30 Micah Kornfield <[email protected]> >>> wrote: >>> >>>> Just wanted to summarize my main takeaways of Monday's sync. >>>> >>>> The approach will always colocate DVs with the data files (i.e. every >>>> data file row in a manifest has an optional DV reference). This implies >>>> that there is not a separate "Deletion manifest". Rather, in V4 all >>>> manifests are "combined" where data files and DVs are colocated. >>>> >>>> Write amplification is avoided in two ways: >>>> 1. For small updates we will need to carry through metadata >>>> statistics (and other relevant data file fields) in memory (rescanning >>>> these is likely too expensive). Once updates are available they will be >>>> written out to a new manifest (either root or leaf) and use metadata DVs to >>>> remove the old rows. >>>> 2. For larger updates we will only carry through the DV update parts >>>> in memory and use column level updates to replace existing DVs (this would >>>> require rescanning the DV columns for any updated manifest to merge with >>>> the updated DVs in memory, and then writing out the column update). The >>>> consensus on the call was that we didn't want to support partial column >>>> updates (a.k.a. merge-on-read column updates). >>>> >>>> The idea is that engines would decide which path to follow based on the >>>> number of affected files.
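To make the two paths above concrete, here is a rough Python sketch of the engine-side decision Micah describes; every name, structure, and the threshold below is an illustrative stand-in, not something from the proposal or the Iceberg SDK.

from dataclasses import dataclass

@dataclass
class Entry:
    data_file: str
    dv: frozenset  # deleted row positions; a stand-in for a real bitmap

SMALL_UPDATE_THRESHOLD = 0.05  # illustrative cutoff, not from the proposal

def plan_dv_commit(entries, updated):
    """entries: list[Entry]; updated: dict of entry position -> new DV."""
    if len(updated) <= SMALL_UPDATE_THRESHOLD * len(entries):
        # Path 1: carry the touched entries (stats and all) through in
        # memory, write them with their new DVs into a new manifest, and
        # emit a metadata DV that soft-deletes their old rows.
        new_entries = [Entry(entries[p].data_file, dv) for p, dv in updated.items()]
        metadata_dv = frozenset(updated)  # positions removed from the old manifest
        return ("new-manifest", new_entries, metadata_dv)
    # Path 2: merge the in-memory DV changes into the full DV column and
    # write it out as a column-level update; the other manifest columns
    # stay in place.
    dv_column = [updated.get(p, e.dv) for p, e in enumerate(entries)]
    return ("column-update", dv_column)

The only point of the sketch is that the cutover between the two paths is an engine-side policy choice driven by how many entries an operation touches.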
>>>> To help understand the implications of the new proposal, I put together >>>> a quick spreadsheet [1] to analyze trade-offs between separate deletion >>>> manifests and the new approach under scenarios 1 and 2. This represents the >>>> worst case scenario where file updates are uniformly distributed across a >>>> single update operation. It does not account for repeated writes (e.g. >>>> on-going compaction). My main takeaway is that keeping at most 1 >>>> affiliated DV separate might still help (akin to a merge-on-read column >>>> update), but maybe not enough, relative to other parts of the system (e.g. >>>> the churn on data files), to justify the complexity. >>>> >>>> Hope this is helpful. >>>> >>>> Micah >>>> >>>> [1] >>>> https://docs.google.com/spreadsheets/d/1klZQxV7ST2C-p9LTMmai_5rtFiyupj6jSLRPRkdI-u8/edit?gid=0#gid=0 >>>> >>>> >>>> >>>> On Thu, Feb 19, 2026 at 3:52 PM Amogh Jahagirdar <[email protected]> >>>> wrote: >>>> >>>>> Hey folks, I've set up an additional initial discussion on DVs for >>>>> Monday. This topic is fairly complex and there is also now a free calendar >>>>> slot. I think it'd be helpful for us to first make sure we're all on the >>>>> same page in terms of what the approach proposed by Anton earlier in the >>>>> thread means and the high level mechanics. I should also have more to >>>>> share >>>>> on the doc about how the entry structure and change detection could look >>>>> in this approach. Then on Thursday we can get into more details and >>>>> targeted points of discussion on this topic. >>>>> >>>>> Thanks, >>>>> Amogh Jahagirdar >>>>> >>>>> On Tue, Feb 17, 2026 at 9:27 PM Amogh Jahagirdar <[email protected]> >>>>> wrote: >>>>> >>>>>> Thanks Steven! I've set up some time next Thursday for the community >>>>>> to discuss this. We're also looking at how the content entry would look >>>>>> in a combined DV with potential column updates for DV changes, and >>>>>> how >>>>>> change detection could look in this approach. I should have more to >>>>>> share on this by the time of the community discussion next week. >>>>>> We should also consider potential root churn and memory consumption >>>>>> stemming from expected root entry inflation due to a combined data file + >>>>>> DV entry with possible column updates for certain DV workloads; though at >>>>>> least for memory consumption of stats being held after planning, that >>>>>> arguably is an implementation problem for certain integrations. >>>>>> >>>>>> Thanks, >>>>>> Amogh Jahagirdar >>>>>> >>>>>> On Fri, Feb 13, 2026 at 10:58 AM Steven Wu <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> I wrote up some analysis with back-of-the-envelope calculations >>>>>>> about the column update approach for DV colocation. It mainly concerns >>>>>>> the >>>>>>> 2nd use case: deleting a large number of rows from a small number of >>>>>>> files. >>>>>>> >>>>>>> >>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.gvdulzy486n7 >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Feb 4, 2026 at 1:02 AM Péter Váry < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> I fully agree with Anton and Steven that we need benchmarks before >>>>>>>> choosing any direction. >>>>>>>> >>>>>>>> I ran some preliminary column‑stitching benchmarks last summer: >>>>>>>> >>>>>>>> - Results are available in the doc: >>>>>>>> >>>>>>>> https://docs.google.com/document/d/1OHuZ6RyzZvCOQ6UQoV84GzwVp3UPiu_cfXClsOi03ww >>>>>>>> - Code is here: https://github.com/apache/iceberg/pull/13306 >>>>>>>> >>>>>>>> I’ve summarized the most relevant results at the end of this email. >>>>>>>> They show roughly a 10% slowdown on the read path with column >>>>>>>> stitching in >>>>>>>> similar scenarios when using local SSDs.
I expect that in real >>>>>>>> deployments >>>>>>>> the metadata read cost will mostly be driven by blob I/O (assuming no >>>>>>>> caching). If blob access becomes the dominant factor in read latency, >>>>>>>> multithreaded fetching should be able to absorb the overhead >>>>>>>> introduced by >>>>>>>> column stitching, resulting in latency similar to the single‑file >>>>>>>> layout >>>>>>>> (unless IO is already the bottleneck). >>>>>>>> >>>>>>>> We should definitely rerun the benchmarks once we have a clearer >>>>>>>> understanding of the intended usage patterns. >>>>>>>> Thanks, >>>>>>>> Peter >>>>>>>> >>>>>>>> >>>>>>>> The relevant(ish) results are for 100 columns, with 2 families with >>>>>>>> 50-50 columns and local read: >>>>>>>> >>>>>>>> The base is: >>>>>>>> MultiThreadedParquetBenchmark.read 100 0 >>>>>>>> false ss 20 3.739 ± 0.096 s/op >>>>>>>> >>>>>>>> The read for single threaded: >>>>>>>> MultiThreadedParquetBenchmark.read 100 2 >>>>>>>> false ss 20 4.036 ± 0.082 s/op >>>>>>>> >>>>>>>> The read for multi threaded: >>>>>>>> MultiThreadedParquetBenchmark.read 100 2 >>>>>>>> true ss 20 4.063 ± 0.080 s/op >>>>>>>> >>>>>>>> On Tue, Feb 3, 2026 at 23:27 Steven Wu <[email protected]> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> I agree with Anton in this >>>>>>>>> <https://docs.google.com/document/d/1jZy4g6UDi3hdblpkSzDnqgzgATFKFoMaHmt4nNH8M7o/edit?disco=AAAByzDx21w> >>>>>>>>> comment thread that we probably need to run benchmarks for a few >>>>>>>>> common >>>>>>>>> scenarios to guide this decision. We need to write down detailed >>>>>>>>> plans for >>>>>>>>> those scenarios and what we are measuring. Ideally, we also want to >>>>>>>>> measure >>>>>>>>> using the V4 metadata structure (like Parquet manifest file, column >>>>>>>>> stats >>>>>>>>> structs, adaptive tree). There are PoC PRs available for column stats, >>>>>>>>> Parquet manifest, and root manifest. It would probably be tricky to >>>>>>>>> piece >>>>>>>>> them together to run the benchmark considering the PoC status. We >>>>>>>>> also need >>>>>>>>> the column stitching capability on the read path to test the column >>>>>>>>> file >>>>>>>>> approach. >>>>>>>>> >>>>>>>>> On Tue, Feb 3, 2026 at 1:53 PM Anoop Johnson <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> I'm in favor of co-located DV metadata with column file override >>>>>>>>>> and not doing affiliated/unaffiliated delete manifests. This is >>>>>>>>>> conceptually similar to strictly affiliated delete manifests with >>>>>>>>>> positional joins, and will halve the number of I/Os when there is no >>>>>>>>>> DV >>>>>>>>>> column override. It is simpler to implement >>>>>>>>>> and will speed up reads. >>>>>>>>>> >>>>>>>>>> Unaffiliated DV manifests are flexible for writers. They reduce >>>>>>>>>> the chance of physical conflicts when there are concurrent >>>>>>>>>> large/random >>>>>>>>>> deletes that change DVs on different files in the same manifest. But >>>>>>>>>> the >>>>>>>>>> flexibility comes at a read-time cost. If the number of unaffiliated >>>>>>>>>> DVs >>>>>>>>>> exceeds a threshold, it could cause driver OOMs or require a >>>>>>>>>> distributed join >>>>>>>>>> to pair up DVs with data files. With colocated metadata, manifest >>>>>>>>>> DVs can >>>>>>>>>> reduce the chance of conflicts up to a certain write size. >>>>>>>>>> >>>>>>>>>> I assume we will still support unaffiliated manifests for >>>>>>>>>> equality deletes, but perhaps we can restrict them to just equality >>>>>>>>>> deletes. >>>>>>>>>> >>>>>>>>>> -Anoop
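As a hedged illustration of the read-path difference Anoop describes, here is a Python sketch; the entry types below are simplified stand-ins for Iceberg's actual metadata structures, and the function names are invented:

from dataclasses import dataclass

@dataclass
class DataEntry:
    path: str
    dv: object = None  # inline DV reference in the colocated layout

@dataclass
class DeleteEntry:
    referenced_path: str
    dv: object = None

def resolve_unaffiliated(data_entries, delete_entries):
    # Unaffiliated layout: each DV carries the referenced data file path,
    # so the planner builds and probes a path-keyed map. This map is the
    # state that can grow large enough to OOM a driver or force a
    # distributed join.
    dv_by_path = {d.referenced_path: d.dv for d in delete_entries}
    return [(e, dv_by_path.get(e.path)) for e in data_entries]

def resolve_colocated(data_entries):
    # Colocated layout: the DV is an optional field of the data file entry,
    # so one manifest read yields both and no pairing step is needed.
    return [(e, e.dv) for e in data_entries]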
>>>>>>>>>> On Mon, Feb 2, 2026 at 4:27 PM Anton Okolnychyi < >>>>>>>>>> [email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> I added the approach with column files to the doc. >>>>>>>>>>> >>>>>>>>>>> To sum up, separate data and delete manifests with affinity >>>>>>>>>>> would perform somewhat on par with co-located DV metadata (a.k.a. >>>>>>>>>>> direct >>>>>>>>>>> assignment) if we add support for column files when we need to >>>>>>>>>>> replace most >>>>>>>>>>> or all DVs (use case 1). That said, the support for direct >>>>>>>>>>> assignment with >>>>>>>>>>> in-line metadata DVs can help us avoid unaffiliated delete >>>>>>>>>>> manifests when >>>>>>>>>>> we need to replace a few DVs (use case 2). >>>>>>>>>>> >>>>>>>>>>> So the key question is whether we want to allow >>>>>>>>>>> unaffiliated delete manifests with DVs... If we don't, then we >>>>>>>>>>> would likely >>>>>>>>>>> want to have co-located DV metadata and must support efficient >>>>>>>>>>> column >>>>>>>>>>> updates so as not to regress compared to V2 and V3 for large MERGE jobs >>>>>>>>>>> that >>>>>>>>>>> modify a small set of records for most files. >>>>>>>>>>> >>>>>>>>>>> On Mon, Feb 2, 2026 at 13:20 Anton Okolnychyi < >>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>> >>>>>>>>>>>> Anoop, correct, if we keep data and delete manifests separate, >>>>>>>>>>>> there is a better way to combine the entries and we should NOT >>>>>>>>>>>> rely on the >>>>>>>>>>>> referenced data file path. Reconciling by implicit position will >>>>>>>>>>>> reduce the >>>>>>>>>>>> size of the DV entry (no need to store the referenced data file >>>>>>>>>>>> path) and >>>>>>>>>>>> will improve the planning performance (no equals/hashCode on the >>>>>>>>>>>> path). >>>>>>>>>>>> >>>>>>>>>>>> Steven, I agree. Most notes in the doc pre-date discussions we >>>>>>>>>>>> had on column updates. You are right, given that we are >>>>>>>>>>>> gravitating towards >>>>>>>>>>>> a native way to handle column updates, it seems logical to use the >>>>>>>>>>>> same >>>>>>>>>>>> approach for replacing DVs, since they’re essentially column >>>>>>>>>>>> updates. Let >>>>>>>>>>>> me add one more approach to the doc based on what Anurag and Peter >>>>>>>>>>>> have so >>>>>>>>>>>> far. >>>>>>>>>>>> >>>>>>>>>>>> On Sun, Feb 1, 2026 at 20:59 Steven Wu <[email protected]> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Anton, thanks for raising this. I agree this deserves another >>>>>>>>>>>>> look. I added a comment in your doc that we can potentially apply >>>>>>>>>>>>> the >>>>>>>>>>>>> column update proposal for data file updates to the manifest file >>>>>>>>>>>>> updates as >>>>>>>>>>>>> well, to colocate the data DV and data manifest files. Data DVs >>>>>>>>>>>>> can be a >>>>>>>>>>>>> separate column in the data manifest file and updated separately >>>>>>>>>>>>> in a >>>>>>>>>>>>> column file. This is the same as the coalesced positional join >>>>>>>>>>>>> that Anoop >>>>>>>>>>>>> mentioned. >>>>>>>>>>>>> >>>>>>>>>>>>> On Sun, Feb 1, 2026 at 4:14 PM Anoop Johnson <[email protected]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you for raising this, Anton. I had a similar >>>>>>>>>>>>>> observation while prototyping >>>>>>>>>>>>>> <https://github.com/apache/iceberg/pull/14533> the >>>>>>>>>>>>>> adaptive metadata tree.
The overhead of doing a path-based hash >>>>>>>>>>>>>> join of a >>>>>>>>>>>>>> data manifest with the affiliated delete manifest is high: my >>>>>>>>>>>>>> estimate was >>>>>>>>>>>>>> that the join adds about 5-10% overhead. The hash table >>>>>>>>>>>>>> build/probe alone >>>>>>>>>>>>>> takes about 5 ms for manifests with 25K entries. There are >>>>>>>>>>>>>> engines that can >>>>>>>>>>>>>> do vectorized hash joins that can lower this, but the overhead >>>>>>>>>>>>>> and >>>>>>>>>>>>>> complexity of a SIMD-friendly hash join is non-trivial. >>>>>>>>>>>>>> >>>>>>>>>>>>>> An alternative to relying on the external file feature in >>>>>>>>>>>>>> Parquet is to make affiliated manifests order-preserving: i.e., >>>>>>>>>>>>>> DVs in an >>>>>>>>>>>>>> affiliated delete manifest must appear in the same position as >>>>>>>>>>>>>> the >>>>>>>>>>>>>> corresponding data file in the data manifest the delete manifest >>>>>>>>>>>>>> is >>>>>>>>>>>>>> affiliated to. If a data file does not have a DV, the DV >>>>>>>>>>>>>> manifest must >>>>>>>>>>>>>> store a NULL. This would allow us to do positional joins, which >>>>>>>>>>>>>> are much >>>>>>>>>>>>>> faster. If we wanted, we could even have multiple affiliated DV >>>>>>>>>>>>>> manifests >>>>>>>>>>>>>> for a data manifest and the reader would do a COALESCED >>>>>>>>>>>>>> positional join >>>>>>>>>>>>>> (i.e. pick the first non-null value as the DV). It puts the >>>>>>>>>>>>>> sorting >>>>>>>>>>>>>> responsibility on the writers, but it might be a reasonable >>>>>>>>>>>>>> tradeoff. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Also, the options don't necessarily have to be mutually >>>>>>>>>>>>>> exclusive. We could still allow affiliated DVs to be "folded" >>>>>>>>>>>>>> into the data >>>>>>>>>>>>>> manifest (e.g. by background optimization jobs or the writer >>>>>>>>>>>>>> itself). That >>>>>>>>>>>>>> might be the optimal choice for read-heavy tables because it >>>>>>>>>>>>>> will halve the >>>>>>>>>>>>>> number of I/Os readers have to make. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Anoop
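The COALESCE-style positional join Anoop describes is easy to picture with a small, purely illustrative Python sketch; the lists below stand in for the DV columns of order-preserving affiliated delete manifests, newest first:

def coalesce_positional(dv_columns):
    """dv_columns: one list per affiliated delete manifest, each aligned
    positionally with the data manifest, with None where a data file has
    no DV. Returns the first non-null DV per position; no hashing or
    path comparisons are involved."""
    return [next((dv for dv in row if dv is not None), None)
            for row in zip(*dv_columns)]

# Two affiliated DV manifests over a three-file data manifest:
newer = [None, {7}, None]
older = [{1, 2}, {3}, None]
assert coalesce_positional([newer, older]) == [{1, 2}, {7}, None]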
>>>>>>>>>>>>>> On Fri, Jan 30, 2026 at 6:03 PM Anton Okolnychyi < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> I had a chance to catch up on some of the V4 discussions. >>>>>>>>>>>>>>> Given that we are getting rid of the manifest list and >>>>>>>>>>>>>>> switching to >>>>>>>>>>>>>>> Parquet, I wanted to re-evaluate the possibility of direct DV >>>>>>>>>>>>>>> assignment >>>>>>>>>>>>>>> that we discarded in V3 to avoid regressions. I have put >>>>>>>>>>>>>>> together my >>>>>>>>>>>>>>> thoughts in a doc [1]. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> TL;DR: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - I think the current V4 proposal that keeps data and delete >>>>>>>>>>>>>>> manifests separate but introduces affinity is a solid choice >>>>>>>>>>>>>>> for cases when >>>>>>>>>>>>>>> we need to replace DVs in many / most files. I outlined an >>>>>>>>>>>>>>> approach with >>>>>>>>>>>>>>> column-split Parquet files but it doesn't improve >>>>>>>>>>>>>>> performance and takes a >>>>>>>>>>>>>>> dependency on a portion of the Parquet spec that is not really >>>>>>>>>>>>>>> implemented. >>>>>>>>>>>>>>> - Pushing unaffiliated DVs directly into the root to replace >>>>>>>>>>>>>>> a small set of DVs is going to be fast on write but does >>>>>>>>>>>>>>> require resolving >>>>>>>>>>>>>>> where those DVs apply at read time. Using inline metadata DVs >>>>>>>>>>>>>>> with >>>>>>>>>>>>>>> column-split Parquet files is a little more promising in this >>>>>>>>>>>>>>> case as it >>>>>>>>>>>>>>> allows us to avoid unaffiliated DVs. That said, it again relies on >>>>>>>>>>>>>>> something >>>>>>>>>>>>>>> Parquet doesn't implement right now, requires changing >>>>>>>>>>>>>>> maintenance >>>>>>>>>>>>>>> operations, and yields minimal benefits. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> All in all, the V4 proposal seems like a strict improvement >>>>>>>>>>>>>>> over V3 but I insist that we reconsider usage of the referenced >>>>>>>>>>>>>>> data file >>>>>>>>>>>>>>> path when resolving DVs to data files. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [1] - >>>>>>>>>>>>>>> https://docs.google.com/document/d/1jZy4g6UDi3hdblpkSzDnqgzgATFKFoMaHmt4nNH8M7o >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Anton >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sat, Nov 22, 2025 at 13:37 Amogh Jahagirdar < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hey all, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Here is the meeting recording >>>>>>>>>>>>>>>> <https://drive.google.com/file/d/1lG9sM-JTwqcIgk7JsAryXXCc1vMnstJs/view?usp=sharing> >>>>>>>>>>>>>>>> and generated meeting summary >>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1e50p8TXL2e3CnUwKMOvm8F4s2PeVMiKWHPxhxOW1fIM/edit?usp=sharing>. >>>>>>>>>>>>>>>> Thanks all for attending yesterday! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Nov 20, 2025 at 8:49 AM Amogh Jahagirdar < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hey folks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I was out for some time, but set up a sync for tomorrow at >>>>>>>>>>>>>>>>> 9am PST. For this discussion, I do think it would be great to >>>>>>>>>>>>>>>>> focus on the >>>>>>>>>>>>>>>>> manifest DV representation, factoring in analyses on bitmap >>>>>>>>>>>>>>>>> representation >>>>>>>>>>>>>>>>> storage footprints, and the entry structure considering how >>>>>>>>>>>>>>>>> we want to >>>>>>>>>>>>>>>>> approach change detection. If there are other topics that >>>>>>>>>>>>>>>>> people want to >>>>>>>>>>>>>>>>> highlight, please do bring those up as well! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I also recognize that this is a bit short-notice scheduling, >>>>>>>>>>>>>>>>> so please do reach out to me if this time is difficult to >>>>>>>>>>>>>>>>> work with; next >>>>>>>>>>>>>>>>> week is the Thanksgiving holiday here, and since people >>>>>>>>>>>>>>>>> would be >>>>>>>>>>>>>>>>> travelling/out I figured I'd try to schedule before then. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Oct 17, 2025 at 9:03 AM Amogh Jahagirdar < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hey folks, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sorry for the delay, here's the recording link >>>>>>>>>>>>>>>>>> <https://drive.google.com/file/d/1YOmPROXjAKYAWAcYxqAFHdADbqELVVf2/view> >>>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>>> last week's discussion. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Oct 10, 2025 at 9:44 AM Péter Váry < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Same here. >>>>>>>>>>>>>>>>>>> Please record if you can.
>>>>>>>>>>>>>>>>>>> Thanks, Peter >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, Oct 10, 2025, 17:39 Fokko Driesprong < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hey Amogh, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks for the write-up. Unfortunately, I won’t be able >>>>>>>>>>>>>>>>>>>> to attend. Will it be recorded? Thanks! >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Kind regards, >>>>>>>>>>>>>>>>>>>> Fokko >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Tue, Oct 7, 2025 at 20:36 Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hey all, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I've set up time this Friday at 9am PST for another >>>>>>>>>>>>>>>>>>>>> sync on single file commits. In terms of what would be >>>>>>>>>>>>>>>>>>>>> great to focus on >>>>>>>>>>>>>>>>>>>>> for the discussion: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 1. Whether or not it makes sense to eliminate the >>>>>>>>>>>>>>>>>>>>> tuple and instead represent the tuple via lower/upper >>>>>>>>>>>>>>>>>>>>> boundaries. As a >>>>>>>>>>>>>>>>>>>>> reminder, one of the goals is to avoid tying a partition >>>>>>>>>>>>>>>>>>>>> spec to a >>>>>>>>>>>>>>>>>>>>> manifest; in the root we can have a mix of files spanning >>>>>>>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>>>>>> partition specs, and even in leaf manifests avoiding this >>>>>>>>>>>>>>>>>>>>> coupling can >>>>>>>>>>>>>>>>>>>>> enable more desirable clustering of metadata. >>>>>>>>>>>>>>>>>>>>> In the vast majority of cases, we could leverage the >>>>>>>>>>>>>>>>>>>>> property that a file is effectively partitioned if the >>>>>>>>>>>>>>>>>>>>> lower/upper bounds for a >>>>>>>>>>>>>>>>>>>>> given field are equal. The nuance here is with the >>>>>>>>>>>>>>>>>>>>> particular case of >>>>>>>>>>>>>>>>>>>>> identity-partitioned string/binary columns, which can be >>>>>>>>>>>>>>>>>>>>> truncated in stats. >>>>>>>>>>>>>>>>>>>>> One approach is to require that writers must not produce >>>>>>>>>>>>>>>>>>>>> truncated stats >>>>>>>>>>>>>>>>>>>>> for identity partitioned columns. It's also important to >>>>>>>>>>>>>>>>>>>>> keep in mind that >>>>>>>>>>>>>>>>>>>>> all of this is just for the purpose of reconstructing the >>>>>>>>>>>>>>>>>>>>> partition tuple, >>>>>>>>>>>>>>>>>>>>> which is only required during equality delete matching. >>>>>>>>>>>>>>>>>>>>> Another area we >>>>>>>>>>>>>>>>>>>>> need to cover as part of this is exact bounds on >>>>>>>>>>>>>>>>>>>>> stats. There are other >>>>>>>>>>>>>>>>>>>>> options here as well such as making all new equality >>>>>>>>>>>>>>>>>>>>> deletes in V4 be >>>>>>>>>>>>>>>>>>>>> global and instead match based on bounds, or keeping the >>>>>>>>>>>>>>>>>>>>> tuple but each >>>>>>>>>>>>>>>>>>>>> tuple is effectively based on a union schema of all >>>>>>>>>>>>>>>>>>>>> partition specs. I am >>>>>>>>>>>>>>>>>>>>> adding a separate appendix section outlining the span of >>>>>>>>>>>>>>>>>>>>> options here and >>>>>>>>>>>>>>>>>>>>> the different tradeoffs. >>>>>>>>>>>>>>>>>>>>> Once we get this more to a conclusive state, I'll move >>>>>>>>>>>>>>>>>>>>> a summarized version to the main doc. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 2.
Steven Wu <[email protected]> has >>>>>>>>>>>>>>>>>>>>> updated the doc with a section >>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.rrpksmp8zkb#heading=h.qau0y5xkh9mn> >>>>>>>>>>>>>>>>>>>>> on >>>>>>>>>>>>>>>>>>>>> how we can do change detection from the root in a variety >>>>>>>>>>>>>>>>>>>>> of write >>>>>>>>>>>>>>>>>>>>> scenarios. I've done a review of it, and it covers the >>>>>>>>>>>>>>>>>>>>> cases I would >>>>>>>>>>>>>>>>>>>>> expect. It'd be good for folks to take a look and please >>>>>>>>>>>>>>>>>>>>> give feedback >>>>>>>>>>>>>>>>>>>>> before we discuss. Thank you Steven for adding that >>>>>>>>>>>>>>>>>>>>> section and all the >>>>>>>>>>>>>>>>>>>>> diagrams. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, Sep 18, 2025 at 3:19 PM Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hey folks, just following up from the discussion last >>>>>>>>>>>>>>>>>>>>>> Friday with a summary and some next steps: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 1.) For the various change detection cases, we >>>>>>>>>>>>>>>>>>>>>> concluded it's best just to go through those in an >>>>>>>>>>>>>>>>>>>>>> offline manner on the >>>>>>>>>>>>>>>>>>>>>> doc since it's hard to verify all that correctness in a >>>>>>>>>>>>>>>>>>>>>> large meeting >>>>>>>>>>>>>>>>>>>>>> setting. >>>>>>>>>>>>>>>>>>>>>> 2.) We mostly discussed eliminating the >>>>>>>>>>>>>>>>>>>>>> partition tuple. In the original proposal, I was mostly >>>>>>>>>>>>>>>>>>>>>> aiming for the >>>>>>>>>>>>>>>>>>>>>> ability to reconstruct the tuple from the stats for >>>>>>>>>>>>>>>>>>>>>> the purpose of >>>>>>>>>>>>>>>>>>>>>> equality delete matching (a file is partitioned if the >>>>>>>>>>>>>>>>>>>>>> lower and upper >>>>>>>>>>>>>>>>>>>>>> bounds are equal); there's some nuance in how we need to >>>>>>>>>>>>>>>>>>>>>> handle identity >>>>>>>>>>>>>>>>>>>>>> partition values since for string/binary they cannot be >>>>>>>>>>>>>>>>>>>>>> truncated. >>>>>>>>>>>>>>>>>>>>>> Another potential option is to treat all equality >>>>>>>>>>>>>>>>>>>>>> deletes as effectively >>>>>>>>>>>>>>>>>>>>>> global and narrow their application based on the stats >>>>>>>>>>>>>>>>>>>>>> values. This may >>>>>>>>>>>>>>>>>>>>>> require defining tight bounds. I'm still collecting my >>>>>>>>>>>>>>>>>>>>>> thoughts on this one. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks folks! Please also let me know if any of the >>>>>>>>>>>>>>>>>>>>>> following links are inaccessible for any reason. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Meeting recording link: >>>>>>>>>>>>>>>>>>>>>> https://drive.google.com/file/d/1gv8TrR5xzqqNxek7_sTZkpbwQx1M3dhK/view >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Meeting summary: >>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/131N0CDpzZczURxitN0HGS7dTqRxQT_YS9jMECkGGvQU
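A minimal, hypothetical sketch of the bounds-equality idea from the summary above (reconstructing an identity partition value from column stats); the function and its arguments are invented for illustration:

def identity_partition_value(lower, upper, truncated):
    """A file is effectively partitioned on a field when its lower and
    upper bounds are equal. For identity-partitioned string/binary
    columns the bounds must be exact (not truncated) for this to be
    sound, which is why writers would be required not to truncate them."""
    if truncated:
        return None  # cannot reconstruct; exact bounds are required
    return lower if lower == upper else None

# A file whose 'region' column has lower == upper carries the value:
assert identity_partition_value("eu-west", "eu-west", truncated=False) == "eu-west"
# A file spanning several values is not partitioned on the field:
assert identity_partition_value("asia", "eu-west", truncated=False) is None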
>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 8, 2025 at 3:40 PM Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Update: I moved the discussion time to this Friday >>>>>>>>>>>>>>>>>>>>>>> at 9 am PST since I found out that quite a few folks >>>>>>>>>>>>>>>>>>>>>>> involved in the >>>>>>>>>>>>>>>>>>>>>>> proposals will be out next week, and I also know some >>>>>>>>>>>>>>>>>>>>>>> folks will also be >>>>>>>>>>>>>>>>>>>>>>> out the week after that. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> Amogh J >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 8, 2025 at 8:57 AM Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hey folks, sorry for the late follow-up here, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks @Kevin Liu <[email protected]> for >>>>>>>>>>>>>>>>>>>>>>>> sharing the recording link of the previous discussion! >>>>>>>>>>>>>>>>>>>>>>>> I've set up another >>>>>>>>>>>>>>>>>>>>>>>> sync for next Tuesday 09/16 at 9am PST. This time I've >>>>>>>>>>>>>>>>>>>>>>>> set it up from my >>>>>>>>>>>>>>>>>>>>>>>> corporate email so we can get recordings and >>>>>>>>>>>>>>>>>>>>>>>> transcriptions (and I've made >>>>>>>>>>>>>>>>>>>>>>>> sure to keep the meeting invite open so we don't have >>>>>>>>>>>>>>>>>>>>>>>> to manually let >>>>>>>>>>>>>>>>>>>>>>>> people in). >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> In terms of next steps, areas which I think would >>>>>>>>>>>>>>>>>>>>>>>> be good to focus on for establishing consensus: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 1. How do we model the manifest entry structure >>>>>>>>>>>>>>>>>>>>>>>> so that changes to manifest DVs can be obtained easily >>>>>>>>>>>>>>>>>>>>>>>> from the root? There >>>>>>>>>>>>>>>>>>>>>>>> are a few options here; the most promising approach is >>>>>>>>>>>>>>>>>>>>>>>> to keep an >>>>>>>>>>>>>>>>>>>>>>>> additional DV which encodes the diff: the additional >>>>>>>>>>>>>>>>>>>>>>>> positions which have >>>>>>>>>>>>>>>>>>>>>>>> been removed from a leaf manifest. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> 2. Modeling partition transforms via expressions >>>>>>>>>>>>>>>>>>>>>>>> and establishing a unified table ID space so that we >>>>>>>>>>>>>>>>>>>>>>>> can simplify how >>>>>>>>>>>>>>>>>>>>>>>> partition tuples may be represented via stats and also >>>>>>>>>>>>>>>>>>>>>>>> have a way in the >>>>>>>>>>>>>>>>>>>>>>>> future to store stats on any derived column. I have a >>>>>>>>>>>>>>>>>>>>>>>> short >>>>>>>>>>>>>>>>>>>>>>>> proposal >>>>>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1oV8dapKVzB4pZy5pKHUCj5j9i2_1p37BJSeT7hyKPpg/edit?tab=t.0> >>>>>>>>>>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>>>>>> this that probably still needs some tightening up on >>>>>>>>>>>>>>>>>>>>>>>> the expression >>>>>>>>>>>>>>>>>>>>>>>> modeling itself (and some prototyping) but the general >>>>>>>>>>>>>>>>>>>>>>>> idea for >>>>>>>>>>>>>>>>>>>>>>>> establishing a unified table ID space is covered. All >>>>>>>>>>>>>>>>>>>>>>>> feedback welcome! >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar
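A tiny, hypothetical sketch of the diff-DV idea in point 1 above, with Python sets standing in for bitmaps; nothing here is from the actual entry design:

def diff_dv(current_dv, previous_dv):
    # Positions of a leaf manifest newly soft-deleted in this snapshot;
    # keeping this alongside the full DV would let readers derive
    # file-level changes from the root without opening the leaf manifests.
    return current_dv - previous_dv

prev = {3, 9}          # rows already removed as of the parent snapshot
curr = {3, 9, 14, 15}  # rows removed as of the new snapshot
assert diff_dv(curr, prev) == {14, 15}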
>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Aug 25, 2025 at 1:34 PM Kevin Liu < >>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks Amogh. Looks like the recording for last >>>>>>>>>>>>>>>>>>>>>>>>> week's sync is available on Youtube. Here's the link, >>>>>>>>>>>>>>>>>>>>>>>>> https://www.youtube.com/watch?v=uWm-p--8oVQ >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>> Kevin Liu >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Aug 12, 2025 at 9:10 PM Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Hey folks, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Just following up on this to give the community >>>>>>>>>>>>>>>>>>>>>>>>>> an update as to where we're at and my proposed next steps. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I've been editing and merging the contents from >>>>>>>>>>>>>>>>>>>>>>>>>> our proposal into the proposal >>>>>>>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw> >>>>>>>>>>>>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>>>>>>>>>>> Russell and others. For any future comments on docs, >>>>>>>>>>>>>>>>>>>>>>>>>> please comment on the >>>>>>>>>>>>>>>>>>>>>>>>>> linked proposal. I've also marked it on our doc in >>>>>>>>>>>>>>>>>>>>>>>>>> red text so it's clear >>>>>>>>>>>>>>>>>>>>>>>>>> to readers that the other proposal is the source of >>>>>>>>>>>>>>>>>>>>>>>>>> truth for comments. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> In terms of next steps, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> 1. An important design decision point is around >>>>>>>>>>>>>>>>>>>>>>>>>> inline manifest DVs, external manifest DVs, or >>>>>>>>>>>>>>>>>>>>>>>>>> enabling both. I'm working on >>>>>>>>>>>>>>>>>>>>>>>>>> measuring different approaches for representing the >>>>>>>>>>>>>>>>>>>>>>>>>> compressed DV >>>>>>>>>>>>>>>>>>>>>>>>>> representation since that will inform how many >>>>>>>>>>>>>>>>>>>>>>>>>> entries can reasonably fit >>>>>>>>>>>>>>>>>>>>>>>>>> in a small root manifest; from that we can derive >>>>>>>>>>>>>>>>>>>>>>>>>> implications on different >>>>>>>>>>>>>>>>>>>>>>>>>> write patterns and determine the right approach for >>>>>>>>>>>>>>>>>>>>>>>>>> storing these manifest >>>>>>>>>>>>>>>>>>>>>>>>>> DVs. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> 2. Another key point is around determining if/how >>>>>>>>>>>>>>>>>>>>>>>>>> we can reasonably enable V4 to represent changes in >>>>>>>>>>>>>>>>>>>>>>>>>> the root manifest so >>>>>>>>>>>>>>>>>>>>>>>>>> that readers can effectively just infer file-level >>>>>>>>>>>>>>>>>>>>>>>>>> changes from the root. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> 3. One of the aspects of the proposal is getting >>>>>>>>>>>>>>>>>>>>>>>>>> away from the partition tuple requirement in the root, >>>>>>>>>>>>>>>>>>>>>>>>>> which currently forces >>>>>>>>>>>>>>>>>>>>>>>>>> an association between a partition spec and a >>>>>>>>>>>>>>>>>>>>>>>>>> manifest. These >>>>>>>>>>>>>>>>>>>>>>>>>> aspects can be modeled as essentially column stats, >>>>>>>>>>>>>>>>>>>>>>>>>> which gives a lot of >>>>>>>>>>>>>>>>>>>>>>>>>> flexibility in the organization of the manifest. >>>>>>>>>>>>>>>>>>>>>>>>>> There are important >>>>>>>>>>>>>>>>>>>>>>>>>> details around field ID spaces here which tie into >>>>>>>>>>>>>>>>>>>>>>>>>> how the stats are >>>>>>>>>>>>>>>>>>>>>>>>>> structured. What we're proposing here is to have a >>>>>>>>>>>>>>>>>>>>>>>>>> unified expression ID >>>>>>>>>>>>>>>>>>>>>>>>>> space that could also benefit us for storing things >>>>>>>>>>>>>>>>>>>>>>>>>> like virtual columns >>>>>>>>>>>>>>>>>>>>>>>>>> down the line.
I go into this in the proposal but >>>>>>>>>>>>>>>>>>>>>>>>>> I'm working on separating >>>>>>>>>>>>>>>>>>>>>>>>>> the appropriate parts so that the original proposal >>>>>>>>>>>>>>>>>>>>>>>>>> can mostly just focus >>>>>>>>>>>>>>>>>>>>>>>>>> on the organization of the content metadata tree and >>>>>>>>>>>>>>>>>>>>>>>>>> not how we want to >>>>>>>>>>>>>>>>>>>>>>>>>> solve this particular ID space problem. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> 4. I'm planning on scheduling a recurring >>>>>>>>>>>>>>>>>>>>>>>>>> community sync starting next Tuesday at 9am PST, >>>>>>>>>>>>>>>>>>>>>>>>>> every 2 weeks. If I get >>>>>>>>>>>>>>>>>>>>>>>>>> feedback from folks that this time will never work, >>>>>>>>>>>>>>>>>>>>>>>>>> I can certainly adjust. >>>>>>>>>>>>>>>>>>>>>>>>>> For some reason, I don't have the ability to add to >>>>>>>>>>>>>>>>>>>>>>>>>> the Iceberg Dev >>>>>>>>>>>>>>>>>>>>>>>>>> calendar, so I'll figure that out and update the >>>>>>>>>>>>>>>>>>>>>>>>>> thread when the event is >>>>>>>>>>>>>>>>>>>>>>>>>> scheduled. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 22, 2025 at 11:47 AM Russell Spitzer < >>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I think this is a great way forward; starting >>>>>>>>>>>>>>>>>>>>>>>>>>> out with this much parallel development shows that >>>>>>>>>>>>>>>>>>>>>>>>>>> we have a lot of >>>>>>>>>>>>>>>>>>>>>>>>>>> consensus already :) >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jul 22, 2025 at 12:42 PM Amogh >>>>>>>>>>>>>>>>>>>>>>>>>>> Jahagirdar <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Hey folks, just following up on this. It looks >>>>>>>>>>>>>>>>>>>>>>>>>>>> like our proposal and the proposal that Russell >>>>>>>>>>>>>>>>>>>>>>>>>>>> Spitzer <[email protected]> shared are >>>>>>>>>>>>>>>>>>>>>>>>>>>> pretty aligned. I was just chatting with Russell >>>>>>>>>>>>>>>>>>>>>>>>>>>> about this, and we think >>>>>>>>>>>>>>>>>>>>>>>>>>>> it'd be best to combine both proposals and have a >>>>>>>>>>>>>>>>>>>>>>>>>>>> singular large effort on >>>>>>>>>>>>>>>>>>>>>>>>>>>> this. I can also set up a focused community >>>>>>>>>>>>>>>>>>>>>>>>>>>> discussion (similar to what >>>>>>>>>>>>>>>>>>>>>>>>>>>> we're doing on the other V4 proposals) on this >>>>>>>>>>>>>>>>>>>>>>>>>>>> starting sometime next week >>>>>>>>>>>>>>>>>>>>>>>>>>>> just to get things moving, if that works for >>>>>>>>>>>>>>>>>>>>>>>>>>>> people. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 9:48 PM Amogh >>>>>>>>>>>>>>>>>>>>>>>>>>>> Jahagirdar <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hey Russell, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for sharing the proposal! A few of us >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (Ryan, Dan, Anoop and I) have also been working >>>>>>>>>>>>>>>>>>>>>>>>>>>>> on a proposal for an >>>>>>>>>>>>>>>>>>>>>>>>>>>>> adaptive metadata tree structure as part of >>>>>>>>>>>>>>>>>>>>>>>>>>>>> enabling more efficient one-file commits.
From a read of the summary, it's >>>>>>>>>>>>>>>>>>>>>>>>>>>>> great to see that we're >>>>>>>>>>>>>>>>>>>>>>>>>>>>> thinking along the same lines about how to tackle >>>>>>>>>>>>>>>>>>>>>>>>>>>>> this fundamental area! >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Here is our proposal: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 8:08 PM Russell >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Spitzer <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hey y'all! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We (Yi Fang, Steven Wu, and myself) wanted to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> share some >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of the thoughts we had on how one-file >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> commits could work in Iceberg. This is pretty >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> much just a high-level overview of the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> concepts we think we need and how Iceberg would >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> behave. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We haven't gone very far into the actual >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementation and changes that would need to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> occur in the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SDK to make this happen. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The high level summary is: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Manifest Lists are out >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Root Manifests take their place >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A Root manifest can have data manifests, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> delete manifests, manifest delete vectors, data >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> delete vectors and data >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> files >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Manifest delete vectors allow for modifying >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a manifest without deleting it entirely >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Data files let you append without writing >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> an intermediary manifest >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Having child data and delete manifests lets >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> you still scale >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please take a look if you like, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm excited to see what other proposals and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas are floating around the community, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Russ
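As a rough illustration of the summary above, here is a hypothetical Python sketch of what a root manifest entry could carry; the enum values and field names are invented, not taken from the proposal:

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Content(Enum):
    DATA_MANIFEST = 1
    DELETE_MANIFEST = 2
    MANIFEST_DV = 3  # soft-deletes rows of an existing manifest
    DATA_DV = 4      # soft-deletes rows of a data file
    DATA_FILE = 5    # direct append, no intermediary manifest

@dataclass
class RootEntry:
    content: Content
    path: str
    target: Optional[str] = None  # the manifest or data file a DV applies to

# A snapshot that edits an existing manifest and appends a file directly:
snapshot = [
    RootEntry(Content.DATA_MANIFEST, "m1.parquet"),
    RootEntry(Content.MANIFEST_DV, "dv-m1.bin", target="m1.parquet"),
    RootEntry(Content.DATA_FILE, "part-00042.parquet"),
]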
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 6:29 PM John Zhuge < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Very excited about the idea! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm very interested in this initiative. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Micah Kornfield and I presented >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on high-throughput ingestion for Iceberg >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tables at the 2024 Iceberg Summit, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which leveraged Google infrastructure like >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Colossus for efficient appends. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This new proposal is particularly exciting >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> because it offers significant advancements in >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> commit latency and metadata >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> storage footprint. Furthermore, a consistent >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifest structure promises to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> simplify the design and codebase, which is a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> major benefit. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> A related idea I've been exploring is >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> having a loose affinity between data and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> delete manifests. While the >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> current separation of data and delete >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> manifests in Iceberg is valuable for >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> avoiding data file rewrites (and stats >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> updates) when deletes change, it >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> does necessitate a join operation during >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> reads. I'd be keen to discuss >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> approaches that could potentially reduce this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> read-side cost while >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> retaining the benefits of separate manifests. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Anoop >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sidhu <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am new to the Iceberg community but >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would love to participate in these >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> discussions to reduce the number of file >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> writes, especially for small writes/commits. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -Jagdeep >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 5, 2025 at 4:02 PM Anurag >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mantripragada >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> We have been hitting all the metadata >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> problems you mentioned, Ryan. I’m on board >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to help however I can to improve >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this area.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ~ Anurag Mantripragada >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheng <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am interested in this idea and looking >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> forward to collaboration. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Huang-Hsiang >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 2, 2025, at 10:14 AM, namratha mk < >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am interested in contributing to this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> effort. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Namratha >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 1:36 PM Amogh >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Jahagirdar <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for kicking this thread off Ryan, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I'm interested in helping out here! I've >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> been working on a proposal in this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> area and it would be great to collaborate >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> with different folks and exchange >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ideas here, since I think a lot of people >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> are interested in solving this >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> problem. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Blue <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Like Russell’s recent note, I’m >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> starting a thread to connect those of us >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> that are interested in the idea of >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> changing Iceberg’s metadata in v4 so that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in most cases committing a change >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> only requires writing one additional >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata file. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> *Idea: One-file commits* >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The current Iceberg metadata structure >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> requires writing at least one manifest and >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a new manifest list to produce a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> new snapshot. The goal of this work is to >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> allow more flexibility by >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> allowing the manifest list layer to store >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data and delete files. 
As a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> result, only one file write would be >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> needed before committing the new >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> snapshot. In addition, this work will also >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> try to explore: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Avoiding small manifests that >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> must be read in parallel and later >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compacted (metadata maintenance changes) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Extend metadata skipping to use >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aggregated column ranges that are >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> compatible with geospatial data (manifest >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> metadata) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Using soft deletes to avoid >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> rewriting existing manifests (metadata >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DVs) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If you’re interested in these problems, >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> please reply! >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ryan >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> John Zhuge >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
