Anton, thanks for raising this. I agree it deserves another look. I added a comment in your doc suggesting that we could apply the column update proposal for data files to manifest files as well, so that data DVs are colocated with the data manifest. Data DVs would be a separate column of the data manifest and could be updated independently in a separate column file. This is effectively the same as the coalesced positional join that Anoop mentioned.
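To make the comparison concrete, here is a minimal Python sketch of the two read-side joins discussed in this thread: the path-based hash join and the coalesced positional join. Everything in it is an illustrative assumption rather than Iceberg code: manifests are modeled as plain in-memory lists, DVs as opaque strings, and the function names (`hash_join`, `coalesced_positional_join`) are hypothetical. It also assumes the affiliated DV manifests are passed in priority order (for example newest first), which the thread does not specify.

```python
from typing import Optional, Sequence


def hash_join(data_files: Sequence[str],
              delete_entries: Sequence[tuple[str, str]]) -> list[Optional[str]]:
    """Path-based join: build a hash table keyed by the referenced data file
    path, then probe it once per data file."""
    dv_by_path = {path: dv for path, dv in delete_entries}  # build phase
    return [dv_by_path.get(path) for path in data_files]    # probe phase


def coalesced_positional_join(
        num_data_files: int,
        dv_manifests: Sequence[Sequence[Optional[str]]]) -> list[Optional[str]]:
    """Order-preserving join: row i of every affiliated DV manifest refers to
    row i of the data manifest, and None marks "no DV". The first non-null
    value across the DV manifests (in the order given) wins."""
    for manifest in dv_manifests:
        if len(manifest) != num_data_files:
            raise ValueError("DV manifest must have one row per data file")
    result: list[Optional[str]] = [None] * num_data_files
    for manifest in dv_manifests:
        for i, dv in enumerate(manifest):
            if result[i] is None:
                result[i] = dv  # COALESCE: keep the first non-null DV
    return result


# Hypothetical example: three data files, two affiliated DV manifests.
data_files = ["a.parquet", "b.parquet", "c.parquet"]
newer = [None, "dv-b2", None]      # replaces only the DV of b.parquet
older = ["dv-a1", "dv-b1", None]   # earlier DVs for a.parquet and b.parquet

joined = coalesced_positional_join(len(data_files), [newer, older])
same = hash_join(data_files, [("b.parquet", "dv-b2"), ("a.parquet", "dv-a1")])
```

Both calls resolve the same DV for each data file, but the positional variant never hashes or compares file paths; the cost moves to writers, which must keep the rows aligned and store a NULL where a data file has no DV.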
On Sun, Feb 1, 2026 at 4:14 PM Anoop Johnson <[email protected]> wrote:
> Thank you for raising this, Anton. I had a similar observation while
> prototyping <https://github.com/apache/iceberg/pull/14533> the
> adaptive metadata tree. The overhead of doing a path-based hash join of a
> data manifest with the affiliated delete manifest is high: my estimate was
> that the join adds about 5-10% overhead. The hash table build/probe alone
> takes about 5 ms for manifests with 25K entries. There are engines that can
> do vectorized hash joins that can lower this, but the overhead and
> complexity of a SIMD-friendly hash join is non-trivial.
>
> An alternative to relying on the external file feature in Parquet, is to
> make affiliated manifests order-preserving: ie DVs in an affiliated delete
> manifest must appear in the same position as the corresponding data file in
> the data manifest the delete manifest is affiliated to. If a data file
> does not have a DV, the DV manifest must store a NULL. This would allow us
> to do positional joins, which are much faster. If we wanted, we could even
> have multiple affiliated DV manifests for a data manifest and the reader
> would do a COALESCED positional join (i.e. pick the first non-null value as
> the DV). It puts the sorting responsibility to the writers, but it might be
> a reasonable tradeoff.
>
> Also, the options don't necessarily have to be mutually exclusive. We
> could still allow affiliated DVs to be "folded" into data manifest (e.g. by
> background optimization jobs or the writer itself). That might be the
> optimal choice for read-heavy tables because it will halve the number of
> I/Os readers have to make.
>
> Best,
> Anoop
>
>
> On Fri, Jan 30, 2026 at 6:03 PM Anton Okolnychyi <[email protected]>
> wrote:
>
>> I had a chance to catch up on some of the V4 discussions.
Given that we
>> are getting rid of the manifest list and switching to Parquet, I wanted to
>> re-evaluate the possibility of direct DV assignment that we discarded in V3
>> to avoid regressions. I have put together my thoughts in a doc [1].
>>
>> TL;DR:
>>
>> - I think the current V4 proposal that keeps data and delete manifests
>> separate but introduces affinity is a solid choice for cases when we need
>> to replace DVs in many / most files. I outlined an approach with
>> column-split Parquet files but it doesn't improve the performance and takes
>> dependency on a portion of the Parquet spec that is not really implemented.
>> - Pushing unaffiliated DVs directly into the root to replace a small set
>> of DVs is going to be fast on write but does require resolving where those
>> DVs apply at read time. Using inline metadata DVs with column-split Parquet
>> files is a little more promising in this case as it allows to avoid
>> unaffiliated DVs. That said, it again relies on something Parquet doesn't
>> implement right now, requires changing maintenance operations, and yields
>> minimal benefits.
>>
>> All in all, the V4 proposal seems like a strict improvement over V3 but I
>> insist that we reconsider usage of the referenced data file path when
>> resolving DVs to data files.
>>
>> [1] -
>> https://docs.google.com/document/d/1jZy4g6UDi3hdblpkSzDnqgzgATFKFoMaHmt4nNH8M7o
>>
>> - Anton
>>
>> On Sat, Nov 22, 2025 at 1:37 PM Amogh Jahagirdar <[email protected]> wrote:
>>
>>> Hey all,
>>>
>>> Here is the meeting recording
>>> <https://drive.google.com/file/d/1lG9sM-JTwqcIgk7JsAryXXCc1vMnstJs/view?usp=sharing>
>>> and generated meeting summary
>>> <https://docs.google.com/document/d/1e50p8TXL2e3CnUwKMOvm8F4s2PeVMiKWHPxhxOW1fIM/edit?usp=sharing>.
>>> Thanks all for attending yesterday!
>>>
>>> On Thu, Nov 20, 2025 at 8:49 AM Amogh Jahagirdar <[email protected]>
>>> wrote:
>>>
>>>> Hey folks,
>>>>
>>>> I was out for some time, but set up a sync for tomorrow at 9am PST.
For
>>>> this discussion, I do think it would be great to focus on the manifest DV
>>>> representation, factoring in analyses on bitmap representation storage
>>>> footprints, and the entry structure considering how we want to approach
>>>> change detection. If there are other topics that people want to highlight,
>>>> please do bring those up as well!
>>>>
>>>> I also recognize that this is a bit short term scheduling, so please do
>>>> reach out to me if this time is difficult to work with; next week is the
>>>> Thanksgiving holidays here, and since people would be travelling/out I
>>>> figured I'd try to schedule before then.
>>>>
>>>> Thanks,
>>>> Amogh Jahagirdar
>>>>
>>>>
>>>>
>>>> On Fri, Oct 17, 2025 at 9:03 AM Amogh Jahagirdar <[email protected]>
>>>> wrote:
>>>>
>>>>> Hey folks,
>>>>>
>>>>> Sorry for the delay, here's the recording link
>>>>> <https://drive.google.com/file/d/1YOmPROXjAKYAWAcYxqAFHdADbqELVVf2/view>
>>>>> from
>>>>> last week's discussion.
>>>>>
>>>>> Thanks,
>>>>> Amogh Jahagirdar
>>>>>
>>>>> On Fri, Oct 10, 2025 at 9:44 AM Péter Váry <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Same here.
>>>>>> Please record if you can.
>>>>>> Thanks, Peter
>>>>>>
>>>>>> On Fri, Oct 10, 2025, 17:39 Fokko Driesprong <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey Amogh,
>>>>>>>
>>>>>>> Thanks for the write-up. Unfortunately, I won’t be able to attend.
>>>>>>> Will it be recorded? Thanks!
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Fokko
>>>>>>>
>>>>>>> On Tue, Oct 7, 2025 at 8:36 PM Amogh Jahagirdar <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hey all,
>>>>>>>>
>>>>>>>> I've setup time this Friday at 9am PST for another sync on single
>>>>>>>> file commits. In terms of what would be great to focus on for the
>>>>>>>> discussion:
>>>>>>>>
>>>>>>>> 1. Whether it makes sense or not to eliminate the tuple, and
>>>>>>>> instead representing the tuple via lower/upper boundaries.
As a >>>>>>>> reminder, >>>>>>>> one of the goals is to avoid tying a partition spec to a manifest; in >>>>>>>> the >>>>>>>> root we can have a mix of files spanning different partition specs, and >>>>>>>> even in leaf manifests avoiding this coupling can enable more >>>>>>>> desirable clustering of metadata. >>>>>>>> In the vast majority of cases, we could leverage the property that >>>>>>>> a file is effectively partitioned if the lower/upper for a given field >>>>>>>> is >>>>>>>> equal. The nuance here is with the particular case of identity >>>>>>>> partitioned >>>>>>>> string/binary columns which can be truncated in stats. One approach is >>>>>>>> to >>>>>>>> require that writers must not produce truncated stats for identity >>>>>>>> partitioned columns. It's also important to keep in mind that all of >>>>>>>> this >>>>>>>> is just for the purpose of reconstructing the partition tuple, which is >>>>>>>> only required during equality delete matching. Another area we need to >>>>>>>> cover as part of this is on exact bounds on stats. There are other >>>>>>>> options >>>>>>>> here as well such as making all new equality deletes in V4 be global >>>>>>>> and >>>>>>>> instead match based on bounds, or keeping the tuple but each tuple is >>>>>>>> effectively based off a union schema of all partition specs. I am >>>>>>>> adding a >>>>>>>> separate appendix section outlining the span of options here and the >>>>>>>> different tradeoffs. >>>>>>>> Once we get this more to a conclusive state, I'll move a summarized >>>>>>>> version to the main doc. >>>>>>>> >>>>>>>> 2. @[email protected] <[email protected]> has updated the >>>>>>>> doc with a section >>>>>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.rrpksmp8zkb#heading=h.qau0y5xkh9mn> >>>>>>>> on >>>>>>>> how we can do change detection from the root in a variety of write >>>>>>>> scenarios. I've done a review on it, and it covers the cases I would >>>>>>>> expect. 
It'd be good for folks to take a look and please give feedback >>>>>>>> before we discuss. Thank you Steven for adding that section and all the >>>>>>>> diagrams. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Amogh Jahagirdar >>>>>>>> >>>>>>>> On Thu, Sep 18, 2025 at 3:19 PM Amogh Jahagirdar <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hey folks just following up from the discussion last Friday with a >>>>>>>>> summary and some next steps: >>>>>>>>> >>>>>>>>> 1.) For the various change detection cases, we concluded it's best >>>>>>>>> just to go through those in an offline manner on the doc since it's >>>>>>>>> hard to >>>>>>>>> verify all that correctness in a large meeting setting. >>>>>>>>> 2.) We mostly discussed eliminating the partition tuple. On the >>>>>>>>> original proposal, I was mostly aiming for the ability to >>>>>>>>> re-constructing >>>>>>>>> the tuple from the stats for the purpose of equality delete matching >>>>>>>>> (a >>>>>>>>> file is partitioned if the lower and upper bounds are equal); There's >>>>>>>>> some >>>>>>>>> nuance in how we need to handle identity partition values since for >>>>>>>>> string/binary they cannot be truncated. Another potential option is to >>>>>>>>> treat all equality deletes as effectively global and narrow their >>>>>>>>> application based on the stats values. This may require defining tight >>>>>>>>> bounds. I'm still collecting my thoughts on this one. >>>>>>>>> >>>>>>>>> Thanks folks! Please also let me know if any of the following >>>>>>>>> links are inaccessible for any reason. 
>>>>>>>>> >>>>>>>>> Meeting recording link: >>>>>>>>> https://drive.google.com/file/d/1gv8TrR5xzqqNxek7_sTZkpbwQx1M3dhK/view >>>>>>>>> >>>>>>>>> Meeting summary: >>>>>>>>> https://docs.google.com/document/d/131N0CDpzZczURxitN0HGS7dTqRxQT_YS9jMECkGGvQU >>>>>>>>> >>>>>>>>> On Mon, Sep 8, 2025 at 3:40 PM Amogh Jahagirdar <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Update: I moved the discussion time to this Friday at 9 am PST >>>>>>>>>> since I found out that quite a few folks involved in the proposals >>>>>>>>>> will be >>>>>>>>>> out next week, and I also know some folks will also be out the week >>>>>>>>>> after >>>>>>>>>> that. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Amogh J >>>>>>>>>> >>>>>>>>>> On Mon, Sep 8, 2025 at 8:57 AM Amogh Jahagirdar <[email protected]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hey folks sorry for the late follow up here, >>>>>>>>>>> >>>>>>>>>>> Thanks @Kevin Liu <[email protected]> for sharing the >>>>>>>>>>> recording link of the previous discussion! I've set up another sync >>>>>>>>>>> for >>>>>>>>>>> next Tuesday 09/16 at 9am PST. This time I've set it up from my >>>>>>>>>>> corporate >>>>>>>>>>> email so we can get recordings and transcriptions (and I've made >>>>>>>>>>> sure to >>>>>>>>>>> keep the meeting invite open so we don't have to manually let >>>>>>>>>>> people in). >>>>>>>>>>> >>>>>>>>>>> In terms of next steps of areas which I think would be good to >>>>>>>>>>> focus on for establishing consensus: >>>>>>>>>>> >>>>>>>>>>> 1. How do we model the manifest entry structure so that changes >>>>>>>>>>> to manifest DVs can be obtained easily from the root? There are a >>>>>>>>>>> few >>>>>>>>>>> options here; the most promising approach is to keep an additional >>>>>>>>>>> DV which >>>>>>>>>>> encodes the diff in additional positions which have been removed >>>>>>>>>>> from a >>>>>>>>>>> leaf manifest. >>>>>>>>>>> >>>>>>>>>>> 2. 
Modeling partition transforms via expressions and >>>>>>>>>>> establishing a unified table ID space so that we can simplify how >>>>>>>>>>> partition >>>>>>>>>>> tuples may be represented via stats and also have a way in the >>>>>>>>>>> future to >>>>>>>>>>> store stats on any derived column. I have a short proposal >>>>>>>>>>> <https://docs.google.com/document/d/1oV8dapKVzB4pZy5pKHUCj5j9i2_1p37BJSeT7hyKPpg/edit?tab=t.0> >>>>>>>>>>> for >>>>>>>>>>> this that probably still needs some tightening up on the expression >>>>>>>>>>> modeling itself (and some prototyping) but the general idea for >>>>>>>>>>> establishing a unified table ID space is covered. All feedback >>>>>>>>>>> welcome! >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>> >>>>>>>>>>> On Mon, Aug 25, 2025 at 1:34 PM Kevin Liu <[email protected]> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Thanks Amogh. Looks like the recording for last week's sync is >>>>>>>>>>>> available on Youtube. Here's the link, >>>>>>>>>>>> https://www.youtube.com/watch?v=uWm-p--8oVQ >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Kevin Liu >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Aug 12, 2025 at 9:10 PM Amogh Jahagirdar < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hey folks, >>>>>>>>>>>>> >>>>>>>>>>>>> Just following up on this to give the community as to where >>>>>>>>>>>>> we're at and my proposed next steps. >>>>>>>>>>>>> >>>>>>>>>>>>> I've been editing and merging the contents from our proposal >>>>>>>>>>>>> into the proposal >>>>>>>>>>>>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw> >>>>>>>>>>>>> from >>>>>>>>>>>>> Russell and others. For any future comments on docs, please >>>>>>>>>>>>> comment on the >>>>>>>>>>>>> linked proposal. I've also marked it on our doc in red text so >>>>>>>>>>>>> it's clear >>>>>>>>>>>>> to redirect to the other proposal as a source of truth for >>>>>>>>>>>>> comments. 
>>>>>>>>>>>>> >>>>>>>>>>>>> In terms of next steps, >>>>>>>>>>>>> >>>>>>>>>>>>> 1. An important design decision point is around inline >>>>>>>>>>>>> manifest DVs, external manifest DVs or enabling both. I'm working >>>>>>>>>>>>> on >>>>>>>>>>>>> measuring different approaches for representing the compressed DV >>>>>>>>>>>>> representation since that will inform how many entries can >>>>>>>>>>>>> reasonably fit >>>>>>>>>>>>> in a small root manifest; from that we can derive implications on >>>>>>>>>>>>> different >>>>>>>>>>>>> write patterns and determine the right approach for storing these >>>>>>>>>>>>> manifest >>>>>>>>>>>>> DVs. >>>>>>>>>>>>> >>>>>>>>>>>>> 2. Another key point is around determining if/how we can >>>>>>>>>>>>> reasonably enable V4 to represent changes in the root manifest so >>>>>>>>>>>>> that >>>>>>>>>>>>> readers can effectively just infer file level changes from the >>>>>>>>>>>>> root. >>>>>>>>>>>>> >>>>>>>>>>>>> 3. One of the aspects of the proposal is getting away from >>>>>>>>>>>>> partition tuple requirement in the root which currently holds us >>>>>>>>>>>>> to have >>>>>>>>>>>>> associativity between a partition spec and a manifest. These >>>>>>>>>>>>> aspects can be >>>>>>>>>>>>> modeled as essentially column stats which gives a lot of >>>>>>>>>>>>> flexibility into >>>>>>>>>>>>> the organization of the manifest. There are important details >>>>>>>>>>>>> around field >>>>>>>>>>>>> ID spaces here which tie into how the stats are structured. What >>>>>>>>>>>>> we're >>>>>>>>>>>>> proposing here is to have a unified expression ID space that >>>>>>>>>>>>> could also >>>>>>>>>>>>> benefit us for storing things like virtual columns down the line. 
>>>>>>>>>>>>> I go into >>>>>>>>>>>>> this in the proposal but I'm working on separating the >>>>>>>>>>>>> appropriate parts so >>>>>>>>>>>>> that the original proposal can mostly just focus on the >>>>>>>>>>>>> organization of the >>>>>>>>>>>>> content metadata tree and not how we want to solve this >>>>>>>>>>>>> particular ID space >>>>>>>>>>>>> problem. >>>>>>>>>>>>> >>>>>>>>>>>>> 4. I'm planning on scheduling a recurring community sync >>>>>>>>>>>>> starting next Tuesday at 9am PST, every 2 weeks. If I get >>>>>>>>>>>>> feedback from >>>>>>>>>>>>> folks that this time will never work, I can certainly adjust. For >>>>>>>>>>>>> some >>>>>>>>>>>>> reason, I don't have the ability to add to the Iceberg Dev >>>>>>>>>>>>> calendar, so >>>>>>>>>>>>> I'll figure that out and update the thread when the event is >>>>>>>>>>>>> scheduled. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Jul 22, 2025 at 11:47 AM Russell Spitzer < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I think this is a great way forward, starting out with this >>>>>>>>>>>>>> much parallel development shows that we have a lot of consensus >>>>>>>>>>>>>> already :) >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Jul 22, 2025 at 12:42 PM Amogh Jahagirdar < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hey folks, just following up on this. It looks like our >>>>>>>>>>>>>>> proposal and the proposal that @Russell Spitzer >>>>>>>>>>>>>>> <[email protected]> shared are pretty aligned. I >>>>>>>>>>>>>>> was just chatting with Russell about this, and we think it'd be >>>>>>>>>>>>>>> best to >>>>>>>>>>>>>>> combine both proposals and have a singular large effort on >>>>>>>>>>>>>>> this. 
I can also >>>>>>>>>>>>>>> set up a focused community discussion (similar to what we're >>>>>>>>>>>>>>> doing on the >>>>>>>>>>>>>>> other V4 proposals) on this starting sometime next week just to >>>>>>>>>>>>>>> get things >>>>>>>>>>>>>>> moving, if that works for people. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 9:48 PM Amogh Jahagirdar < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hey Russell, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for sharing the proposal! A few of us (Ryan, Dan, >>>>>>>>>>>>>>>> Anoop and I) have also been working on a proposal for an >>>>>>>>>>>>>>>> adaptive metadata >>>>>>>>>>>>>>>> tree structure as part of enabling more efficient one file >>>>>>>>>>>>>>>> commits. From a >>>>>>>>>>>>>>>> read of the summary, it's great to see that we're thinking >>>>>>>>>>>>>>>> along the same >>>>>>>>>>>>>>>> lines about how to tackle this fundamental area! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Here is our proposal: >>>>>>>>>>>>>>>> https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0 >>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Mon, Jul 14, 2025 at 8:08 PM Russell Spitzer < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hey y'all! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> We (Yi Fang, Steven Wu and Myself) wanted to share some >>>>>>>>>>>>>>>>> of the thoughts we had on how one-file commits could work >>>>>>>>>>>>>>>>> in Iceberg. This is pretty >>>>>>>>>>>>>>>>> much just a high level overview of the concepts we think >>>>>>>>>>>>>>>>> we need and how Iceberg would behave. 
>>>>>>>>>>>>>>>>> We haven't gone very far into the actual implementation >>>>>>>>>>>>>>>>> and changes that would need to occur in the >>>>>>>>>>>>>>>>> SDK to make this happen. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The high level summary is: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Manifest Lists are out >>>>>>>>>>>>>>>>> Root Manifests take their place >>>>>>>>>>>>>>>>> A Root manifest can have data manifests, delete >>>>>>>>>>>>>>>>> manifests, manifest delete vectors, data delete vectors and >>>>>>>>>>>>>>>>> data files >>>>>>>>>>>>>>>>> Manifest delete vectors allow for modifying a manifest >>>>>>>>>>>>>>>>> without deleting it entirely >>>>>>>>>>>>>>>>> Data files let you append without writing an >>>>>>>>>>>>>>>>> intermediary manifest >>>>>>>>>>>>>>>>> Having child data and delete manifests lets you still >>>>>>>>>>>>>>>>> scale >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Please take a look if you like, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm excited to see what other proposals and Ideas are >>>>>>>>>>>>>>>>> floating around the community, >>>>>>>>>>>>>>>>> Russ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 6:29 PM John Zhuge < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Very excited about the idea! >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I'm very interested in this initiative. Micah Kornfield >>>>>>>>>>>>>>>>>>> and I presented >>>>>>>>>>>>>>>>>>> <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405> >>>>>>>>>>>>>>>>>>> on high-throughput ingestion for Iceberg tables at the 2024 >>>>>>>>>>>>>>>>>>> Iceberg Summit, >>>>>>>>>>>>>>>>>>> which leveraged Google infrastructure like Colossus for >>>>>>>>>>>>>>>>>>> efficient appends. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> This new proposal is particularly exciting because it >>>>>>>>>>>>>>>>>>> offers significant advancements in commit latency and >>>>>>>>>>>>>>>>>>> metadata storage >>>>>>>>>>>>>>>>>>> footprint. Furthermore, a consistent manifest structure >>>>>>>>>>>>>>>>>>> promises to >>>>>>>>>>>>>>>>>>> simplify the design and codebase, which is a major benefit. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> A related idea I've been exploring is having a loose >>>>>>>>>>>>>>>>>>> affinity between data and delete manifests. While the >>>>>>>>>>>>>>>>>>> current separation of >>>>>>>>>>>>>>>>>>> data and delete manifests in Iceberg is valuable for >>>>>>>>>>>>>>>>>>> avoiding data file >>>>>>>>>>>>>>>>>>> rewrites (and stats updates) when deletes change, it does >>>>>>>>>>>>>>>>>>> necessitate a >>>>>>>>>>>>>>>>>>> join operation during reads. I'd be keen to discuss >>>>>>>>>>>>>>>>>>> approaches that could >>>>>>>>>>>>>>>>>>> potentially reduce this read-side cost while retaining the >>>>>>>>>>>>>>>>>>> benefits of >>>>>>>>>>>>>>>>>>> separate manifests. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>> Anoop >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I am new to the Iceberg community but would love to >>>>>>>>>>>>>>>>>>>> participate in these discussions to reduce the number of >>>>>>>>>>>>>>>>>>>> file writes, >>>>>>>>>>>>>>>>>>>> especially for small writes/commits. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thank you! 
>>>>>>>>>>>>>>>>>>>> -Jagdeep >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada >>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> We have been hitting all the metadata problems you >>>>>>>>>>>>>>>>>>>>> mentioned, Ryan. I’m on-board to help however I can to >>>>>>>>>>>>>>>>>>>>> improve this area. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ~ Anurag Mantripragada >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng >>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I am interested in this idea and looking forward to >>>>>>>>>>>>>>>>>>>>> collaboration. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> Huang-Hsiang >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Jun 2, 2025, at 10:14 AM, namratha mk < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I am interested in contributing to this effort. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> Namratha >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks for kicking this thread off Ryan, I'm >>>>>>>>>>>>>>>>>>>>>> interested in helping out here! I've been working on a >>>>>>>>>>>>>>>>>>>>>> proposal in this >>>>>>>>>>>>>>>>>>>>>> area and it would be great to collaborate with different >>>>>>>>>>>>>>>>>>>>>> folks and exchange >>>>>>>>>>>>>>>>>>>>>> ideas here, since I think a lot of people are interested >>>>>>>>>>>>>>>>>>>>>> in solving this >>>>>>>>>>>>>>>>>>>>>> problem. 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>> Amogh Jahagirdar >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue < >>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Like Russell’s recent note, I’m starting a thread to >>>>>>>>>>>>>>>>>>>>>>> connect those of us that are interested in the idea of >>>>>>>>>>>>>>>>>>>>>>> changing Iceberg’s >>>>>>>>>>>>>>>>>>>>>>> metadata in v4 so that in most cases committing a >>>>>>>>>>>>>>>>>>>>>>> change only requires >>>>>>>>>>>>>>>>>>>>>>> writing one additional metadata file. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> *Idea: One-file commits* >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> The current Iceberg metadata structure requires >>>>>>>>>>>>>>>>>>>>>>> writing at least one manifest and a new manifest list >>>>>>>>>>>>>>>>>>>>>>> to produce a new >>>>>>>>>>>>>>>>>>>>>>> snapshot. The goal of this work is to allow more >>>>>>>>>>>>>>>>>>>>>>> flexibility by allowing >>>>>>>>>>>>>>>>>>>>>>> the manifest list layer to store data and delete files. >>>>>>>>>>>>>>>>>>>>>>> As a result, only >>>>>>>>>>>>>>>>>>>>>>> one file write would be needed before committing the >>>>>>>>>>>>>>>>>>>>>>> new snapshot. 
In >>>>>>>>>>>>>>>>>>>>>>> addition, this work will also try to explore: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> - Avoiding small manifests that must be read in >>>>>>>>>>>>>>>>>>>>>>> parallel and later compacted (metadata maintenance >>>>>>>>>>>>>>>>>>>>>>> changes) >>>>>>>>>>>>>>>>>>>>>>> - Extend metadata skipping to use aggregated >>>>>>>>>>>>>>>>>>>>>>> column ranges that are compatible with geospatial >>>>>>>>>>>>>>>>>>>>>>> data (manifest metadata) >>>>>>>>>>>>>>>>>>>>>>> - Using soft deletes to avoid rewriting existing >>>>>>>>>>>>>>>>>>>>>>> manifests (metadata DVs) >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> If you’re interested in these problems, please reply! >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Ryan >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> John Zhuge >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>
