Re: [DISCUSS] v4 - One file commits

Amogh Jahagirdar Mon, 08 Sep 2025 07:58:24 -0700

Hey folks, sorry for the late follow-up here,

Thanks @Kevin Liu <[email protected]> for sharing the recording link
of the previous discussion! I've set up another sync for next Tuesday 09/16
at 9am PST. This time I've set it up from my corporate email so we can get
recordings and transcriptions (and I've made sure to keep the meeting
invite open so we don't have to manually let people in).


In terms of next steps, these are the areas I think would be good to focus on
for establishing consensus:

1. How do we model the manifest entry structure so that changes to manifest
DVs can be obtained easily from the root? There are a few options here; the
most promising approach is to keep an additional DV that encodes the diff,
i.e. the additional positions that have been removed from a leaf manifest.

2. Modeling partition transforms via expressions and establishing a unified
table ID space so that we can simplify how partition tuples may be
represented via stats and also have a way in the future to store stats on
any derived column. I have a short proposal
<https://docs.google.com/document/d/1oV8dapKVzB4pZy5pKHUCj5j9i2_1p37BJSeT7hyKPpg/edit?tab=t.0>
for this that probably still needs some tightening up on the expression
modeling itself (and some prototyping) but the general idea for
establishing a unified table ID space is covered. All feedback welcome!
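
To make point 1 concrete, here is a minimal sketch, assuming a manifest DV
can be modeled as a set of removed entry positions in a leaf manifest. The
names `ManifestEntryRef`, `dv`, `dv_diff`, and `remove_positions` are
hypothetical and are not taken from the proposal; the real encoding would
likely be a compressed bitmap rather than a Python set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntryRef:
    """Hypothetical root-manifest entry pointing at one leaf manifest."""
    manifest_path: str
    dv: frozenset = frozenset()       # all positions removed so far
    dv_diff: frozenset = frozenset()  # positions newly removed by the latest commit


def remove_positions(ref, positions):
    """Return a new root entry after soft-deleting more leaf positions."""
    # Only positions that were not already removed count as the diff.
    newly_removed = frozenset(positions) - ref.dv
    return ManifestEntryRef(ref.manifest_path, ref.dv | newly_removed, newly_removed)


# Illustrative values only: positions 2 and 5 were removed by earlier commits.
before = ManifestEntryRef("leaf-0001", dv=frozenset({2, 5}))
after = remove_positions(before, [5, 7, 9])
```

With this shape, a reader that wants to know what changed in the latest
commit only needs `after.dv_diff` from the root, without opening the leaf
manifest or comparing against the previous root.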

Thanks,

Amogh Jahagirdar

On Mon, Aug 25, 2025 at 1:34 PM Kevin Liu <[email protected]> wrote:

> Thanks Amogh. Looks like the recording for last week's sync is available
> on Youtube. Here's the link, https://www.youtube.com/watch?v=uWm-p--8oVQ
>
> Best,
> Kevin Liu
>
> On Tue, Aug 12, 2025 at 9:10 PM Amogh Jahagirdar <[email protected]> wrote:
>
>> Hey folks,
>>
>> Just following up on this to give the community an update on where we're
>> at and my proposed next steps.
>>
>> I've been editing and merging the contents from our proposal into the
>> proposal
>> <https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw>
>> from Russell and others. For any future comments on docs, please comment on the
>> linked proposal. I've also marked it on our doc in red text so it's clear
>> to redirect to the other proposal as a source of truth for comments.
>>
>> In terms of next steps,
>>
>> 1. An important design decision point is around inline manifest DVs,
>> external manifest DVs or enabling both. I'm working on measuring different
>> approaches for representing the compressed DV representation since that
>> will inform how many entries can reasonably fit in a small root manifest;
>> from that we can derive implications on different write patterns and
>> determine the right approach for storing these manifest DVs.
>>
>> 2. Another key point is around determining if/how we can reasonably
>> enable V4 to represent changes in the root manifest so that readers can
>> effectively just infer file level changes from the root.
>>
>> 3. One of the aspects of the proposal is getting away from the partition
>> tuple requirement in the root, which currently forces an
>> association between a partition spec and a manifest. These aspects can be
>> modeled as essentially column stats which gives a lot of flexibility into
>> the organization of the manifest. There are important details around field
>> ID spaces here which tie into how the stats are structured. What we're
>> proposing here is to have a unified expression ID space that could also
>> benefit us for storing things like virtual columns down the line. I go into
>> this in the proposal but I'm working on separating the appropriate parts so
>> that the original proposal can mostly just focus on the organization of the
>> content metadata tree and not how we want to solve this particular ID space
>> problem.
>>
>> 4. I'm planning on scheduling a recurring community sync starting next
>> Tuesday at 9am PST, every 2 weeks. If I get feedback from folks that this
>> time will never work, I can certainly adjust. For some reason, I don't have
>> the ability to add to the Iceberg Dev calendar, so I'll figure that out and
>> update the thread when the event is scheduled.
>>
>> Thanks,
>>
>> Amogh Jahagirdar
>>
>> On Tue, Jul 22, 2025 at 11:47 AM Russell Spitzer <
>> [email protected]> wrote:
>>
>>> I think this is a great way forward, starting out with this much
>>> parallel development shows that we have a lot of consensus already :)
>>>
>>> On Tue, Jul 22, 2025 at 12:42 PM Amogh Jahagirdar <[email protected]>
>>> wrote:
>>>
>>>> Hey folks, just following up on this. It looks like our proposal and
>>>> the proposal that @Russell Spitzer <[email protected]> shared
>>>> are pretty aligned. I was just chatting with Russell about this, and we
>>>> think it'd be best to combine both proposals and have a singular large
>>>> effort on this. I can also set up a focused community discussion (similar
>>>> to what we're doing on the other V4 proposals) on this starting sometime
>>>> next week just to get things moving, if that works for people.
>>>>
>>>> Thanks,
>>>>
>>>> Amogh Jahagirdar
>>>>
>>>> On Mon, Jul 14, 2025 at 9:48 PM Amogh Jahagirdar <[email protected]>
>>>> wrote:
>>>>
>>>>> Hey Russell,
>>>>>
>>>>> Thanks for sharing the proposal! A few of us (Ryan, Dan, Anoop and I)
>>>>> have also been working on a proposal for an adaptive metadata tree
>>>>> structure as part of enabling more efficient one file commits. From a read
>>>>> of the summary, it's great to see that we're thinking along the same lines
>>>>> about how to tackle this fundamental area!
>>>>>
>>>>> Here is our proposal:
>>>>> https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0
>>>>>
>>>>> Thanks,
>>>>> Amogh Jahagirdar
>>>>>
>>>>> On Mon, Jul 14, 2025 at 8:08 PM Russell Spitzer <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hey y'all!
>>>>>>
>>>>>> We (Yi Fang, Steven Wu and myself) wanted to share some of the thoughts
>>>>>> we had on how one-file commits could work in Iceberg. This is pretty
>>>>>> much just a high-level overview of the concepts we think we need and
>>>>>> how Iceberg would behave. We haven't gone very far into the actual
>>>>>> implementation and changes that would need to occur in the SDK to
>>>>>> make this happen.
>>>>>>
>>>>>> The high level summary is:
>>>>>>
>>>>>> Manifest Lists are out
>>>>>> Root Manifests take their place
>>>>>>   A Root manifest can have data manifests, delete manifests, manifest
>>>>>> delete vectors, data delete vectors and data files
>>>>>>   Manifest delete vectors allow for modifying a manifest without
>>>>>> deleting it entirely
>>>>>>   Data files let you append without writing an intermediary manifest
>>>>>>   Having child data and delete manifests lets you still scale
>>>>>>
>>>>>> Please take a look if you like,
>>>>>>
>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0
>>>>>>
>>>>>> I'm excited to see what other proposals and ideas are floating around
>>>>>> the community,
>>>>>> Russ
>>>>>>
>>>>>> On Wed, Jul 2, 2025 at 6:29 PM John Zhuge <[email protected]> wrote:
>>>>>>
>>>>>>> Very excited about the idea!
>>>>>>>
>>>>>>> On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I'm very interested in this initiative. Micah Kornfield and I
>>>>>>>> presented <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405>
>>>>>>>> on high-throughput ingestion for Iceberg tables at the 2024 Iceberg Summit,
>>>>>>>> which leveraged Google infrastructure like Colossus for efficient appends.
>>>>>>>>
>>>>>>>> This new proposal is particularly exciting because it offers
>>>>>>>> significant advancements in commit latency and metadata storage footprint.
>>>>>>>> Furthermore, a consistent manifest structure promises to simplify the
>>>>>>>> design and codebase, which is a major benefit.
>>>>>>>>
>>>>>>>> A related idea I've been exploring is having a loose affinity
>>>>>>>> between data and delete manifests. While the current separation of data and
>>>>>>>> delete manifests in Iceberg is valuable for avoiding data file rewrites
>>>>>>>> (and stats updates) when deletes change, it does necessitate a join
>>>>>>>> operation during reads. I'd be keen to discuss approaches that could
>>>>>>>> potentially reduce this read-side cost while retaining the benefits of
>>>>>>>> separate manifests.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Anoop
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi everyone,
>>>>>>>>>
>>>>>>>>> I am new to the Iceberg community but would love to participate in
>>>>>>>>> these discussions to reduce the number of file writes, especially for small
>>>>>>>>> writes/commits.
>>>>>>>>>
>>>>>>>>> Thank you!
>>>>>>>>> -Jagdeep
>>>>>>>>>
>>>>>>>>> On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> We have been hitting all the metadata problems you mentioned,
>>>>>>>>>> Ryan. I’m on-board to help however I can to improve this area.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>>>
>>>>>>>>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> I am interested in this idea and looking forward to collaboration.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Huang-Hsiang
>>>>>>>>>>
>>>>>>>>>> On Jun 2, 2025, at 10:14 AM, namratha mk <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I am interested in contributing to this effort.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Namratha
>>>>>>>>>>
>>>>>>>>>> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for kicking this thread off Ryan, I'm interested in
>>>>>>>>>>> helping out here! I've been working on a proposal in this area and it would
>>>>>>>>>>> be great to collaborate with different folks and exchange ideas here, since
>>>>>>>>>>> I think a lot of people are interested in solving this problem.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>>>
>>>>>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>
>>>>>>>>>>>> Like Russell’s recent note, I’m starting a thread to connect
>>>>>>>>>>>> those of us that are interested in the idea of changing Iceberg’s metadata
>>>>>>>>>>>> in v4 so that in most cases committing a change only requires writing one
>>>>>>>>>>>> additional metadata file.
>>>>>>>>>>>>
>>>>>>>>>>>> *Idea: One-file commits*
>>>>>>>>>>>>
>>>>>>>>>>>> The current Iceberg metadata structure requires writing at
>>>>>>>>>>>> least one manifest and a new manifest list to produce a new snapshot. The
>>>>>>>>>>>> goal of this work is to allow more flexibility by allowing the manifest
>>>>>>>>>>>> list layer to store data and delete files. As a result, only one file write
>>>>>>>>>>>> would be needed before committing the new snapshot. In addition, this work
>>>>>>>>>>>> will also try to explore:
>>>>>>>>>>>>
>>>>>>>>>>>>    - Avoiding small manifests that must be read in parallel
>>>>>>>>>>>>    and later compacted (metadata maintenance changes)
>>>>>>>>>>>>    - Extending metadata skipping to use aggregated column ranges
>>>>>>>>>>>>    that are compatible with geospatial data (manifest metadata)
>>>>>>>>>>>>    - Using soft deletes to avoid rewriting existing manifests
>>>>>>>>>>>>    (metadata DVs)
>>>>>>>>>>>>
>>>>>>>>>>>> If you’re interested in these problems, please reply!
>>>>>>>>>>>>
>>>>>>>>>>>> Ryan
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> John Zhuge
>>>>>>>
>>>>>>
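
For reference, the root manifest layout summarized in Russell's message in
this thread (a root manifest that can hold data manifests, delete manifests,
manifest delete vectors, data delete vectors and data files) can be sketched
roughly as follows. This is an illustration under stated assumptions, not
the proposed spec: `EntryKind`, `RootEntry`, and `append_data_file` are
hypothetical names, and the paths are made-up placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class EntryKind(Enum):
    """Kinds of entries a root manifest may hold, per the thread summary."""
    DATA_MANIFEST = "data_manifest"
    DELETE_MANIFEST = "delete_manifest"
    MANIFEST_DV = "manifest_dv"
    DATA_DV = "data_dv"
    DATA_FILE = "data_file"


@dataclass(frozen=True)
class RootEntry:
    """Hypothetical root-manifest entry: a kind plus the path it refers to."""
    kind: EntryKind
    path: str


def append_data_file(root, path):
    """Small append: reference the data file directly from a new root.

    No intermediary leaf manifest is written, so the new root is the only
    metadata file produced by the commit.
    """
    return root + [RootEntry(EntryKind.DATA_FILE, path)]


# Placeholder paths, for illustration only.
root_v1 = [RootEntry(EntryKind.DATA_MANIFEST, "manifest-a")]
root_v2 = append_data_file(root_v1, "data-file-1")
```

Larger tables would still keep most entries in child data and delete
manifests, with the root holding only references plus recent small changes.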
