Re: [DISCUSS] v4 - One file commits

Amogh Jahagirdar Tue, 12 Aug 2025 21:11:43 -0700

Hey folks,

Just following up on this to give the community an update on where we're at
and my proposed next steps.


I've been editing and merging the contents of our proposal into the proposal
<https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw>
from Russell and others. For any future comments on the docs, please comment
on the linked proposal. I've also added a note in red text to our doc so it's
clear that the other proposal is now the source of truth for comments.

In terms of next steps,

1. An important design decision is whether to support inline manifest DVs,
external manifest DVs, or both. I'm measuring different approaches to the
compressed DV representation, since that will inform how many entries can
reasonably fit in a small root manifest; from that we can derive the
implications for different write patterns and determine the right approach
for storing these manifest DVs.
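To illustrate why the DV representation drives the inline-vs-external decision, here is a minimal sketch, assuming a manifest DV is simply the set of deleted entry positions within one manifest. It compares two hypothetical encodings (a plain bitset and a list of varint-encoded gaps) and applies a hypothetical byte budget for inlining. This is not the Roaring bitmap encoding that Iceberg v3 deletion vectors use, and `fits_inline` and its budget are illustrative only.

```python
from __future__ import annotations

# Hypothetical encodings for a manifest DV (deleted entry positions in one
# manifest). Neither is the Roaring format used by Iceberg v3 DVs.


def bitset_size(num_entries: int) -> int:
    """Bytes for a plain bitset with one bit per manifest entry."""
    return (num_entries + 7) // 8


def varint_len(value: int) -> int:
    """Bytes to encode a non-negative int as an unsigned LEB128 varint."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def delta_varint_size(deleted_positions: list[int]) -> int:
    """Bytes for sorted positions stored as varint-encoded gaps."""
    total, previous = 0, 0
    for pos in sorted(deleted_positions):
        total += varint_len(pos - previous)
        previous = pos
    return total


def smaller_encoding(num_entries: int, deleted_positions: list[int]) -> tuple[str, int]:
    """Return the name and size in bytes of the smaller candidate encoding."""
    sizes = {
        "bitset": bitset_size(num_entries),
        "delta-varint": delta_varint_size(deleted_positions),
    }
    best = min(sizes, key=sizes.get)
    return best, sizes[best]


def fits_inline(num_entries: int, deleted_positions: list[int], budget_bytes: int = 64) -> bool:
    """Hypothetical policy: inline the DV in the root only if it fits the budget."""
    return smaller_encoding(num_entries, deleted_positions)[1] <= budget_bytes


# Sparse deletes in a 10,000-entry manifest: the gap list is tiny.
print(smaller_encoding(10_000, [5, 1_000, 9_999]))          # ('delta-varint', 5)
# Every other entry deleted: the bitset wins.
print(smaller_encoding(10_000, list(range(0, 10_000, 2))))  # ('bitset', 1250)
```

Even in this toy, the same logical DV ranges from a few bytes to over a kilobyte depending on how deletes are distributed, which is the kind of variation that determines how many DVs can sit in a small root.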

2. Another key point is determining if and how V4 can reasonably represent
changes in the root manifest, so that readers can infer file-level changes
from the root alone.
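One way to picture point 2, as a sketch only: assume each root entry carries a status and the ID of the snapshot that last changed it, similar to the EXISTING/ADDED/DELETED status on manifest entries today. A reader could then derive one snapshot's file-level changes from the root, except where a changed entry points at a child manifest. The `RootEntry` class, its `kind` field, and `changes_from_root` are hypothetical, not a proposed v4 schema.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Mirrors the status values on current manifest entries.
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class RootEntry:
    path: str
    kind: str         # hypothetical: "data_file" or "manifest"
    status: Status
    snapshot_id: int  # snapshot that last changed this entry's status


def changes_from_root(entries, snapshot_id):
    """Split one snapshot's root entries into added, removed, and unresolved.

    Child manifests changed in this snapshot cannot be resolved from the root
    alone, so they are returned separately for the reader to open.
    """
    added, removed, unresolved = [], [], []
    for e in entries:
        if e.snapshot_id != snapshot_id or e.status is Status.EXISTING:
            continue
        if e.kind == "manifest":
            unresolved.append(e.path)
        elif e.status is Status.ADDED:
            added.append(e.path)
        else:
            removed.append(e.path)
    return added, removed, unresolved


root = [
    RootEntry("a.parquet", "data_file", Status.EXISTING, 1),
    RootEntry("b.parquet", "data_file", Status.ADDED, 2),
    RootEntry("c.parquet", "data_file", Status.DELETED, 2),
    RootEntry("child-manifest-1", "manifest", Status.ADDED, 2),
]
print(changes_from_root(root, 2))
# (['b.parquet'], ['c.parquet'], ['child-manifest-1'])
```

The more of a commit's changes that land directly as root entries, the smaller the `unresolved` list, and the closer a reader gets to inferring changes from the root alone.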

3. One aspect of the proposal is moving away from the partition tuple
requirement in the root, which currently forces each manifest to be
associated with a partition spec. Partition values can essentially be
modeled as column stats, which gives a lot of flexibility in how manifests
are organized. There are important details around field ID spaces here that
tie into how the stats are structured. We're proposing a unified expression
ID space, which could also benefit us down the line for storing things like
virtual columns. I cover this in the proposal, but I'm working on separating
out the relevant parts so that the original proposal can focus on the
organization of the content metadata tree rather than on how we solve this
particular ID space problem.
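A minimal sketch of the idea in point 3, under stated assumptions: partition values are stored as per-entry lower/upper bounds keyed by an ID from one shared ID space, so the same bounds check prunes on a source column or on a partition transform, and an entry no longer needs to belong to a single partition spec. The ID assignments, `EntryStats`, and `prune` are hypothetical; the proposal's actual ID scheme is still being worked out.

```python
from dataclasses import dataclass, field


@dataclass
class EntryStats:
    path: str
    # Hypothetical unified ID space: keys may refer to table columns or to
    # derived expressions such as partition transforms.
    lower: dict = field(default_factory=dict)
    upper: dict = field(default_factory=dict)


# Hypothetical ID assignments in one shared space.
EXPRESSION_IDS = {
    "event_ts": 1,          # a table column
    "day(event_ts)": 1000,  # a partition transform, modeled as just another ID
}


def may_match_eq(stats: EntryStats, expr_id: int, value) -> bool:
    """Return False only when the bounds prove no row can equal `value`."""
    lo = stats.lower.get(expr_id)
    hi = stats.upper.get(expr_id)
    if lo is None or hi is None:
        return True  # missing stats: cannot prune
    return lo <= value <= hi


def prune(entries, expr_name, value):
    expr_id = EXPRESSION_IDS[expr_name]
    return [e.path for e in entries if may_match_eq(e, expr_id, value)]


# Day ordinals are days since 1970-01-01 (19723 is 2024-01-01).
entries = [
    EntryStats("m-jan", lower={1000: 19723}, upper={1000: 19753}),
    EntryStats("m-feb", lower={1000: 19754}, upper={1000: 19782}),
    EntryStats("f-no-stats"),
]
print(prune(entries, "day(event_ts)", 19760))  # ['m-feb', 'f-no-stats']
```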

4. I'm planning to schedule a recurring community sync every 2 weeks,
starting next Tuesday at 9am PST. If folks let me know this time won't work
for them, I can certainly adjust. For some reason, I don't have the ability
to add events to the Iceberg Dev calendar, so I'll figure that out and update
the thread once the event is scheduled.

Thanks,

Amogh Jahagirdar

On Tue, Jul 22, 2025 at 11:47 AM Russell Spitzer <[email protected]>
wrote:

> I think this is a great way forward; starting out with this much parallel
> development shows that we already have a lot of consensus :)
>
> On Tue, Jul 22, 2025 at 12:42 PM Amogh Jahagirdar <[email protected]>
> wrote:
>
>> Hey folks, just following up on this. It looks like our proposal and the
>> proposal that @Russell Spitzer <[email protected]> shared are
>> pretty aligned. I was just chatting with Russell about this, and we think
>> it'd be best to combine both proposals into a single, larger effort. I can
>> also set up a focused community discussion (similar to what
>> we're doing on the other V4 proposals) on this starting sometime next week
>> just to get things moving, if that works for people.
>>
>> Thanks,
>>
>> Amogh Jahagirdar
>>
>> On Mon, Jul 14, 2025 at 9:48 PM Amogh Jahagirdar <[email protected]>
>> wrote:
>>
>>> Hey Russell,
>>>
>>> Thanks for sharing the proposal! A few of us (Ryan, Dan, Anoop and I)
>>> have also been working on a proposal for an adaptive metadata tree
>>> structure as part of enabling more efficient one file commits. From a read
>>> of the summary, it's great to see that we're thinking along the same lines
>>> about how to tackle this fundamental area!
>>>
>>> Here is our proposal:
>>> https://docs.google.com/document/d/1q2asTpq471pltOTC6AsTLQIQcgEsh0AvEhRWnCcvZn0
>>>
>>> Thanks,
>>> Amogh Jahagirdar
>>>
>>> On Mon, Jul 14, 2025 at 8:08 PM Russell Spitzer <
>>> [email protected]> wrote:
>>>
>>>> Hey y'all!
>>>>
>>>> We (Yi Fang, Steven Wu and myself) wanted to share some of the thoughts
>>>> we had on how one-file commits could work in Iceberg. This is pretty much
>>>> just a high-level overview of the concepts we think we need and how
>>>> Iceberg would behave. We haven't gone very far into the actual
>>>> implementation and the changes that would need to occur in the SDK to
>>>> make this happen.
>>>>
>>>> The high-level summary is:
>>>>
>>>>   - Manifest lists are out
>>>>   - Root manifests take their place
>>>>     - A root manifest can have data manifests, delete manifests,
>>>>       manifest delete vectors, data delete vectors and data files
>>>>     - Manifest delete vectors allow for modifying a manifest without
>>>>       deleting it entirely
>>>>     - Data files let you append without writing an intermediary manifest
>>>>     - Having child data and delete manifests still lets you scale
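The summary above can be sketched as a toy in-memory model. This is a loose illustration, not the proposal's schema: the class names, the `max_inline_files` threshold, and the flush policy are hypothetical, and delete manifests and data DVs are left out. It shows a root holding data files directly (so an append rewrites only the root), a manifest DV hiding an entry of an existing child manifest without rewriting it, and inline files being flushed into a child manifest once they pile up.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """A child manifest: an immutable list of data file paths."""
    path: str
    files: tuple


@dataclass
class RootManifest:
    manifests: list = field(default_factory=list)
    # Manifest DVs: child manifest path -> set of deleted entry positions.
    manifest_dvs: dict = field(default_factory=dict)
    # Data files stored directly in the root, with no intermediary manifest.
    inline_files: list = field(default_factory=list)
    max_inline_files: int = 4  # hypothetical threshold

    def append(self, path):
        """Append one data file; only the root needs to be rewritten."""
        self.inline_files.append(path)
        if len(self.inline_files) > self.max_inline_files:
            self._flush()

    def _flush(self):
        """Move inline files into a new child manifest so the root stays small."""
        name = f"manifest-{len(self.manifests)}"
        self.manifests.append(Manifest(name, tuple(self.inline_files)))
        self.inline_files = []

    def delete(self, path):
        """Soft-delete a file by marking it in a manifest DV (no manifest rewrite)."""
        if path in self.inline_files:
            self.inline_files.remove(path)
            return
        for m in self.manifests:
            if path in m.files:
                self.manifest_dvs.setdefault(m.path, set()).add(m.files.index(path))
                return
        raise KeyError(path)

    def live_files(self):
        """Files a scan would plan: unmasked child entries, then inline files."""
        out = []
        for m in self.manifests:
            deleted = self.manifest_dvs.get(m.path, set())
            out.extend(f for i, f in enumerate(m.files) if i not in deleted)
        out.extend(self.inline_files)
        return out


root = RootManifest()
for i in range(6):
    root.append(f"data-{i}.parquet")  # the 5th append triggers a flush
root.delete("data-2.parquet")
print(root.live_files())
# ['data-0.parquet', 'data-1.parquet', 'data-3.parquet', 'data-4.parquet', 'data-5.parquet']
print(root.manifest_dvs)  # {'manifest-0': {2}}
```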
>>>>
>>>> Please take a look if you like,
>>>>
>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0
>>>>
>>>> I'm excited to see what other proposals and ideas are floating around
>>>> the community,
>>>> Russ
>>>>
>>>> On Wed, Jul 2, 2025 at 6:29 PM John Zhuge <[email protected]> wrote:
>>>>
>>>>> Very excited about the idea!
>>>>>
>>>>> On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I'm very interested in this initiative. Micah Kornfield and I
>>>>>> presented <https://youtu.be/4d4nqKkANdM?si=9TXgaUIXbq-l8idi&t=1405>
>>>>>> on high-throughput ingestion for Iceberg tables at the 2024 Iceberg
>>>>>> Summit; that approach leveraged Google infrastructure like Colossus
>>>>>> for efficient appends.
>>>>>>
>>>>>> This new proposal is particularly exciting because it offers
>>>>>> significant advancements in commit latency and metadata storage
>>>>>> footprint. Furthermore, a consistent manifest structure promises to
>>>>>> simplify the design and codebase, which is a major benefit.
>>>>>>
>>>>>> A related idea I've been exploring is having a loose affinity between
>>>>>> data and delete manifests. While the current separation of data and
>>>>>> delete manifests in Iceberg is valuable for avoiding data file
>>>>>> rewrites (and stats updates) when deletes change, it does necessitate
>>>>>> a join operation during reads. I'd be keen to discuss approaches that
>>>>>> could potentially reduce this read-side cost while retaining the
>>>>>> benefits of separate manifests.
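For context on the read-side join mentioned above, here is a simplified sketch of what a planner does today: after reading data manifests and delete manifests separately, it matches each data file with the deletes that apply to it. It assumes every delete is a deletion vector naming its data file, as v3 DVs do through `referenced_data_file`; equality deletes and partition-scoped position deletes, which make the real join more involved, are left out. `plan_tasks` and the tuple shapes are hypothetical stand-ins for manifest entries.

```python
from collections import defaultdict


def plan_tasks(data_entries, delete_entries):
    """Pair each data file with the deletion vectors that reference it.

    data_entries: iterable of data file paths (from data manifests).
    delete_entries: iterable of (dv_path, referenced_data_file) pairs
    (from delete manifests).
    """
    # Build side: index deletes by the data file they apply to.
    deletes_by_file = defaultdict(list)
    for dv_path, referenced in delete_entries:
        deletes_by_file[referenced].append(dv_path)

    # Probe side: look up each data file's deletes (empty list if none).
    return [(path, deletes_by_file.get(path, [])) for path in data_entries]


tasks = plan_tasks(
    ["a.parquet", "b.parquet"],
    [("dv-1.puffin", "b.parquet")],
)
print(tasks)  # [('a.parquet', []), ('b.parquet', ['dv-1.puffin'])]
```

Both inputs must be read before tasks can be emitted; presumably a loose affinity between a data manifest and its delete manifest would let a planner narrow the build side to deletes that can actually match.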
>>>>>>
>>>>>> Best,
>>>>>> Anoop
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 13, 2025 at 11:06 AM Jagdeep Sidhu <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I am new to the Iceberg community but would love to participate in
>>>>>>> these discussions to reduce the number of file writes, especially
>>>>>>> for small writes/commits.
>>>>>>>
>>>>>>> Thank you!
>>>>>>> -Jagdeep
>>>>>>>
>>>>>>> On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> We have been hitting all the metadata problems you mentioned, Ryan.
>>>>>>>> I’m on board to help however I can to improve this area.
>>>>>>>>
>>>>>>>>
>>>>>>>> ~ Anurag Mantripragada
>>>>>>>>
>>>>>>>> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> I am interested in this idea and looking forward to collaboration.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Huang-Hsiang
>>>>>>>>
>>>>>>>> On Jun 2, 2025, at 10:14 AM, namratha mk <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I am interested in contributing to this effort.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Namratha
>>>>>>>>
>>>>>>>> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for kicking this thread off, Ryan! I'm interested in helping
>>>>>>>>> out here. I've been working on a proposal in this area, and it would
>>>>>>>>> be great to collaborate with different folks and exchange ideas,
>>>>>>>>> since I think a lot of people are interested in solving this problem.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Amogh Jahagirdar
>>>>>>>>>
>>>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone,
>>>>>>>>>>
>>>>>>>>>> Like Russell’s recent note, I’m starting a thread to connect
>>>>>>>>>> those of us who are interested in the idea of changing Iceberg’s
>>>>>>>>>> metadata in v4 so that, in most cases, committing a change only
>>>>>>>>>> requires writing one additional metadata file.
>>>>>>>>>>
>>>>>>>>>> *Idea: One-file commits*
>>>>>>>>>>
>>>>>>>>>> The current Iceberg metadata structure requires writing at least
>>>>>>>>>> one manifest and a new manifest list to produce a new snapshot. The
>>>>>>>>>> goal of this work is to add flexibility by allowing the manifest
>>>>>>>>>> list layer to store data and delete files. As a result, only one
>>>>>>>>>> file write would be needed before committing the new snapshot. In
>>>>>>>>>> addition, this work will try to explore:
>>>>>>>>>>
>>>>>>>>>>    - Avoiding small manifests that must be read in parallel and
>>>>>>>>>>    later compacted (metadata maintenance changes)
>>>>>>>>>>    - Extending metadata skipping to use aggregated column ranges
>>>>>>>>>>    that are compatible with geospatial data (manifest metadata)
>>>>>>>>>>    - Using soft deletes to avoid rewriting existing manifests
>>>>>>>>>>    (metadata DVs)
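As an illustration of the second bullet above, here is a sketch that assumes per-file bounds on a geometry column are stored as bounding boxes `(xmin, ymin, xmax, ymax)`. Aggregating them into one box per manifest lets a reader skip a whole manifest when the query box does not intersect it. The functions and tuple layout are hypothetical, and the sketch ignores complications such as boxes that wrap the antimeridian.

```python
def union_box(boxes):
    """Aggregate per-file bounding boxes into one manifest-level box."""
    xmins, ymins, xmaxs, ymaxs = zip(*boxes)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap; touching edges count."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def manifests_to_read(manifest_boxes, query_box):
    """Keep only manifests whose aggregated box may contain matching rows."""
    return [name for name, box in manifest_boxes.items() if intersects(box, query_box)]


west = union_box([(-122.5, 37.7, -122.3, 37.8), (-118.3, 34.0, -118.2, 34.1)])
east = union_box([(-74.1, 40.6, -73.9, 40.8)])
print(west)  # (-122.5, 34.0, -118.2, 37.8)
print(manifests_to_read({"west": west, "east": east}, (-123.0, 37.0, -122.0, 38.0)))
# ['west']
```

The aggregated `west` box also covers the empty area between its two files, which is the usual trade-off of summarizing ranges at a coarser level: cheaper skipping, looser bounds.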
>>>>>>>>>>
>>>>>>>>>> If you’re interested in these problems, please reply!
>>>>>>>>>>
>>>>>>>>>> Ryan
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>> --
>>>>> John Zhuge
>>>>>
>>>>
