Hi everyone,

I am new to the Iceberg community but would love to participate in these
discussions to reduce the number of file writes, especially for small
writes/commits.

Thank you!
-Jagdeep

On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada
<amantriprag...@apple.com.invalid> wrote:

> We have been hitting all the metadata problems you mentioned, Ryan. I’m
> on board to help however I can to improve this area.
>
>
> ~ Anurag Mantripragada
>
> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng <hua...@apple.com.INVALID>
> wrote:
>
> I am interested in this idea and looking forward to collaboration.
>
> Thanks,
> Huang-Hsiang
>
> On Jun 2, 2025, at 10:14 AM, namratha mk <nmk...@gmail.com> wrote:
>
> Hello,
>
> I am interested in contributing to this effort.
>
> Thanks,
> Namratha
>
> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <2am...@gmail.com> wrote:
>
>> Thanks for kicking this thread off Ryan, I'm interested in helping out
>> here! I've been working on a proposal in this area and it would be great to
>> collaborate with different folks and exchange ideas here, since I think a
>> lot of people are interested in solving this problem.
>>
>> Thanks,
>> Amogh Jahagirdar
>>
>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> Like Russell’s recent note, I’m starting a thread to connect those of us
>>> that are interested in the idea of changing Iceberg’s metadata in v4 so
>>> that in most cases committing a change only requires writing one additional
>>> metadata file.
>>>
>>> *Idea: One-file commits*
>>>
>>> The current Iceberg metadata structure requires writing at least one
>>> manifest and a new manifest list to produce a new snapshot. The goal of
>>> this work is to allow more flexibility by allowing the manifest list layer
>>> to store data and delete files. As a result, only one file write would be
>>> needed before committing the new snapshot. In addition, this work will also
>>> try to explore:
>>>
>>>    - Avoiding small manifests that must be read in parallel and later
>>>    compacted (metadata maintenance changes)
>>>    - Extending metadata skipping to use aggregated column ranges that are
>>>    compatible with geospatial data (manifest metadata)
>>>    - Using soft deletes to avoid rewriting existing manifests (metadata
>>>    DVs)
>>>
>>> If you’re interested in these problems, please reply!
>>>
>>> Ryan
>>>
>>
>
>
