Hi everyone, I am new to the Iceberg community but would love to participate in these discussions to reduce the number of file writes, especially for small writes/commits.
Thank you!
-Jagdeep

On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada <amantriprag...@apple.com.invalid> wrote:

> We have been hitting all the metadata problems you mentioned, Ryan. I’m
> on-board to help however I can to improve this area.
>
>
> ~ Anurag Mantripragada
>
> On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng <hua...@apple.com.INVALID>
> wrote:
>
> I am interested in this idea and looking forward to collaboration.
>
> Thanks,
> Huang-Hsiang
>
> On Jun 2, 2025, at 10:14 AM, namratha mk <nmk...@gmail.com> wrote:
>
> Hello,
>
> I am interested in contributing to this effort.
>
> Thanks,
> Namratha
>
> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <2am...@gmail.com> wrote:
>
>> Thanks for kicking this thread off Ryan, I'm interested in helping out
>> here! I've been working on a proposal in this area and it would be great to
>> collaborate with different folks and exchange ideas here, since I think a
>> lot of people are interested in solving this problem.
>>
>> Thanks,
>> Amogh Jahagirdar
>>
>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> Like Russell’s recent note, I’m starting a thread to connect those of us
>>> that are interested in the idea of changing Iceberg’s metadata in v4 so
>>> that in most cases committing a change only requires writing one additional
>>> metadata file.
>>>
>>> *Idea: One-file commits*
>>>
>>> The current Iceberg metadata structure requires writing at least one
>>> manifest and a new manifest list to produce a new snapshot. The goal of
>>> this work is to allow more flexibility by allowing the manifest list layer
>>> to store data and delete files. As a result, only one file write would be
>>> needed before committing the new snapshot. In addition, this work will also
>>> try to explore:
>>>
>>>    - Avoiding small manifests that must be read in parallel and later
>>>    compacted (metadata maintenance changes)
>>>    - Extend metadata skipping to use aggregated column ranges that are
>>>    compatible with geospatial data (manifest metadata)
>>>    - Using soft deletes to avoid rewriting existing manifests (metadata
>>>    DVs)
>>>
>>> If you’re interested in these problems, please reply!
>>>
>>> Ryan
>>>
>>
>
>