I added/accepted on the dev calendar. Looking forward to it!

On Tue, Mar 24, 2026 at 5:34 PM Micah Kornfield <[email protected]> wrote:
> It seems like we might not have full alignment on this proposal, so I
> tentatively scheduled a sync for next Monday (added to the iceberg dev
> events calendar). Please let me know if you are interested in joining and
> the time doesn't work for you (we can reschedule accordingly).
>
> Thanks,
> Micah
>
> On 2026/02/09 23:15:49 Micah Kornfield wrote:
> > As an update I've made the proposal to add this field to the Single file
> > commits doc.
> >
> > Please let me know if there is any additional feedback.
> >
> > Thanks,
> > Micah
> >
> > On Wed, Jan 21, 2026 at 5:16 PM Micah Kornfield <[email protected]>
> > wrote:
> >
> > > Thanks Manu, that is the right doc.
> > >
> > > As an update, I've incorporated feedback from the community to the
> > > document:
> > >
> > > At a high level the changes are:
> > > - Renamed the field from "tags" to "attributes"
> > > - Clarified limits on attributes should only be enforced for new data.
> > > Existing tags must always be carried through.
> > > - Added more details on enforcing size of tags.
> > >
> > > Are there any objections to folding the proposal into the V4 metadata
> > > proposal? Again, the reasons for doing so are mostly around ensuring
> > > consistent field numbering and making the spec update easier.
> > >
> > > If people want further discussion on this I'd be happy to discuss at the
> > > next V4 metadata sync or create a one-off meeting. Please let me know.
> > >
> > > Thanks,
> > > Micah
> > >
> > > On Mon, Jan 5, 2026 at 5:48 PM Manu Zhang <[email protected]> wrote:
> > >
> > >> Happy new year Micah. Are you linking the wrong doc (Iceberg Single File
> > >> Commits)?
> > >> I think you are referring to
> > >> https://docs.google.com/document/d/16flxDXjpBiAs_cF3sjCsa7GlvSHQ0Mmm74c8yvYQlSA/edit?tab=t.0#heading=h.cnpb2lth3egz
> > >>
> > >> Best,
> > >> Manu
> > >>
> > >> On Tue, Jan 6, 2026 at 2:19 AM Micah Kornfield <[email protected]>
> > >> wrote:
> > >>
> > >>> Happy new year everyone, I just wanted to bump this thread (most
> > >>> discussion has been happening on the doc [1]) in case it was missed over
> > >>> the holidays.
> > >>>
> > >>> Thanks,
> > >>> Micah
> > >>>
> > >>> [1]
> > >>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw
> > >>>
> > >>> On Fri, Dec 19, 2025 at 2:14 PM Micah Kornfield <[email protected]>
> > >>> wrote:
> > >>>
> > >>>> Sounds good, will wait until next year.
> > >>>>
> > >>>> On Fri, Dec 19, 2025 at 2:13 PM Steven Wu <[email protected]> wrote:
> > >>>>
> > >>>>> Micah, many people will be OOO in the next two weeks. Can we extend
> > >>>>> the feedback deadline to at least 1-2 weeks after the new year?
> > >>>>>
> > >>>>> On Fri, Dec 19, 2025 at 8:45 AM Micah Kornfield <[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> > I have no problem with adding this discussion to the single file
> > >>>>>> > work, but I'm not sure that would speed it up? Seems like this is a
> > >>>>>> > pretty independent addition to the metadata layout?
> > >>>>>>
> > >>>>>> Yes, it is fairly independent. The main reason I wanted to
> > >>>>>> consolidate in the doc is that there appears to be a bit of metadata
> > >>>>>> re-arrangement and new fields. I wanted to make sure that:
> > >>>>>>
> > >>>>>> 1. We avoid field ID conflicts.
> > >>>>>> 2. When writing up the final spec changes it is easy to manage and
> > >>>>>> does not create a dependency one way or another between the two of these.
> > >>>>>>
> > >>>>>> Happy to keep the implementation of the guard-rails as a separate
> > >>>>>> piece of work.
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>> Micah
> > >>>>>>
> > >>>>>> On Fri, Dec 19, 2025 at 7:31 AM Russell Spitzer <[email protected]>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> I have no problem with adding this discussion to the single file
> > >>>>>>> work, but I'm not sure that would speed it up? Seems like this is a pretty
> > >>>>>>> independent addition to the metadata layout?
> > >>>>>>>
> > >>>>>>> On Thu, Dec 18, 2025 at 6:28 PM Micah Kornfield <[email protected]>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>>> Thanks for the clarification, Micah! I want to explicitly call out
> > >>>>>>>>> (and double-confirm) the key principle here: all tags must be strictly
> > >>>>>>>>> optional and never required for correctness or basic functionality. Engines
> > >>>>>>>>> should always be able to safely drop or ignore tags without breaking reads
> > >>>>>>>>> or writes, with the only possible impact being suboptimal behavior (e.g.,
> > >>>>>>>>> extra I/O), as you described.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> 100% I will also add this summary to the bottom of the requirements
> > >>>>>>>> section.
> > >>>>>>>>
> > >>>>>>>> Based on mailing list discussion and doc comments (or lack
> > >>>>>>>> thereof), it does not seem like there are strong objections to adding this
> > >>>>>>>> for V4. Prashant seemed to maybe have concerns, so I'd like to understand
> > >>>>>>>> if they are blockers.
> > >>>>>>>>
> > >>>>>>>> If there isn't additional feedback by the end of next week, I'd
> > >>>>>>>> like to assume a lazy consensus and consolidate this with the single file
> > >>>>>>>> improvement work, which has already reorganized the metadata schema [1].
> > >>>>>>>> Please let me know if there is a different process.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Micah
> > >>>>>>>>
> > >>>>>>>> [1]
> > >>>>>>>> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw
> > >>>>>>>>
> > >>>>>>>> On Wed, Dec 17, 2025 at 5:38 PM Yufei Gu <[email protected]>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Thanks for the clarification, Micah! I want to explicitly call out
> > >>>>>>>>> (and double-confirm) the key principle here: all tags must be strictly
> > >>>>>>>>> optional and never required for correctness or basic functionality. Engines
> > >>>>>>>>> should always be able to safely drop or ignore tags without breaking reads
> > >>>>>>>>> or writes, with the only possible impact being suboptimal behavior (e.g.,
> > >>>>>>>>> extra I/O), as you described.
> > >>>>>>>>>
> > >>>>>>>>> As long as this constraint is clearly stated and enforced, the
> > >>>>>>>>> trade-off feels reasonable to me.
> > >>>>>>>>>
> > >>>>>>>>> Yufei
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Dec 15, 2025 at 4:28 PM Micah Kornfield <[email protected]>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi Yufei,
> > >>>>>>>>>>
> > >>>>>>>>>>> If one engine started to rely on a tag for certain reasons (like
> > >>>>>>>>>>> clustering algorithm), would data file rewrite (compaction) by another
> > >>>>>>>>>>> engine remove the tag, and break the engine relying on it?
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> The intent here is that dropping tags should never break an
> > >>>>>>>>>> engine. But it could cause suboptimal operations. For instance, one
> > >>>>>>>>>> example I brought up in the doc is using tags to cache parquet footer size,
> > >>>>>>>>>> to make sure it is fetched in 1 I/O.
> > >>>>>>>>>>
> > >>>>>>>>>> In this case the following would occur.
> > >>>>>>>>>>
> > >>>>>>>>>> 1. Engine 1 does a write to File 1 and records its footer size
> > >>>>>>>>>> in tags.
> > >>>>>>>>>> 2. Engine 2 does a rewrite/compaction and produces File 2
> > >>>>>>>>>> without tags.
> > >>>>>>>>>> 3. Engine 1 then tries to read File 2. The tag for footer
> > >>>>>>>>>> length is missing so it falls back to reading a reasonable number of bytes
> > >>>>>>>>>> from the end of the parquet file, hoping the entire footer is retrieved
> > >>>>>>>>>> (and if it isn't, a second I/O is necessary).
> > >>>>>>>>>>
> > >>>>>>>>>> Similarly for clustering algorithms, I think the result could
> > >>>>>>>>>> yield a sub-optimally clustered table, or perhaps redundant clustering
> > >>>>>>>>>> operations but shouldn't break anything. This is no worse than the case
> > >>>>>>>>>> today though if engine 1 and engine 2 have different clustering algorithms
> > >>>>>>>>>> and they are being run in interleaved fashion on the same table. In this
> > >>>>>>>>>> case it is highly likely that some amount of duplicate compaction is
> > >>>>>>>>>> happening.
> > >>>>>>>>>>
> > >>>>>>>>>> In the current proposal, any metadata that is required for proper
> > >>>>>>>>>> functioning should never be put in tags.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> Micah
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On Mon, Dec 15, 2025 at 4:02 PM Yufei Gu <[email protected]>
> > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Thanks for the proposal!
> > >>>>>>>>>>>
> > >>>>>>>>>>> If one engine started to rely on a tag for certain reasons (like
> > >>>>>>>>>>> clustering algorithm), would data file rewrite (compaction) by another
> > >>>>>>>>>>> engine remove the tag, and break the engine relying on it?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Yufei
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Dec 10, 2025 at 2:58 PM Micah Kornfield <[email protected]>
> > >>>>>>>>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi Iceberg Dev,
> > >>>>>>>>>>>> I added a proposal [1] to add a key-value tags field for files
> > >>>>>>>>>>>> in V4 metadata [2].
> > >>>>>>>>>>>> More details are in the document but the intent is to
> > >>>>>>>>>>>> allow engines to store optional metadata associated with these files:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 1. The proposed field is optional and cannot be used for
> > >>>>>>>>>>>> metadata required for reading the table correctly.
> > >>>>>>>>>>>> 2. It also proposes guard-rails for not letting tags cause
> > >>>>>>>>>>>> metadata bloat.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Looking forward to hearing everyone's thoughts and feedback.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>> Micah
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/14815
> > >>>>>>>>>>>> [2]
> > >>>>>>>>>>>> https://docs.google.com/document/d/16flxDXjpBiAs_cF3sjCsa7GlvSHQ0Mmm74c8yvYQlSA/edit?tab=t.0#heading=h.cnpb2lth3egz
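The footer-size example discussed in the thread (a cached size that a reader treats purely as a hint, falling back to a speculative tail read when another engine's compaction has dropped it) could be sketched roughly as below. This is an illustration under stated assumptions, not anything from the proposal doc: the attribute key `footer-size`, string-valued attributes, the `CountingReader` helper, and the 64 KiB default guess are all hypothetical. The only format fact it relies on is the Parquet trailer layout: a 4-byte little-endian footer length followed by the `PAR1` magic at the end of the file.

```python
import struct

MAGIC = b"PAR1"
TRAILER_LEN = 8                 # 4-byte little-endian footer length + 4-byte magic
DEFAULT_TAIL_GUESS = 64 * 1024  # hypothetical speculative read size


class CountingReader:
    """In-memory stand-in for an object store that counts ranged reads."""

    def __init__(self, data: bytes):
        self.data = data
        self.reads = 0

    def size(self) -> int:
        # Assumed to be known without I/O (e.g. from table metadata).
        return len(self.data)

    def read_range(self, start: int, end: int) -> bytes:
        self.reads += 1
        return self.data[start:end]


def read_footer(reader, attributes=None, tail_guess=DEFAULT_TAIL_GUESS):
    """Return the raw footer bytes; the cached size is only ever a hint."""
    file_size = reader.size()
    if file_size < TRAILER_LEN:
        raise ValueError("file too small to be Parquet")

    hint = (attributes or {}).get("footer-size")  # hypothetical key name
    try:
        want = int(hint) + TRAILER_LEN if hint is not None else tail_guess
    except (TypeError, ValueError):
        want = tail_guess  # an unparseable hint is ignored, never an error

    want = max(TRAILER_LEN, min(want, file_size))
    tail = reader.read_range(file_size - want, file_size)
    if tail[-4:] != MAGIC:
        raise ValueError("missing Parquet magic")

    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    needed = footer_len + TRAILER_LEN
    if needed > file_size:
        raise ValueError("corrupt footer length")
    if needed > len(tail):
        # Hint was missing, stale, or too small: pay for a second I/O.
        tail = reader.read_range(file_size - needed, file_size)
    return tail[-needed:-TRAILER_LEN]
```

With the hint present the footer comes back in one ranged read; with it absent the result is the same and only the number of reads can change, which matches the "suboptimal but never broken" constraint described above.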
