I added/accepted on the dev calendar. Looking forward to it!

On Tue, Mar 24, 2026 at 5:34 PM Micah Kornfield <[email protected]>
wrote:

> It seems like we might not have full alignment on this proposal, so I
> tentatively scheduled a sync for next Monday (added to the iceberg dev
> events calendar).  Please let me know if you are interested in joining and
> the time doesn't work for you (we can reschedule accordingly).
>
> Thanks,
> Micah
>
> On 2026/02/09 23:15:49 Micah Kornfield wrote:
> > As an update I've made the proposal to add this field to the Single file
> > commits doc.
> >
> > Please let me know if there is any additional feedback.
> >
> > Thanks,
> > Micah
> >
> > On Wed, Jan 21, 2026 at 5:16 PM Micah Kornfield <[email protected]>
> > wrote:
> >
> > > Thanks Manu, that is the right doc.
> > >
> > > As an update, I've incorporated feedback from the community to the
> > > document:
> > >
> > > At a high level the changes are:
> > > - Renamed the field from "tags" to "attributes"
> > > - Clarified limits on attributes should only be enforced for new data.
> > > Existing tags must always be carried through.
> > > - Added more details on enforcing size of tags.
> > >
> > > Are there any objections to folding the proposal into the V4 metadata
> > > proposal?  Again, the reasons for doing so are mostly around ensuring
> > > consistent field numbering and making the spec update easier.
> > >
> > > > If people want further discussion on this I'd be happy to discuss at
> > > > the next V4 metadata sync or create a one-off meeting.  Please let me
> > > > know.
> > >
> > > Thanks,
> > > Micah
> > >
> > > > On Mon, Jan 5, 2026 at 5:48 PM Manu Zhang <[email protected]> wrote:
> > >
> > > >> Happy new year Micah. Are you linking the wrong doc (Iceberg Single
> > > >> File Commits)?
> > > >> I think you are referring to
> > > >> https://docs.google.com/document/d/16flxDXjpBiAs_cF3sjCsa7GlvSHQ0Mmm74c8yvYQlSA/edit?tab=t.0#heading=h.cnpb2lth3egz
> > >>
> > >> Best,
> > >> Manu
> > >>
> > > >> On Tue, Jan 6, 2026 at 2:19 AM Micah Kornfield <[email protected]>
> > > >> wrote:
> > >>
> > > >>> Happy new year everyone, I just wanted to bump this thread (most
> > > >>> discussion has been happening on the doc [1]) in case it was missed
> > > >>> over the holidays.
> > >>>
> > >>> Thanks,
> > >>> Micah
> > >>>
> > >>> [1]
> > >>>
> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw
> > >>>
> > > >>> On Fri, Dec 19, 2025 at 2:14 PM Micah Kornfield <[email protected]>
> > > >>> wrote:
> > >>>
> > >>>> Sounds good, will wait until next year.
> > >>>>
> > > >>>> On Fri, Dec 19, 2025 at 2:13 PM Steven Wu <[email protected]>
> > > >>>> wrote:
> > >>>>
> > >>>>> Micah, many people will be OOO in the next two weeks. Can we extend
> > >>>>> the feedback deadline to at least 1-2 weeks after the new year?
> > >>>>>
> > > >>>>> On Fri, Dec 19, 2025 at 8:45 AM Micah Kornfield <[email protected]>
> > > >>>>> wrote:
> > >>>>>
> > > >>>>>> > I have no problem with adding this discussion to the single file
> > > >>>>>> > work, but I'm not sure that would speed it up? Seems like this is
> > > >>>>>> > a pretty independent addition to the metadata layout?
> > >>>>>>
> > > >>>>>> Yes, it is fairly independent.  The main reason I wanted to
> > > >>>>>> consolidate in the doc is that there appears to be a bit of
> > > >>>>>> metadata re-arrangement and new fields.  I wanted to make sure that:
> > > >>>>>>
> > > >>>>>> 1.  We avoid field ID conflicts.
> > > >>>>>> 2.  When writing up the final spec changes, it is easy to manage and
> > > >>>>>> does not create a dependency one way or another between the two of
> > > >>>>>> these.
> > >>>>>>
> > >>>>>> Happy to keep the implementation of the guard-rails as a separate
> > >>>>>> piece of work.
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>> Micah
> > >>>>>>
> > >>>>>> On Fri, Dec 19, 2025 at 7:31 AM Russell Spitzer <
> > >>>>>> [email protected]> wrote:
> > >>>>>>
> > > >>>>>>> I have no problem with adding this discussion to the single file
> > > >>>>>>> work, but I'm not sure that would speed it up? Seems like this is a
> > > >>>>>>> pretty independent addition to the metadata layout?
> > >>>>>>>
> > >>>>>>> On Thu, Dec 18, 2025 at 6:28 PM Micah Kornfield <
> > >>>>>>> [email protected]> wrote:
> > >>>>>>>
> > > >>>>>>>>> Thanks for the clarification, Micah! I want to explicitly call
> > > >>>>>>>>> out (and double-confirm) the key principle here: all tags must
> > > >>>>>>>>> be strictly optional and never required for correctness or
> > > >>>>>>>>> basic functionality. Engines should always be able to safely
> > > >>>>>>>>> drop or ignore tags without breaking reads or writes, with the
> > > >>>>>>>>> only possible impact being suboptimal behavior (e.g., extra
> > > >>>>>>>>> I/O), as you described.
> > >>>>>>>>
> > >>>>>>>>
> > > >>>>>>>> 100%. I will also add this summary to the bottom of the
> > > >>>>>>>> requirements section.
> > >>>>>>>>
> > > >>>>>>>> Based on mailing list discussion and doc comments (or lack
> > > >>>>>>>> thereof), it does not seem like there are strong objections to
> > > >>>>>>>> adding this for V4.  Prashant seemed to maybe have concerns, so
> > > >>>>>>>> I'd like to understand if they are blockers.
> > > >>>>>>>>
> > > >>>>>>>> If there isn't additional feedback by the end of next week, I'd
> > > >>>>>>>> like to assume a lazy consensus and consolidate this with the
> > > >>>>>>>> single file improvement work, which has already reorganized the
> > > >>>>>>>> metadata schema [1].
> > > >>>>>>>> Please let me know if there is a different process.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Micah
> > >>>>>>>>
> > >>>>>>>> [1]
> > >>>>>>>>
> https://docs.google.com/document/d/1k4x8utgh41Sn1tr98eynDKCWq035SV_f75rtNHcerVw/edit?tab=t.0#heading=h.unn922df0zzw
> > >>>>>>>>
> > >>>>>>>> On Wed, Dec 17, 2025 at 5:38 PM Yufei Gu <[email protected]>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > > >>>>>>>>> Thanks for the clarification, Micah! I want to explicitly call
> > > >>>>>>>>> out (and double-confirm) the key principle here: all tags must
> > > >>>>>>>>> be strictly optional and never required for correctness or
> > > >>>>>>>>> basic functionality. Engines should always be able to safely
> > > >>>>>>>>> drop or ignore tags without breaking reads or writes, with the
> > > >>>>>>>>> only possible impact being suboptimal behavior (e.g., extra
> > > >>>>>>>>> I/O), as you described.
> > >>>>>>>>>
> > >>>>>>>>> As long as this constraint is clearly stated and enforced, the
> > >>>>>>>>> trade-off feels reasonable to me.
> > >>>>>>>>>
> > >>>>>>>>> Yufei
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Dec 15, 2025 at 4:28 PM Micah Kornfield <
> > >>>>>>>>> [email protected]> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi Yufei,
> > >>>>>>>>>>
> > > >>>>>>>>>>> If one engine started to rely on a tag for certain reasons
> > > >>>>>>>>>>> (like a clustering algorithm), would a data file rewrite
> > > >>>>>>>>>>> (compaction) by another engine remove the tag and break the
> > > >>>>>>>>>>> engine relying on it?
> > >>>>>>>>>>
> > >>>>>>>>>>
> > > >>>>>>>>>> The intent here is that dropping tags should never break an
> > > >>>>>>>>>> engine, but it could cause suboptimal operations.  For
> > > >>>>>>>>>> instance, one example I brought up in the doc is using tags
> > > >>>>>>>>>> to cache the parquet footer size, to make sure the footer is
> > > >>>>>>>>>> fetched in 1 I/O.
> > >>>>>>>>>>
> > >>>>>>>>>> In this case the following would occur.
> > >>>>>>>>>>
> > > >>>>>>>>>> 1.  Engine 1 does a write to file 1 and records its footer
> > > >>>>>>>>>> size in tags.
> > > >>>>>>>>>> 2.  Engine 2 does a rewrite/compaction and produces file 2
> > > >>>>>>>>>> without tags.
> > > >>>>>>>>>> 3.  Engine 1 then tries to read file 2.  The tag for footer
> > > >>>>>>>>>> length is missing, so it falls back to reading a reasonable
> > > >>>>>>>>>> number of bytes from the end of the parquet file, hoping the
> > > >>>>>>>>>> entire footer is retrieved (and if it isn't, a second I/O is
> > > >>>>>>>>>> necessary).
> > >>>>>>>>>>
> > > >>>>>>>>>> Similarly for clustering algorithms, I think the result could
> > > >>>>>>>>>> be a sub-optimally clustered table, or perhaps redundant
> > > >>>>>>>>>> clustering operations, but it shouldn't break anything. This
> > > >>>>>>>>>> is no worse than the case today, though, if engine 1 and
> > > >>>>>>>>>> engine 2 have different clustering algorithms and they are
> > > >>>>>>>>>> being run in interleaved fashion on the same table.  In this
> > > >>>>>>>>>> case it is highly likely that some amount of duplicate
> > > >>>>>>>>>> compaction is happening.
> > >>>>>>>>>>
> > > >>>>>>>>>> In the current proposal, any metadata that is required for
> > > >>>>>>>>>> proper functioning should never be put in tags.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks,
> > >>>>>>>>>> Micah
> > >>>>>>>>>>
> > >>>>>>>>>>
> > > >>>>>>>>>> On Mon, Dec 15, 2025 at 4:02 PM Yufei Gu <[email protected]>
> > > >>>>>>>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> Thanks for the proposal!
> > >>>>>>>>>>>
> > > >>>>>>>>>>> If one engine started to rely on a tag for certain reasons
> > > >>>>>>>>>>> (like a clustering algorithm), would a data file rewrite
> > > >>>>>>>>>>> (compaction) by another engine remove the tag and break the
> > > >>>>>>>>>>> engine relying on it?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Yufei
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Dec 10, 2025 at 2:58 PM Micah Kornfield <
> > >>>>>>>>>>> [email protected]> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi Iceberg Dev,
> > > >>>>>>>>>>>> I added a proposal [1] to add a key-value tags field for
> > > >>>>>>>>>>>> files in V4 metadata [2].  More details are in the document
> > > >>>>>>>>>>>> but the intent is to allow engines to store optional
> > > >>>>>>>>>>>> metadata associated with these files:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> 1.  The proposed field is optional and cannot be used for
> > >>>>>>>>>>>> metadata required for reading the table correctly.
> > >>>>>>>>>>>> 2.  It also proposes guard-rails for not letting tags cause
> > >>>>>>>>>>>> metadata bloat.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Looking forward to hearing everyone's thoughts and feedback.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>> Micah
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> [1] https://github.com/apache/iceberg/issues/14815
> > >>>>>>>>>>>> [2]
> > >>>>>>>>>>>>
> https://docs.google.com/document/d/16flxDXjpBiAs_cF3sjCsa7GlvSHQ0Mmm74c8yvYQlSA/edit?tab=t.0#heading=h.cnpb2lth3egz
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> >
>
