Quick update on this one: we'll cover it on the Dedicated Sync this Thursday (10-11am US / 7-8pm CET). Thanks to Daniel Weeks for getting it on the calendar.
The last time labels were on the sync was 2026-04-15. There has been plenty of productive offline discussion since then, mostly in the gdoc comment threads. Thanks to everyone who engaged:

- *Daniel Weeks*: for the IRC-spec-vs-table-spec framing that now anchors the Alternatives section
- *Fokko Driesprong*: for challenging motivation on the cost-based defense and driving the ownership reframe
- *Yufei Gu*: for the structure debate that landed us on the split shape
- *Sung Yun*: for the early consumption-pattern and addressing questions
- *Maninder Parmar*: for the properties-relationship probing
- *Christian Thiel*: for pushing on the write API direction

Concrete changes in-doc since April:

- Problem Statement reframed around catalog-owned metainformation as the load-bearing concept.
- Alternatives Considered rewritten with the IRC-spec-vs-table-spec boundary instead of cost arguments.
- Structure debate closed on a split shape: labels (flat k/v at the table level, k8s-style) + column-labels (array with field-id). The labels type itself is flat, with no internal structure. The same shape applies on LoadViewResponse and namespaces.
- CRUD companion added as a second tab in the same gdoc: UpdateLabels REST verb, two-class distinction for catalog-managed vs externally-managed keys, optimistic concurrency with ETags.
- Working Trino prototype at https://github.com/laskoviymishka/irc-labels/pull/1, with native ALTER TABLE ... SET LABEL DDL translating end-to-end.

Parallel work to flag: EJ Wang's first-class Tag concept proposal <https://lists.apache.org/thread/r5r3vpmrfy9wmmb4sdybwcjz1c4wld5b> on dev@. We've agreed to coordinate as paired proposals: Tag as a separate first-class REST concept, labels as the lower-level attachment substrate. Both efforts share the cross-cutting interop question.

The goal on Thursday is to walk through the current state, confirm that the split shape lands cleanly, and identify what's needed to move toward a VOTE on the read API. Anyone reading along is welcome to join.
Doc (current state): https://docs.google.com/document/d/1aj-6JlfBiMYEEVtNuh5WLMOrRQiMCcyYUGbouPM4hXI/edit

Thanks,
Andrei

On Tue, Mar 24, 2026 at 9:35 PM Andrei Tserakhau <[email protected]> wrote:

> Thanks Ryan!
>
> Your point about avoiding first-class metadata requirements is exactly the
> design principle here. Labels let each catalog surface what it knows
> without the spec dictating what catalogs must track.
>
> To build on this, I put together a POC showing the approach works across
> the ecosystem.
>
> Key design principles that held up in practice:
>
> - No new requirements on catalogs. Labels are optional in the response. A
> catalog that doesn't serve labels returns the same response as today.
>
> - Catalog-scoped, not table state. Every catalog we tried already has
> internal metadata separate from Iceberg properties: Polaris has
> internalProperties, UC has uc_properties, Lakekeeper has namespace
> properties in PostgreSQL. Labels just give this existing metadata a
> standard way through the protocol.
>
> - No property overriding. Labels are explicitly separate from table
> properties. Properties configure behavior; labels describe context. Engines
> know which is which.
>
> What I built:
>
> - Spec change: https://github.com/apache/iceberg/pull/15750
> - PyIceberg client: https://github.com/apache/iceberg-python/pull/3191
>
> Catalog implementations:
> - Polaris: https://github.com/apache/polaris/pull/4048 (labels from
> internalProperties)
> - Unity Catalog OSS:
> https://github.com/unitycatalog/unitycatalog/pull/1417 (labels from
> uc_properties)
> - Lakekeeper: https://github.com/lakekeeper/lakekeeper/pull/1676 (labels
> from namespace properties)
>
> Full demo: https://github.com/laskoviymishka/irc-labels
>
> Three catalogs, two languages (Java + Rust), 40-95 lines each. The pattern
> is the same everywhere: each catalog already has internal metadata that
> doesn't belong in table properties. Labels give it a standard way out
> through the protocol.
>
> The Polaris implementation also addresses
> https://github.com/apache/polaris/issues/3222 - the community has been
> asking for a way to surface business metadata alongside table loads. Labels
> solve this without adding any requirements beyond an optional field.
>
> Beyond ownership and classification, the demo also shows labels enabling
> AI agent table selection (agents reason about tables using semantic labels
> instead of guessing from column names) and governance via a trusted engine
> (ClickHouse reading sensitivity labels to auto-generate masking policies).
>
> Happy to discuss the spec design or any of the implementation details.
>
> Andrei
>
> On Fri, Mar 6, 2026 at 11:25 PM Ryan Blue <[email protected]> wrote:
>
>> I think that this is a reasonable way to solve some persistent issues
>> that we've seen.
>>
>> Many catalogs track additional metadata that is not part of the table
>> spec (or others) like "owner", and right now there is no way to exchange or
>> share that information. I'm also hesitant to start including it as
>> first-class metadata because that puts additional requirements on catalogs
>> that may not align. For instance, Tabular had no concept of a table "owner"
>> and instead used default grants at the schema level. I like that this
>> solution allows catalogs to provide information in a generic way that
>> doesn't add requirements in the REST spec. And it is an alternative to
>> overriding table properties with catalog-managed information, which I think
>> is an anti-pattern.
>>
>> Thanks, Andrei! I think this is a good idea.
>>
>> On Thu, Mar 5, 2026 at 2:04 PM Andrei Tserakhau via dev <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> `LoadTableResponse` returns table metadata (schema, snapshots, file
>>> locations), but catalogs have operational context about tables that has no
>>> standard place to go: cost attribution, ownership, governance hints,
>>> semantic metadata. Right now catalogs have two options:
>>>
>>> 1. Properties: durable, commit-versioned table state. Good for
>>> persistent metadata; wrong for ephemeral catalog context.
>>> 2. Custom fields: catalog-specific extensions with no interoperability.
>>> Each catalog invents its own structure; engines have no basis to read them.
>>>
>>> The community has already identified this gap. Polaris opened an issue
>>> [1] requesting a standard extension point in the IRC protocol for
>>> catalog-managed metadata. Two earlier threads [2][3] explored column-level
>>> metadata, though in the context of table format changes.
>>>
>>> We propose adding an optional `labels` field to `LoadTableResponse` for
>>> catalog-managed metadata. Labels are string key-value pairs generated
>>> per-request from the catalog's internal systems; nothing is written to
>>> table files. Engines may use or ignore them entirely. Labels give catalog
>>> providers a standard channel to surface context to any client without
>>> bilateral custom integrations for every catalog-engine pair.
>>>
>>> Details:
>>> - GitHub Issue: apache/iceberg#15521
>>> - Design Document: [4]
>>>
>>> Please review the proposal and share your feedback.
>>>
>>> Thanks,
>>> Andrei
>>>
>>> [1]: https://github.com/apache/polaris/issues/3222
>>> [2]: https://lists.apache.org/thread/vwrc3m534gfyfjnsfflwtgkg158yzrb4
>>> [3]: https://lists.apache.org/thread/yflg8w1h87qgwc4s3qtog4l8nx8nk8m0
>>> [4]: https://docs.google.com/document/d/1aj-6JlfBiMYEEVtNuh5WLMOrRQiMCcyYUGbouPM4hXI/edit?usp=sharing
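
For readers following along, here is a minimal sketch of how a client might consume the optional `labels` field described in the original proposal. It is an illustration under stated assumptions, not the PyIceberg change linked above: the helper name `extract_labels`, the sample payloads, and the label keys `owner` and `sensitivity` are all hypothetical. The one behaviour it tries to show is that a catalog omitting `labels` should look the same to the client as one returning no labels.

```python
import json
from typing import Dict


def extract_labels(load_table_response: dict) -> Dict[str, str]:
    """Return catalog labels from a LoadTableResponse-like dict.

    Hypothetical helper: a missing or null `labels` field yields an empty
    dict, so clients behave the same against catalogs that do not serve labels.
    """
    raw = load_table_response.get("labels") or {}
    # The proposal describes labels as string key-value pairs; this sketch
    # silently drops anything else (an assumption, not specified behaviour).
    return {k: v for k, v in raw.items() if isinstance(k, str) and isinstance(v, str)}


# Hypothetical response bodies; the label keys are illustrative only.
with_labels = json.loads(
    '{"metadata-location": "s3://bucket/t/metadata/v1.json",'
    ' "labels": {"owner": "data-platform", "sensitivity": "pii"}}'
)
without_labels = json.loads(
    '{"metadata-location": "s3://bucket/t/metadata/v1.json"}'
)

labels = extract_labels(with_labels)
no_labels = extract_labels(without_labels)
```

Because labels are read from their own field rather than merged into table properties, an engine using a helper like this never confuses catalog context with table configuration.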
