Quick update on this one:
we'll cover it on the Dedicated Sync this Thursday (10-11am US / 7-8pm
CET). Thanks to Daniel Weeks for getting it on the calendar.

Labels were last on the sync agenda on 2026-04-15. There has been plenty of
productive offline discussion since then, mostly in the gdoc comment threads.
Thanks to everyone who engaged:

   - *Daniel Weeks* — for the IRC-spec-vs-table-spec framing that now
   anchors the Alternatives section
   - *Fokko Driesprong* — for challenging motivation on the cost-based
   defense and driving the ownership reframe
   - *Yufei Gu* — for the structure debate that landed us on the split shape
   - *Sung Yun* — for the early consumption-pattern and addressing questions
   - *Maninder Parmar* — for the properties-relationship probing
   - *Christian Thiel* — for pushing on the write API direction

Concrete changes in-doc since April:

   - Problem Statement reframed around catalog-owned metainformation as the
   load-bearing concept.
   - Alternatives Considered rewritten with the IRC-spec-vs-table-spec
   boundary instead of cost arguments.
   - Structure debate closed on a split shape: labels (flat k/v at the
   table level, k8s-style) + column-labels (array with field-id). Labels
   type itself is flat — no internal structure. Same shape applies on
   LoadViewResponse and namespaces.
   - CRUD companion as a second tab in the same gdoc — UpdateLabels REST
   verb, two-class distinction for catalog-managed vs externally-managed keys,
   optimistic concurrency with ETags.
   - Working Trino prototype at
   https://github.com/laskoviymishka/irc-labels/pull/1 — native ALTER TABLE
   ... SET LABEL DDL translating end-to-end.
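
For anyone who hasn't opened the doc, here is a rough sketch of how a client
might read the split shape. Only the names labels, column-labels and field-id
come from the summary above; the sample keys and values, the nested per-column
"labels" key, and both helper functions are assumptions made for illustration,
not what the doc or the prototype specifies.

```python
import json

# Hypothetical response fragment. Only "labels", "column-labels" and
# "field-id" are taken from the proposal summary; everything else
# (sample keys, values, the per-column "labels" key) is assumed.
response = json.loads("""
{
  "labels": {"owner": "data-platform", "sensitivity": "internal"},
  "column-labels": [
    {"field-id": 1, "labels": {"pii": "true"}},
    {"field-id": 3, "labels": {"semantic-type": "currency"}}
  ]
}
""")


def table_labels(resp):
    """Return the flat table-level labels, or {} if the catalog sent none."""
    return dict(resp.get("labels") or {})


def column_labels_by_field_id(resp):
    """Index the column-labels array by field-id; tolerate its absence."""
    return {
        entry["field-id"]: dict(entry.get("labels") or {})
        for entry in resp.get("column-labels") or []
    }


print(table_labels(response))
print(column_labels_by_field_id(response))
```

Both helpers fall back to an empty dict when the fields are missing, which
mirrors the intent that a catalog not serving labels returns the same response
as today.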

Parallel work to flag: EJ Wang's proposal on dev@ for a first-class Tag
concept <https://lists.apache.org/thread/r5r3vpmrfy9wmmb4sdybwcjz1c4wld5b>.
We've agreed to coordinate as paired proposals: Tag as a separate
first-class REST concept, labels as the lower-level attachment substrate.
Both efforts share the cross-cutting interop question.

Goal on Thursday is to walk through the current state, confirm the
split-shape lands cleanly, and identify what's needed to move toward a VOTE
on the read API. Anyone reading along is welcome to join.

Doc (current state):
https://docs.google.com/document/d/1aj-6JlfBiMYEEVtNuh5WLMOrRQiMCcyYUGbouPM4hXI/edit

Thanks,
Andrei

On Tue, Mar 24, 2026 at 9:35 PM Andrei Tserakhau <
[email protected]> wrote:

> Thanks Ryan!
>
> Your point about avoiding first-class metadata requirements is exactly the
> design principle here. Labels let each catalog surface what it knows
> without the spec dictating what catalogs must track.
>
> To build on this, I put together a POC showing the approach works across
> the ecosystem.
>
> Key design principles that held up in practice:
>
> - No new requirements on catalogs. Labels are optional in the response. A
> catalog that doesn't serve labels returns the same response as today.
>
> - Catalog-scoped, not table state. Every catalog we tried already has
> internal metadata separate from Iceberg properties — Polaris has
> internalProperties, UC has uc_properties, Lakekeeper has namespace
> properties in PostgreSQL. Labels just give this existing metadata a
> standard way through the protocol.
>
> - No property overriding. Labels are explicitly separate from table
> properties. Properties configure behavior, labels describe context. Engines
> know which is which.
>
> What I built:
>
> - Spec change: https://github.com/apache/iceberg/pull/15750
> - PyIceberg client: https://github.com/apache/iceberg-python/pull/3191
>
> Catalog implementations:
> - Polaris: https://github.com/apache/polaris/pull/4048 (labels from
> internalProperties)
> - Unity Catalog OSS:
> https://github.com/unitycatalog/unitycatalog/pull/1417 (labels from
> uc_properties)
> - Lakekeeper: https://github.com/lakekeeper/lakekeeper/pull/1676 (labels
> from namespace properties)
>
> Full demo: https://github.com/laskoviymishka/irc-labels
>
> Three catalogs, two languages (Java + Rust), 40-95 lines each. The pattern
> is the same everywhere: each catalog already has internal metadata that
> doesn't belong in table properties. Labels give it a standard way out
> through the protocol.
>
> The Polaris implementation also addresses
> https://github.com/apache/polaris/issues/3222 - the community has been
> asking for a way to surface business metadata alongside table loads. Labels
> solve this without adding any requirements beyond an optional field.
>
> Beyond ownership and classification, the demo also shows labels enabling
> AI agent table selection (agents reason about tables using semantic labels
> instead of guessing from column names) and governance via trusted engine
> (ClickHouse reading sensitivity labels to auto-generate masking policies).
>
> Happy to discuss the spec design or any of the implementation details.
>
> Andrei
>
> On Fri, Mar 6, 2026 at 11:25 PM Ryan Blue <[email protected]> wrote:
>
>> I think that this is a reasonable way to solve some persistent issues
>> that we've seen.
>>
>> Many catalogs track additional metadata that is not part of the table
>> spec (or others) like "owner", and right now there is no way to exchange or
>> share that information. I'm also hesitant to start including it as
>> first-class metadata because that puts additional requirements on catalogs
>> that may not align. For instance, Tabular had no concept of a table "owner"
>> and instead used default grants at the schema level. I like that this
>> solution allows catalogs to provide information in a generic way that
>> doesn't add requirements in the REST spec. And it is an alternative to
>> overriding table properties with catalog-managed information, which I think
>> is an anti-pattern.
>>
>> Thanks, Andrei! I think this is a good idea.
>>
>> On Thu, Mar 5, 2026 at 2:04 PM Andrei Tserakhau via dev <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> `LoadTableResponse` returns table metadata — schema, snapshots, file
>>> locations — but catalogs have operational context about tables that has no
>>> standard place to go: cost attribution, ownership, governance hints,
>>> semantic metadata. Right now catalogs have two options:
>>>
>>> 1. Properties — durable, commit-versioned table state. Good for
>>> persistent metadata; wrong for ephemeral catalog context.
>>> 2. Custom fields — catalog-specific extensions with no interoperability.
>>> Each catalog invents its own structure; engines have no basis to read them.
>>>
>>> The community has already identified this gap. Polaris opened an issue
>>> [1] requesting a standard extension point in the IRC protocol for
>>> catalog-managed metadata. Two earlier threads [2][3] explored column-level
>>> metadata, though in the context of table format changes.
>>>
>>> We propose adding an optional `labels` field to `LoadTableResponse` for
>>> catalog-managed metadata. Labels are string key-value pairs generated
>>> per-request from the catalog's internal systems; nothing is written to
>>> table files. Engines may use or ignore them entirely. Labels give catalog
>>> providers a standard channel to surface context to any client without
>>> bilateral custom integrations for every catalog-engine pair.
>>>
>>> Details:
>>> - GitHub Issue: apache/iceberg#15521
>>> - Design Document: [4]
>>>
>>> Please review the proposal and share your feedback.
>>>
>>> Thanks,
>>> Andrei
>>>
>>> [1]: https://github.com/apache/polaris/issues/3222
>>> [2]: https://lists.apache.org/thread/vwrc3m534gfyfjnsfflwtgkg158yzrb4
>>> [3]: https://lists.apache.org/thread/yflg8w1h87qgwc4s3qtog4l8nx8nk8m0
>>> [4]:
>>> https://docs.google.com/document/d/1aj-6JlfBiMYEEVtNuh5WLMOrRQiMCcyYUGbouPM4hXI/edit?usp=sharing
>>>
>>
