Re: [DISCUSS] Write-path gap for field-id-bound policy during schema evolution

Yufei Gu Mon, 22 Jun 2026 10:20:18 -0700

I tend to agree that this is mostly a catalog implementation concern.

In practice, policy changes are rarely committed in the same transaction as
table DDL. That said, a catalog could choose to support that model if it
wants stronger guarantees around column creation/update and policy
attachment.


So I don’t think the IRC necessarily needs to mandate this behavior. It
feels reasonable to leave it as a catalog capability or implementation
choice.

Yufei


On Wed, Jun 17, 2026 at 7:42 AM Sung Yun <[email protected]> wrote:

> Hi Prashant, thanks for taking a look at the doc and for sharing the link.
>
> I agree with you that binding policies using field-ids is the best way to
> address the "Drift" problem I've codified in the doc. What I'm hoping to
> discuss is that since the field-id doesn't exist until the schema change is
> committed, there's a window of time when a column intended to be protected
> is not, until that policy is attached out of band.
>
> The question I hope to answer with the community is whether this is a
> limitation we're okay accepting, or whether we want to allow a column to
> land already protected, in the same commit that creates it.
> I'm hoping the catalog sync will help clarify the problem statement. And
> likewise, looking forward to discussing it more.
>
> Sung
>
> On Tue, Jun 16, 2026 at 3:39 PM Prashant Singh <[email protected]>
> wrote:
>
>> Hey Sung,
>> I understand people define and attach policies by name, but I don't think
>> engines / metastores keep names as metadata (at least for the engines /
>> catalogs I have worked with). These column names are resolved to field ids
>> before the mapping of policy to table is persisted, which means the column
>> must exist before a policy can be attached.
>> This is much like creating an Iceberg table with column names: those are
>> assigned field ids, which are more or less opaque to the user, and from
>> then on the field id becomes our source of truth for all operations and
>> survives things like rename.
>>
>> We discussed this exact scenario when we were modeling ReadRestrictions
>> as well (09/10/25):
>> https://www.youtube.com/watch?v=orAXA5e9pmU&t=2867s
>> We made ReadRestrictions return names and left the metadata representation
>> to the catalog's choice, since the ideal would be to rely on field ids in
>> the first place.
>> So it's intentionally left that way, since defining and attaching policy
>> are entirely catalog concerns, and how a catalog comes up with read
>> restrictions is entirely the catalog implementer's choice. I personally
>> don't see a gap here.
>>
>> That said, this doesn't mean we can't rediscuss it. Thank you for putting
>> it on the agenda for the sync. Looking forward to it.
>>
>> Best,
>> Prashant Singh
>>
>> On Tue, Jun 16, 2026 at 8:14 AM Sung Yun <[email protected]> wrote:
>>
>>> Hi Andrei, that's a sharp framing, and I think you've identified
>>> something broader that spans multiple constructs being discussed today. I
>>> agree that it's worth discussing the meta pattern as its own topic.
>>>
>>> On the data governance side: before we settle what a shared write-side
>>> shape should look like, I think it would help to first establish whether
>>> the specific problems are ones the community agrees are worth solving. The
>>> Drift and Creation problems I raised in the Google Doc have security
>>> consequences for FGAC policies, and whether they merit a first-class
>>> construct in the IRC write path is a data-governance question that I think
>>> is worth putting to the community on its own terms.
>>>
>>> I'll bring the policy case to the IRC sync, and I'm glad to dig into the
>>> meta pattern with you and Sebastian and the rest of the community as it
>>> sharpens.
>>>
>>> Sung
>>>
>>> On 2026/06/16 12:13:12 Andrei Tserakhau via dev wrote:
>>> > Hi Sung,
>>> >
>>> > Thanks for raising this. The overlap with the labels write-side
>>> > work I've been drafting with @Sebastian Baunsgaard
>>> > <[email protected]>  is structural --
>>> > same lifecycle, field-id binding, co-commit and concurrency questions,
>>> > different payload.
>>> >
>>> > But what stands out more than the specific overlap is that this is
>>> > the first sighting of a pattern the spec doesn't yet have a
>>> > framework for: catalog-authored, lifecycle-managed write APIs that
>>> > reach deep into catalog-owned space. Read Restrictions already
>>> > does this on the read side — policies are very much catalog
>>> > territory. Your authoring proposal extends it to write. Labels
>>> > CRUD lands in the same neighborhood from a different direction.
>>> >
>>> > Kevin asked something close to this at the May 28 labels sync:
>>> > what's the pattern for introducing new first-class concepts in the
>>> > REST spec? Ryan's answer pointed at the shape (CRUD verb +
>>> > transactional path), but the deeper question hasn't been worked
>>> > through — should the spec standardize the write side of
>>> > catalog-owned territory at all, or is this best left ad-hoc per
>>> > proposal with capability negotiation governing client expectations?
>>> >
>>> > I lean toward Capabilities being the right frame here. Catalogs
>>> > opt in, clients discover what's supported, the spec doesn't force
>>> > standardization deep into catalog territory. A unified write-side
>>> > surface has real value for clients — engines, custom tools, one
>>> > shape to learn — but real cost too: catalog innovation space
>>> > shrinks to differentiators inside a spec-prescribed envelope.
>>> >
>>> > So before aligning on specific conventions, worth asking the
>>> > meta-question: shall we go this direction at all? And if yes —
>>> > ad-hoc per proposal, or a deliberate meta-framework?
>>> >
>>> > This is broader than labels alone, so probably worth raising at
>>> > one of the upcoming catalog community syncs as a meta-topic
>>> > rather than the labels-specific sync. Labels sync can pick up
>>> > labels-side implications afterward, once the broader direction
>>> > is clearer.
>>> >
>>> > Best,
>>> > Andrei
>>> >
>>> > On Mon, Jun 15, 2026 at 9:10 AM Sung Yun <[email protected]> wrote:
>>> >
>>> > > Hi Dan,
>>> > >
>>> > > Apologies for the confusion. "Write" was a poor word choice on my part.
>>> > > I didn't mean enforcing policy on writers, and you're right that a writer
>>> > > holds the highest level of access, so there's little to restrict there.
>>> > > By "write" I meant the lifecycle of the policies themselves: creating,
>>> > > updating, and deleting them. Enforcement stays on the read side
>>> > > (#13879). This is the complementary authoring path discussion on
>>> > > policies.
>>> > >
>>> > > The problem I'm looking at is that a policy bound to a column by name can
>>> > > detach or retarget when that column is renamed or dropped. A policy also
>>> > > can't land in the same commit as the column it protects, so the column can
>>> > > exist before its protection does. I've written up the analysis and a
>>> > > direction that could close it [1], and I'd appreciate your review.
>>> > >
>>> > > Christian, thanks. I agree with your pointers. Drop+re-add is a good
>>> > > example of the general case and it faces the same exposure as any schema
>>> > > change when policy is managed separately from the schema change that
>>> > > introduces it, which is exactly the problem the doc works through. I'd
>>> > > value your review on the shared doc.
>>> > >
>>> > > Sung
>>> > >
>>> > > [1]
>>> > > https://docs.google.com/document/d/1yL2Yv70hJ569dpLdW_upTzzK8Zb3fAFEKEH4JRdosjU/edit?tab=t.0
>>> > >
>>> > > On 2026/06/15 15:19:12 Daniel Weeks wrote:
>>> > > > Hey Sung,
>>> > > >
>>> > > > I'm not sure I fully understand the use case here. Generally, readers can
>>> > > > have different policies when they consume data (what's
>>> > > > restricted/hidden/obfuscated). However, on the write path, I'm not aware
>>> > > > of scenarios where similar policies would be applied. A writer typically
>>> > > > has the highest level of access because they need to read (metadata at
>>> > > > minimum) and write (both metadata and data).
>>> > > >
>>> > > > What use cases are you envisioning for write side policy enforcement?
>>> > > >
>>> > > > Thanks,
>>> > > > Dan
>>> > > >
>>> > > > On Sun, Jun 14, 2026 at 11:43 PM Christian Thiel <[email protected]>
>>> > > > wrote:
>>> > > >
>>> > > > > Hello Sung,
>>> > > > >
>>> > > > > thanks for sharing this!
>>> > > > >
>>> > > > > I'd definitely be interested in seeing your ideas for the proposal.
>>> > > > > Your point about field-id binding especially had me thinking: since admins
>>> > > > > author against names and never see field-ids today, it'd be worth spelling
>>> > > > > out where and when that name→field-id binding happens, and how it handles
>>> > > > > drop+re-add.
>>> > > > >
>>> > > > > I think a number of interesting points are worth discussing, such as
>>> > > > > coexistence with external policy engines and separation of duties on
>>> > > > > commit, while still keeping the field-id binding intact where it applies.
>>> > > > >
>>> > > > > Looking forward to it!
>>> > > > >
>>> > > > > Best,
>>> > > > > Christian
>>> > > > >
>>> > > > > On Fri, 5 Jun 2026 at 22:59, Sung Yun <[email protected]> wrote:
>>> > > > >
>>> > > > >> Hi folks,
>>> > > > >>
>>> > > > >> The FGAC / Read Restriction proposal [1] is introducing a read-side path
>>> > > > >> to standardize how we describe row filters and masks, and to do it safely
>>> > > > >> across schema evolution by binding them to field-ids. We don't yet have
>>> > > > >> anything matching on the write path.
>>> > > > >>
>>> > > > >> Today, policies are administered entirely outside the REST protocol, so
>>> > > > >> external systems reference columns by name, as they're not part of the
>>> > > > >> commit and never see field-ids. And two things break once schema and policy
>>> > > > >> have to change together:
>>> > > > >> - a policy bound to a column name silently re-targets when the column is
>>> > > > >>   renamed
>>> > > > >> - a policy commits separately from the schema change it depends on, so a
>>> > > > >>   column can exist before its protection does
>>> > > > >>
>>> > > > >> So far, policy administration has been left out of scope [2], and now
>>> > > > >> that the Read Restrictions Proposal is finding consensus, I believe it is a
>>> > > > >> good time to start thinking about it on the write path.
>>> > > > >> I have a rough direction in mind, of enabling co-committing policy and
>>> > > > >> binding it to field-ids on the server-side. So I wanted to gauge:
>>> > > > >> 1. whether people see this as a gap worth closing in the IRC protocol
>>> > > > >> 2. whether there are concerns or considerations that should be taken into
>>> > > > >>    account
>>> > > > >>
>>> > > > >> If there's interest, I'm happy to put together a detailed proposal and
>>> > > > >> share it here for discussion.
>>> > > > >>
>>> > > > >> Sung
>>> > > > >>
>>> > > > >> [1]
>>> > > > >> https://docs.google.com/document/d/108Y0E8XsZi91x-UY0_aHLEbmXDNmxmS5BnDjunEKvTM/edit?tab=t.7l861fq8jo38
>>> > > > >> [2]
>>> > > > >> https://lists.apache.org/thread/2jx33fn7lq37oxxm7sd6rjy0dnvbm4t6
>>> > > > >>
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
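
For readers following the thread, the two problems it keeps returning to (a name-bound policy detaching or re-targeting on rename and drop+re-add, and a column existing before its protection does) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyTable`, `protected_by_name`, `protected_by_id`, and `add_protected_column` are hypothetical names invented for illustration and are not part of Iceberg, the IRC spec, or any catalog API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table schema: field id -> column name."""
    columns: dict[int, str] = field(default_factory=dict)
    last_id: int = 0

    def add_column(self, name: str) -> int:
        # Field ids are assigned at commit time and are never reused.
        self.last_id += 1
        self.columns[self.last_id] = name
        return self.last_id

    def rename_column(self, field_id: int, new_name: str) -> None:
        # A rename changes the name only; the field id stays the same.
        self.columns[field_id] = new_name


def protected_by_name(table: ToyTable, names: set[str]) -> set[int]:
    """Field ids covered by a policy that stores column names."""
    return {fid for fid, name in table.columns.items() if name in names}


def protected_by_id(table: ToyTable, ids: set[int]) -> set[int]:
    """Field ids covered by a policy that stores field ids."""
    return set(table.columns) & ids


def add_protected_column(table: ToyTable, ids: set[int], name: str) -> int:
    """Toy 'co-commit': create the column and bind the policy in one step."""
    fid = table.add_column(name)
    ids.add(fid)
    return fid


table = ToyTable()
ssn_id = table.add_column("ssn")
table.add_column("email")
name_policy = {"ssn"}    # policy authored and stored by name
id_policy = {ssn_id}     # same policy resolved to a field id

# Rename: the name-bound policy detaches, the id-bound one does not.
table.rename_column(ssn_id, "national_id")
after_rename_by_name = protected_by_name(table, name_policy)
after_rename_by_id = protected_by_id(table, id_policy)

# A new column reusing the old name: the name-bound policy re-targets.
reused_id = table.add_column("ssn")
after_readd_by_name = protected_by_name(table, name_policy)

# Co-commit: the new column is never visible without its protection.
salary_id = add_protected_column(table, id_policy, "salary")
```

In this toy model the id-bound policy keeps following the original column through the rename, while the name-bound policy first covers nothing and then covers the unrelated re-added column. Whether the one-step `add_protected_column` behavior belongs in the IRC spec or is left as a catalog capability is exactly the open question in the thread.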
