Re: [DISCUSS][REST] Granularity of referenced-by context in loadTable calls

Christian Thiel Wed, 04 Feb 2026 12:34:45 -0800

Your assumption is correct—the 1st DEFINER view is authorized before the
query engine retrieves its content and learns it references the 2nd DEFINER.


Let me clarify the setup I had in mind: Query engines increasingly support
passing user tokens to the catalog for authorization. Examples include
Starburst's OAuth2 Token Passthrough [1] and StarRocks' JWT authentication
[2].

In such setups, the second request to the 2nd DEFINER view becomes
problematic: the catalog receives a request from a user / invoker lacking
direct access. Using the hypothetical "referenced-by" field—and assuming a
trust relationship with the engine guaranteeing correctness—we must
validate both:

1. The authorization decision for the 1st DEFINER still holds
2. The 1st DEFINER's owner has access to the 2nd

While catalogs could issue a short-lived authorization proof when returning
the 1st DEFINER, re-authorizing is equally valid and arguably preferable,
as the information is more current.

Extending this to the TABLE level: we can either provide authorization
proof with the 2nd DEFINER (presented when querying the TABLE), or
re-authorize the entire chain.

Without carrying client-side trust between requests, having the full
(trusted) chain is the only way to authorize TABLE access (again requiring
correctness guarantees through other trust mechanisms). Therefore,
authorizing table access can only be seamlessly explained with the complete
chain. Providing this information explicitly is preferable to
reconstructing it from the TABLE metadata plus all prior authorization
requests in my opinion - if only for audit logging.
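
To illustrate re-authorizing the entire chain, here is a minimal Python
sketch. It rests on several assumptions: every view in the chain is a
DEFINER view, the chain is trusted and ordered from the user's entry point
inward, and grants are a plain set of (principal, object) pairs. The names
DefinerView, authorize_table_load and grants are hypothetical and not part
of the Iceberg REST spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinerView:
    """Hypothetical model of a DEFINER view: its name and its owner."""
    name: str
    owner: str


def authorize_table_load(user, chain, table, grants):
    """Re-authorize the whole referenced-by chain for a table load.

    chain  -- DEFINER views, ordered from the user's entry point inward
    grants -- set of (principal, object_name) pairs that are allowed
    """
    principal = user
    for view in chain:
        # The current principal must be allowed to read this view.
        if (principal, view.name) not in grants:
            return False
        # Objects referenced by a DEFINER view are checked against its owner.
        principal = view.owner
    # The owner of the innermost view must have access to the table.
    return (principal, table) in grants


v1 = DefinerView("ns.view1", owner="owner1")
v2 = DefinerView("ns.view2", owner="owner2")
grants = {
    ("alice", "ns.view1"),    # the user may query the entry point
    ("owner1", "ns.view2"),   # owner of the 1st DEFINER may read the 2nd
    ("owner2", "ns.table"),   # owner of the 2nd DEFINER may read the TABLE
}

full_chain_ok = authorize_table_load("alice", [v1, v2], "ns.table", grants)
parent_only_ok = authorize_table_load("alice", [v2], "ns.table", grants)
```

In this sketch the full chain authorizes the load, while passing only the
immediate parent does not, because the user has no direct grant on the 2nd
DEFINER and the request no longer shows how they reached it.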

Does that make my thoughts clear?

[1]
https://docs.starburst.io/latest/object-storage/metastores.html#oauth-2-0-token-pass-through
[2]
https://docs.starrocks.io/docs/data_source/catalog/iceberg/iceberg_rest_security/#security-mechanisms

Best,

Christian

On Wed, 4 Feb 2026 at 20:20, Prashant Singh <[email protected]>
wrote:

> Thank you for the feedback, Christian!
> I agree that having the full context could help for audit purposes.
>
> Though, I am not able to fully understand your feedback from an AuthZ point
> of view; can you please elaborate?
> IIUC, in your example 1st DEFINER => 2nd DEFINER => TABLE, the
> user's access to the 1st DEFINER view would have been authorized before
> the query engine could learn that the 1st DEFINER references the 2nd DEFINER
> (I am assuming it succeeded in getting the view definition). All it needs
> to know when loading the table is what the view is referencing, when it's
> authorizing the loadTable.
>
> Regarding the referenced-by in loadView, that's a good recommendation;
> let me think about it more.
>
> Best,
> Prashant Singh
>
>
> On Tue, Feb 3, 2026 at 11:28 AM Christian Thiel <
> [email protected]> wrote:
>
>> I prefer to keep the full chain.
>>
>> Consider this scenario:
>> 1st DEFINER => 2nd DEFINER => TABLE
>>
>> When a user has access only to the outer view and the load table endpoint
>> is called, the following authorization conditions must be ensured:
>>
>>    1. Owners of the DEFINER views still have access to their referenced
>>    objects
>>    2. The querying User has access to his entrypoint - the 1st DEFINER
>>    View
>>
>> If the load table endpoint receives only the immediate parent in
>> referenced-by, we lose critical information for check (2). This means
>> the request data alone—even if trusted—is insufficient to make a complete
>> authorization decision unless the server internally correlates the call to
>> the 2nd DEFINER load with the load table request, as we can't trace it back
>> to the 1st DEFINER otherwise. To make this work consistently we would
>> require referenced-by also for the load View endpoint.
>>
>> Additionally, knowing the user's entry point is valuable for auditing
>> purposes, particularly in DEFINER-heavy implementations.
>>
>> I kind of disagree that postgres DEFINER views don't require deeply
>> nested context.
>>
>> Postgres just handles this chain internally:
>> 1. User is allowed to query 1st DEFINER
>> 2. thus 2nd DEFINER may be used to respond to the query
>> 3. thus TABLE may be used to respond to the query
>> But propagating this trust relationship in Iceberg REST is more complex
>> as objects are queried individually, so we can't just validate the full
>> plan, but instead need to be able to validate access to each individual
>> component it requires.
>>
>> Best,
>> Christian
>>
>> On Mon, 2 Feb 2026 at 19:44, Russell Spitzer <[email protected]>
>> wrote:
>>
>>> Just to re-up my comments from the discussion.
>>>
>>> I'm in favor of Immediate Parent only. Full chain seems to be for
>>> situations where we want to be able to "override" the security
>>> definition of an inner nested view. For users who want to
>>> do this, I would encourage them to just make a brand new definer view
>>> without referencing the "invoker" view.
>>>
>>> For example
>>>
>>> DEFINER => INVOKER => TABLE
>>>
>>> The "definer" should not be able to remove the "invoked" nature of
>>> access to the table. If a user really
>>> wants that behavior they should construct
>>>
>>> DEFINER (Combined with INVOKER SQL) => TABLE
>>>
>>> I'd rather we didn't encourage more complicated constructions
>>>
>>> On Mon, Feb 2, 2026 at 12:34 PM Prashant Singh <[email protected]>
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I’m currently working on passing additional context via the
>>>> referenced-by parameter in loadTable calls. This is a foundational
>>>> step toward enabling catalogs to make authorization decisions based on
>>>> query execution context.
>>>>
>>>> While the broader trust relationships and AuthZ constructs are outside
>>>> the scope of IRC, I’d like to align on the level of detail we should
>>>> provide. Specifically: *Should we send the entire view reference
>>>> chain, or only the immediate parent view on nested views?*
>>>>
>>>> The following are trade-offs:
>>>>
>>>>    - *Full Chain:* Provides maximum flexibility for the server to make
>>>>    complex AuthZ decisions but increases client-side overhead for tracking
>>>>    nested references.
>>>>    - *Immediate Parent:* Simpler for the client to implement but
>>>>    provides limited context for sophisticated authorization policies.
>>>>
>>>> *Prior Art & Research:* As noted in this discussion
>>>> <https://github.com/apache/iceberg/pull/13810#discussion_r2747121401>
>>>> (thanks Ryan and Russell), Postgres handles this via DEFINER (owner
>>>> permissions) and INVOKER (query permissions) without requiring deeply
>>>> nested context. My research into other engines hasn't yielded a standard
>>>> "gold level" approach yet, as some platforms simply restrict nested view
>>>> complexity.
>>>>
>>>> I’d love to hear your thoughts on which approach aligns better.
>>>>
>>>> Best regards,
>>>>
>>>> Prashant Singh
>>>>
>>>
