Re: [DISCUSS][REST] Granularity of referenced-by context in loadTable calls

Ryan Blue Wed, 04 Feb 2026 13:49:48 -0800

The DEFINER view referenced by a DEFINER view is a good case to think
about, but I don’t think that it requires the entire reference chain in
order to be secure.


Using the object names from Russell’s response, when view B is loaded and
referenced-by is A, the catalog must trust that the engine is setting
referenced-by correctly. It trusts that the engine will not lie and say
that B is referenced from A instead of another view, and it trusts that
projections, filters, etc. from A will be applied to data from B.

I think the question here is whether the first guarantee, that A was loaded
and referenced B, is sufficient when deciding whether the query has access
to B and the tables it references. The catalog *could* assume that, because B
is the referenced-by for C from a trusted engine, the query must have
access to B. Otherwise, B would not have been provided to the engine. Are
there cases where an engine might load B but not intend to allow access to
the tables it references?

I think there’s a fair argument that those cases exist. When tables or
views are loaded, there’s no intent included. The catalog doesn’t know
whether a view was loaded for a SHOW HISTORY command or because it is being
updated or being run. So a view could be loaded because a user has some
other permission, like MODIFY, but not SELECT. Or maybe a permission to
audit the view but not see data. If the catalog allows those cases, then
being able to load B doesn’t necessarily mean the query has access to the
data that B produces. In that case, you would need to check the permissions
that A has on B to determine whether to load/vend credentials for C.
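
To make that concrete, here is a minimal sketch of the two checks for the
chain A (view) => B (view) => C (table). Everything in it is a hypothetical
illustration, not part of the REST spec: the Catalog class, the
authorize_* function names, the principals, and the privilege model (SELECT
vs. MODIFY grants, one owner per DEFINER view) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical catalog state; not an Iceberg REST API."""
    owners: dict                               # DEFINER view name -> owner
    grants: set = field(default_factory=set)   # (principal, object, privilege)

    def has(self, principal, obj, privilege="SELECT"):
        return (principal, obj, privilege) in self.grants


def authorize_immediate_parent(catalog, target, parent):
    # Only the immediate parent is known: the catalog can check that the
    # parent's owner may read the target, and must assume the rest.
    return catalog.has(catalog.owners[parent], target)


def authorize_full_chain(catalog, user, target, referenced_by):
    # referenced_by is the full chain, outermost view first, e.g. ["A", "B"].
    if not referenced_by:
        return catalog.has(user, target)
    # The querying user needs SELECT on the entry point.
    if not catalog.has(user, referenced_by[0]):
        return False
    # Each DEFINER view's owner needs SELECT on the next object in the chain.
    chain = referenced_by + [target]
    return all(
        catalog.has(catalog.owners[view], nxt)
        for view, nxt in zip(chain, chain[1:])
    )


catalog = Catalog(
    owners={"A": "alice", "B": "bob"},
    grants={
        ("user", "A", "SELECT"),
        ("alice", "B", "MODIFY"),   # A's owner can load B, but not read it
        ("bob", "C", "SELECT"),
    },
)

print(authorize_immediate_parent(catalog, "C", "B"))          # True
print(authorize_full_chain(catalog, "user", "C", ["A", "B"]))  # False
```

With only the immediate parent, the load of C is allowed even though A's
owner has MODIFY but not SELECT on B. With the full chain, the catalog can
see the missing grant and refuse.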

In writing this email, I think I’ve been convinced that Christian is
correct and that it is best to keep the reference chain. Russell and
Prashant, what do you think?

Ryan

On Wed, Feb 4, 2026 at 1:12 PM Russell Spitzer <[email protected]>
wrote:

> I understand the logging concern but not the correctness one.
>
> Are you saying we have to re-check to make sure nothing has changed since
> we started?
>
> I would assume in this auth chain we could get by with a referenced_by in
> the view request as well?
> A  (View) => B (View) => C (Table)
> LoadView(A)                                   gets the first view
> LoadView(B, referenced_by A)       is for the second view using
> "referenced_by" the first view
> LoadTable(C, referenced_by B)      Finally we request the table using
> referenced_by the second view
>
> Do we need the full chain in this case?
>
> I'm kind of convinced though by the logging argument since that would be
> useful information to have, although I'm not
> sure the Catalog couldn't piece this back together. It would definitely be
> simpler to have it just always present.
>
> On Wed, Feb 4, 2026 at 2:34 PM Christian Thiel <[email protected]>
> wrote:
>
>> Your assumption is correct—the 1st DEFINER view is authorized before the
>> query engine retrieves its content and learns it references the 2nd DEFINER.
>>
>> Let me clarify the setup I had in mind: Query engines increasingly
>> support passing user tokens to the catalog for authorization. Examples
>> include Starburst's OAuth2 Token Passthrough [1] and StarRocks' JWT
>> authentication [2].
>>
>> In such setups, the second request to the 2nd DEFINER view becomes
>> problematic: the catalog receives a request from a user / invoker lacking
>> direct access. Using the hypothetical "referenced-by" field—and assuming a
>> trust relationship with the engine guaranteeing correctness—we must
>> validate both:
>>
>> 1. The authorization decision for the 1st DEFINER still holds
>> 2. The 1st DEFINER's owner has access to the 2nd
>>
>> While catalogs could issue short-lived authorization proof when returning
>> the 1st DEFINER, re-authorizing is equally valid and arguably preferable,
>> as the information is more current.
>>
>> Extending this to the TABLE level: we can either provide authorization
>> proof with the 2nd DEFINER (presented when querying the TABLE), or
>> re-authorize the entire chain.
>>
>> Without carrying client-side trust between requests, having the full
>> (trusted) chain is the only way to authorize TABLE access (again requiring
>> correctness guarantees through other trust mechanisms). Therefore,
>> authorizing table access can only be seamlessly explained with the complete
>> chain. Providing this information explicitly is preferable, in my opinion,
>> to reconstructing it from the TABLE metadata plus all prior authorization
>> requests, if only for audit logging.
>>
>> Does that make my thoughts clear?
>>
>> [1]
>> https://docs.starburst.io/latest/object-storage/metastores.html#oauth-2-0-token-pass-through
>> [2]
>> https://docs.starrocks.io/docs/data_source/catalog/iceberg/iceberg_rest_security/#security-mechanisms
>>
>> Best,
>>
>> Christian
>>
>> On Wed, 4 Feb 2026 at 20:20, Prashant Singh <[email protected]>
>> wrote:
>>
>>> Thank you for the feedback, Christian!
>>> I agree that having the full context could help for audit purposes.
>>>
>>> However, I don't fully understand your feedback from an AuthZ point of
>>> view. Can you please elaborate?
>>> IIUC, in your example 1st DEFINER => 2nd DEFINER => TABLE, the
>>> user's access to the 1st DEFINER view would have been authorized before
>>> the query engine could learn that the 1st DEFINER references the 2nd
>>> DEFINER, since I am assuming it succeeded in getting the view definition.
>>> All it needs to know when loading the table is what the view is
>>> referencing, when it's authorizing the loadTable.
>>>
>>> Regarding the referenced-by in the loadView: that's a good
>>> recommendation; let me think more.
>>>
>>> Best,
>>> Prashant Singh
>>>
>>>
>>> On Tue, Feb 3, 2026 at 11:28 AM Christian Thiel <
>>> [email protected]> wrote:
>>>
>>>> I prefer to keep the full chain.
>>>>
>>>> Consider this scenario:
>>>> 1st DEFINER => 2nd DEFINER => TABLE
>>>>
>>>> When a user has access only to the outer view and the load table
>>>> endpoint is called, the following authorization conditions must be
>>>> ensured:
>>>>
>>>>    1. Owners of the DEFINER views still have access to their
>>>>    referenced objects
>>>>    2. The querying user has access to their entry point, the 1st
>>>>    DEFINER view
>>>>
>>>> If the load table endpoint receives only the immediate parent in
>>>> referenced-by, we lose critical information for check (2). This means
>>>> the request data alone—even if trusted—is insufficient to make a complete
>>>> authorization decision unless the server internally correlates the call to
>>>> the 2nd DEFINER load with the load table request, as we can't trace it back
>>>> to the 1st DEFINER otherwise. To make this work consistently we would
>>>> require referenced-by also for the load View endpoint.
>>>>
>>>> Additionally, knowing the user's entry point is valuable for auditing
>>>> purposes, particularly in DEFINER-heavy implementations.
>>>>
>>>> I kind of disagree that postgres DEFINER views don't require deeply
>>>> nested context.
>>>>
>>>> Postgres just handles this chain internally:
>>>> 1. User is allowed to query 1st DEFINER
>>>> 2. thus 2nd DEFINER may be used to respond to the query
>>>> 3. thus TABLE may be used to respond to the query
>>>> But propagating this trust relationship in Iceberg REST is more
>>>> complex as objects are queried individually, so we can't just validate the
>>>> full plan, but instead need to be able to validate access to each
>>>> individual component it requires.
>>>>
>>>> Best,
>>>> Christian
>>>>
>>>> On Mon, 2 Feb 2026 at 19:44, Russell Spitzer <[email protected]>
>>>> wrote:
>>>>
>>>>> Just to re-up my comments from the discussion.
>>>>>
>>>>> I'm in favor of Immediate Parent only. Full chain seems to be for
>>>>> situations where we want to be able to "override" the security
>>>>> definition of an inner nested view. For users who want to
>>>>> do this, I would encourage them to just make a brand new definer view
>>>>> without referencing the "invoker" view.
>>>>>
>>>>> For example
>>>>>
>>>>> DEFINER => INVOKER => TABLE
>>>>>
>>>>> The "definer" should not be able to remove the "invoked" nature of
>>>>> access to the table. If a user really
>>>>> wants that behavior they should construct
>>>>>
>>>>> DEFINER (Combined with INVOKER SQL) => TABLE
>>>>>
>>>>> I'd rather we didn't encourage more complicated constructions
>>>>>
>>>>> On Mon, Feb 2, 2026 at 12:34 PM Prashant Singh <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> I’m currently working on passing additional context via the
>>>>>> referenced-by parameter in loadTable calls. This is a foundational
>>>>>> step toward enabling catalogs to make authorization decisions based on
>>>>>> query execution context.
>>>>>>
>>>>>> While the broader trust relationships and AuthZ constructs are
>>>>>> outside the scope of IRC, I’d like to align on the level of detail we
>>>>>> should provide. Specifically: *Should we send the entire view
>>>>>> reference chain, or only the immediate parent view on nested views?*
>>>>>>
>>>>>> The following are trade-offs:
>>>>>>
>>>>>>    - *Full Chain:* Provides maximum flexibility for the server to make
>>>>>>      complex AuthZ decisions but increases client-side overhead for
>>>>>>      tracking nested references.
>>>>>>    - *Immediate Parent:* Simpler for the client to implement but
>>>>>>      provides limited context for sophisticated authorization policies.
>>>>>>
>>>>>> *Prior Art & Research:* As noted in this discussion
>>>>>> <https://github.com/apache/iceberg/pull/13810#discussion_r2747121401>
>>>>>> (thanks Ryan and Russell), Postgres handles this via DEFINER (owner
>>>>>> permissions) and INVOKER (query permissions) without requiring
>>>>>> deeply nested context. My research into other engines hasn't yielded a
>>>>>> standard "gold level" approach yet, as some platforms simply restrict
>>>>>> nested view complexity.
>>>>>>
>>>>>> I’d love to hear your thoughts on which approach aligns better.
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Prashant Singh
>>>>>>
>>>>>
