Re: [DISCUSS][REST] Granularity of referenced-by context in loadTable calls

Russell Spitzer Wed, 04 Feb 2026 14:14:14 -0800

To me

Otherwise, B would not have been provided to the engine. Are there cases
where an engine might load B but not intend to allow access to the tables
it references?


This sounds like the definition of an invoker view. A user is able to load
the view definition, but the table load itself is on a per user basis so we
don't really have DEFINER behavior imho.

I honestly don't have strong feelings either way here. If we want to move
forward with the full chain that's fine with me since I feel like Catalogs
will get to make these decisions on what their particular permission
structures allow. Personally, I wouldn't want to give someone permission to
modify a view that is run-as another user if they don't have the
permissions as that user to access the underlying tables ;)

On Wed, Feb 4, 2026 at 3:49 PM Ryan Blue <[email protected]> wrote:

> The DEFINER view referenced by a DEFINER view is a good case to think
> about, but I don’t think that it requires the entire reference chain in
> order to be secure.
>
> Using the object names from Russell’s response, when view B is loaded and
> referenced-by is A, the catalog must trust that the engine is setting
> referenced-by correctly. It trusts that the engine will not lie and say
> that B is referenced from A instead of another view, and it trusts that
> projections, filters, etc. from A will be applied to data from B.
>
> I think the question here is whether the first guarantee, that A was
> loaded and referenced B, is sufficient when deciding whether the query
> has access to B and the tables it references. The catalog *could* assume
> that because B is the referenced-by for C from a trusted engine, that the
> query must have access to B. Otherwise, B would not have been provided to
> the engine. Are there cases where an engine might load B but not intend
> to allow access to the tables it references?
>
> I think there’s a fair argument that those cases exist. When tables or
> views are loaded, there’s no intent included. The catalog doesn’t know
> whether a view was loaded for a SHOW HISTORY command or because it is
> being updated or being run. So a view could be loaded because a user has
> some other permission, like MODIFY, but not SELECT. Or maybe a permission
> to audit the view but not see data. If the catalog allows those cases, then
> being able to load B doesn’t necessarily mean the query has access to the
> data that B produces. In that case, you would need to check the
> permissions that A has on B to determine whether to load/vend credentials
> for C.
>
> In writing this email, I think I’ve been convinced that Christian is
> correct and that it is best to keep the reference chain. Russell and
> Prashant, what do you think?
>
> Ryan
>
> On Wed, Feb 4, 2026 at 1:12 PM Russell Spitzer <[email protected]>
> wrote:
>
>> I understand the logging concern but not the correctness one.
>>
>> Are you saying we have to re-check to make sure nothing has changed since
>> we started?
>>
>> I would assume in this auth chain we could get by with a referenced_by in
>> the view request as well?
>> A (View) => B (View) => C (Table)
>> LoadView(A)                    gets the first view
>> LoadView(B, referenced_by A)   is for the second view, using "referenced_by" the first view
>> LoadTable(C, referenced_by B)  finally, we request the table using referenced_by the second view
>>
>> Do we need the full chain in this case?
>>
>> I'm kind of convinced though by the logging argument since that would be
>> useful information to have, although I'm not
>> sure the Catalog couldn't piece this back together. It would definitely
>> be simpler to have it just always present.
>>
>> On Wed, Feb 4, 2026 at 2:34 PM Christian Thiel <
>> [email protected]> wrote:
>>
>>> Your assumption is correct—the 1st DEFINER view is authorized before the
>>> query engine retrieves its content and learns it references the 2nd DEFINER.
>>>
>>> Let me clarify the setup I had in mind: Query engines increasingly
>>> support passing user tokens to the catalog for authorization. Examples
>>> include Starburst's OAuth2 Token Passthrough [1] and StarRocks' JWT
>>> authentication [2].
>>>
>>> In such setups, the second request to the 2nd DEFINER view becomes
>>> problematic: the catalog receives a request from a user / invoker lacking
>>> direct access. Using the hypothetical "referenced-by" field—and assuming a
>>> trust relationship with the engine guaranteeing correctness—we must
>>> validate both:
>>>
>>> 1. The authorization decision for the 1st DEFINER still holds
>>> 2. The 1st DEFINER's owner has access to the 2nd
>>>
>>> While catalogs could issue short-lived authorization proof when
>>> returning the 1st DEFINER, re-authorizing is equally valid and arguably
>>> preferable, as the information is more current.
>>>
>>> Extending this to the TABLE level: we can either provide authorization
>>> proof with the 2nd DEFINER (presented when querying the TABLE), or
>>> re-authorize the entire chain.
>>>
>>> Without carrying client-side trust between requests, having the full
>>> (trusted) chain is the only way to authorize TABLE access (again requiring
>>> correctness guarantees through other trust mechanisms). Therefore,
>>> authorizing table access can only be seamlessly explained with the complete
>>> chain. Providing this information explicitly is preferable to
>>> reconstructing it from the TABLE metadata plus all prior authorization
>>> requests in my opinion - if only for audit logging.
>>>
>>> Does that make my thoughts clear?
>>>
>>> [1]
>>> https://docs.starburst.io/latest/object-storage/metastores.html#oauth-2-0-token-pass-through
>>> [2]
>>> https://docs.starrocks.io/docs/data_source/catalog/iceberg/iceberg_rest_security/#security-mechanisms
>>>
>>> Best,
>>>
>>> Christian
>>>
>>> On Wed, 4 Feb 2026 at 20:20, Prashant Singh <[email protected]>
>>> wrote:
>>>
>>>> Thank you for the feedback, Christian!
>>>> I agree that having the full context could help for audit purposes.
>>>>
>>>> However, I don't fully understand your feedback from an AuthZ point of
>>>> view. Can you please elaborate?
>>>> IIUC in your example 1st DEFINER => 2nd DEFINER => TABLE
>>>> user's access to 1st DEFINER view would have been Authorized before
>>>> the Query Engine could learn that 1st DEFINER references the 2nd DEFINER (I
>>>> am assuming it succeeded in getting the view definition). All it needs
>>>> to know when loading the table is what the view is referencing, when
>>>> it's authorizing the loadTable.
>>>>
>>>> Regarding the referenced-by in loadView, that's a good
>>>> recommendation; let me think more about it.
>>>>
>>>> Best,
>>>> Prashant Singh
>>>>
>>>>
>>>> On Tue, Feb 3, 2026 at 11:28 AM Christian Thiel <
>>>> [email protected]> wrote:
>>>>
>>>>> I prefer to keep the full chain.
>>>>>
>>>>> Consider this scenario:
>>>>> 1st DEFINER => 2nd DEFINER => TABLE
>>>>>
>>>>> When a user has access only to the outer view and the load table
>>>>> endpoint is called, the following authorization conditions must be
>>>>> ensured:
>>>>>
>>>>>    1. Owners of the DEFINER views still have access to their
>>>>>    referenced objects
>>>>>    2. The querying User has access to his entrypoint - the 1st
>>>>>    DEFINER View
>>>>>
>>>>> If the load table endpoint receives only the immediate parent in
>>>>> referenced-by, we lose critical information for check (2). This means
>>>>> the request data alone—even if trusted—is insufficient to make a complete
>>>>> authorization decision unless the server internally correlates the call to
>>>>> the 2nd DEFINER load with the load table request, as we can't trace it
>>>>> back to the 1st DEFINER otherwise. To make this work consistently we would
>>>>> require referenced-by also for the load View endpoint.
>>>>>
>>>>> Additionally, knowing the user's entry point is valuable for auditing
>>>>> purposes, particularly in DEFINER-heavy implementations.
>>>>>
>>>>> I kind of disagree that postgres DEFINER views don't require deeply
>>>>> nested context.
>>>>>
>>>>> Postgres just handles this chain internally:
>>>>> 1. User is allowed to query 1st DEFINER
>>>>> 2. thus 2nd DEFINER may be used to respond to the query
>>>>> 3. thus TABLE may be used to respond to the query
>>>>> But propagating this trust relationship in Iceberg REST is more
>>>>> complex as objects are queried individually, so we can't just validate the
>>>>> full plan, but instead need to be able to validate access to each
>>>>> individual component it requires.
>>>>>
>>>>> Best,
>>>>> Christian
>>>>>
>>>>> On Mon, 2 Feb 2026 at 19:44, Russell Spitzer <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Just to re-up my comments from the discussion.
>>>>>>
>>>>>> I'm in favor of Immediate Parent only. Full chain seems to be for
>>>>>> situations where we want to be able to "override" the security
>>>>>> definition of an inner nested view. For users who want to
>>>>>> do this, I would encourage them to just make a brand new definer view
>>>>>> without referencing the "invoker" view.
>>>>>>
>>>>>> For example
>>>>>>
>>>>>> DEFINER => INVOKER => TABLE
>>>>>>
>>>>>> The "definer" should not be able to remove the "invoked" nature of
>>>>>> access to the table. If a user really
>>>>>> wants that behavior they should construct
>>>>>>
>>>>>> DEFINER (Combined with INVOKER SQL) => TABLE
>>>>>>
>>>>>> I'd rather we didn't encourage more complicated constructions
>>>>>>
>>>>>> On Mon, Feb 2, 2026 at 12:34 PM Prashant Singh <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> I’m currently working on passing additional context via the
>>>>>>> referenced-by parameter in loadTable calls. This is a foundational
>>>>>>> step toward enabling catalogs to make authorization decisions based on
>>>>>>> query execution context.
>>>>>>>
>>>>>>> While the broader trust relationships and AuthZ constructs are
>>>>>>> outside the scope of IRC, I’d like to align on the level of detail we
>>>>>>> should provide. Specifically: *Should we send the entire view
>>>>>>> reference chain, or only the immediate parent view on nested views?*
>>>>>>>
>>>>>>> The following are trade-offs:
>>>>>>>
>>>>>>>    - *Full Chain:* Provides maximum flexibility for the server to
>>>>>>>      make complex AuthZ decisions but increases client-side overhead for
>>>>>>>      tracking nested references.
>>>>>>>    - *Immediate Parent:* Simpler for the client to implement but
>>>>>>>      provides limited context for sophisticated authorization policies.
>>>>>>>
>>>>>>> *Prior Art & Research:* As noted in this discussion
>>>>>>> <https://github.com/apache/iceberg/pull/13810#discussion_r2747121401>
>>>>>>> (thanks Ryan and Russell), Postgres handles this via DEFINER (owner
>>>>>>> permissions) and INVOKER (query permissions) without requiring
>>>>>>> deeply nested context. My research into other engines hasn't yielded a
>>>>>>> standard "gold level" approach yet, as some platforms simply restrict
>>>>>>> nested view complexity.
>>>>>>>
>>>>>>> I’d love to hear your thoughts on which approach aligns better.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Prashant Singh
>>>>>>>
>>>>>>
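
Editor's illustration (not part of any message above): the two checks Christian lists for a 1st DEFINER => 2nd DEFINER => TABLE chain can be sketched roughly as follows. This is a minimal toy model, not the Iceberg REST API. The names View, Catalog, can_select and authorize_load are hypothetical, every view is assumed to be a DEFINER view with a single referenced object, and the engine is assumed to report referenced_by truthfully.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    owner: str       # principal whose permissions a DEFINER view runs with
    references: str  # the single object this view reads (simplification)


class Catalog:
    """Hypothetical in-memory catalog; not a real Iceberg REST implementation."""

    def __init__(self, views, grants):
        self.views = views    # view name -> View
        self.grants = grants  # set of (principal, object) SELECT grants

    def can_select(self, principal, obj):
        return (principal, obj) in self.grants

    def authorize_load(self, user, target, referenced_by=()):
        # referenced_by is the chain of views, outermost (entry point) first.
        chain = list(referenced_by) + [target]
        # The querying user must have access to the entry point.
        if not self.can_select(user, chain[0]):
            return False
        # Each view's owner must still have access to the object it references.
        for parent, child in zip(chain, chain[1:]):
            view = self.views.get(parent)
            if view is None or view.references != child:
                return False
            if not self.can_select(view.owner, child):
                return False
        return True


# A (view, owned by alice) => B (view, owned by bob) => C (table)
catalog = Catalog(
    views={"A": View(owner="alice", references="B"),
           "B": View(owner="bob", references="C")},
    grants={("user", "A"), ("alice", "B"), ("bob", "C")},
)

full_chain = catalog.authorize_load("user", "C", referenced_by=["A", "B"])
parent_only = catalog.authorize_load("user", "C", referenced_by=["B"])
```

In this toy model full_chain is True, while parent_only is False: with only the immediate parent, the catalog sees B as the entry point and cannot tell that the user reached it through A, unless it correlates the call with earlier requests.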
