Re: [DISCUSS][REST] Granularity of referenced-by context in loadTable calls

Prashant Singh Fri, 06 Feb 2026 14:27:36 -0800

Thank you for the feedback, everyone! I also agree that we don't strictly
need the entire reference chain in order to be secure.

That said, I can understand how having the whole reference chain in the
catalog can help with AuthZ, since the security models and guarantees a
catalog provides can be very complex. I believe this is what Christian and
Ryan are suggesting, and I feel we should send complete reference chains
from the client to the server to support these use cases.
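
As a rough sketch of how a catalog could use the full chain, here is one
way the two checks from Christian's example might look on the server side.
Everything here is hypothetical and not part of the REST spec: the names
View, grants and authorize_load_table are made up for illustration, every
view in the chain is assumed to be a DEFINER view, and the engine is
assumed to be trusted to report the chain correctly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    name: str
    owner: str  # principal whose privileges a DEFINER view runs with


def authorize_load_table(user, table, referenced_by, grants):
    """Hypothetical catalog-side check for loadTable with a full chain.

    referenced_by lists the views from the user's entry point (outermost)
    down to the view that directly references the table. grants is a set
    of (principal, object_name) pairs meaning "principal may read object".
    """
    if not referenced_by:
        # No view context: the user needs direct access to the table.
        return (user, table) in grants
    # The querying user must have access to the entry point.
    if (user, referenced_by[0].name) not in grants:
        return False
    # Each DEFINER view's owner must still be able to read what it references.
    targets = [view.name for view in referenced_by[1:]] + [table]
    return all((view.owner, target) in grants
               for view, target in zip(referenced_by, targets))


a = View("A", owner="alice")
b = View("B", owner="bob")
grants = {("carol", "A"), ("alice", "B"), ("bob", "C")}

# Full chain A => B => C: carol may enter through A, owners cover the rest.
assert authorize_load_table("carol", "C", [a, b], grants)
# Immediate parent only: B looks like the entry point, which carol lacks.
assert not authorize_load_table("carol", "C", [b], grants)
```

With only the immediate parent, the catalog in this sketch cannot tell that
carol entered through A unless it correlates the request with earlier
loadView calls.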

I also checked offline with Russell (he hinted in his message above that
he doesn't have strong feelings either way), and we are good!

I believe we have consensus in this thread to keep the complete chain.
It would be nice to move forward with the vote!

Best,
Prashant Singh

On Wed, Feb 4, 2026 at 2:14 PM Russell Spitzer <[email protected]>
wrote:

> To me
>
> "Otherwise, B would not have been provided to the engine. Are there cases
> where an engine might load B but not intend to allow access to the tables
> it references?"
>
> This sounds like the definition of an invoker view. A user is able to load
> the view definition, but the table load itself is on a per user basis so we
> don't really have DEFINER behavior imho.
>
> I honestly don't have strong feelings either way here. If we want to move
> forward with the full chain that's fine with me since I feel like Catalogs
> will get to make these decisions on what their particular permission
> structures allow. Personally, I wouldn't want to give someone permission to
> modify a view that is run-as another user if they don't have the
> permissions as that user to access the underlying tables ;)
>
> On Wed, Feb 4, 2026 at 3:49 PM Ryan Blue <[email protected]> wrote:
>
>> The DEFINER view referenced by a DEFINER view is a good case to think
>> about, but I don’t think that it requires the entire reference chain in
>> order to be secure.
>>
>> Using the object names from Russell’s response, when view B is loaded
>> and referenced-by is A, the catalog must trust that the engine is
>> setting referenced-by correctly. It trusts that the engine will not lie
>> and say that B is referenced from A instead of another view, and it
>> trusts that projections, filters, etc. from A will be applied to data
>> from B.
>>
>> I think the question here is whether the first guarantee, that A was
>> loaded and referenced B, is sufficient when deciding whether the query
>> has access to B and the tables it references. The catalog *could* assume
>> that because B is the referenced-by for C from a trusted engine, that
>> the query must have access to B. Otherwise, B would not have been
>> provided to the engine. Are there cases where an engine might load B but
>> not intend to allow access to the tables it references?
>>
>> I think there’s a fair argument that those cases exist. When tables or
>> views are loaded, there’s no intent included. The catalog doesn’t know
>> whether a view was loaded for a SHOW HISTORY command or because it is
>> being updated or being run. So a view could be loaded because a user has
>> some other permission, like MODIFY, but not SELECT. Or maybe a
>> permission to audit the view but not see data. If the catalog allows those
>> cases, then being able to load B doesn’t necessarily mean the query has
>> access to the data that B produces. In that case, you would need to
>> check the permissions that A has on B to determine whether to load/vend
>> credentials for C.
>>
>> In writing this email, I think I’ve been convinced that Christian is
>> correct and that it is best to keep the reference chain. Russell and
>> Prashant, what do you think?
>>
>> Ryan
>>
>> On Wed, Feb 4, 2026 at 1:12 PM Russell Spitzer <[email protected]>
>> wrote:
>>
>>> I understand the logging concern but not the correctness one.
>>>
>>> Are you saying we have to re-check to make sure nothing has changed
>>> since we started?
>>>
>>> I would assume in this auth chain we could get by with a referenced_by
>>> in the view request as well?
>>> A (View) => B (View) => C (Table)
>>>
>>> LoadView(A)                     gets the first view
>>> LoadView(B, referenced_by A)    gets the second view, using
>>>                                 "referenced_by" the first view
>>> LoadTable(C, referenced_by B)   finally requests the table, using
>>>                                 "referenced_by" the second view
>>>
>>> Do we need the full chain in this case?
>>>
>>> I'm kind of convinced though by the logging argument since that would be
>>> useful information to have, although I'm not
>>> sure the Catalog couldn't piece this back together. It would definitely
>>> be simpler to have it just always present.
>>>
>>> On Wed, Feb 4, 2026 at 2:34 PM Christian Thiel <
>>> [email protected]> wrote:
>>>
>>>> Your assumption is correct—the 1st DEFINER view is authorized before
>>>> the query engine retrieves its content and learns it references the 2nd
>>>> DEFINER.
>>>>
>>>> Let me clarify the setup I had in mind: Query engines increasingly
>>>> support passing user tokens to the catalog for authorization. Examples
>>>> include Starburst's OAuth2 Token Passthrough [1] and StarRocks' JWT
>>>> authentication [2].
>>>>
>>>> In such setups, the second request to the 2nd DEFINER view becomes
>>>> problematic: the catalog receives a request from a user / invoker lacking
>>>> direct access. Using the hypothetical "referenced-by" field—and assuming a
>>>> trust relationship with the engine guaranteeing correctness—we must
>>>> validate both:
>>>>
>>>> 1. The authorization decision for the 1st DEFINER still holds
>>>> 2. The 1st DEFINER's owner has access to the 2nd
>>>>
>>>> While catalogs could issue short-lived authorization proof when
>>>> returning the 1st DEFINER, re-authorizing is equally valid and arguably
>>>> preferable, as the information is more current.
>>>>
>>>> Extending this to the TABLE level: we can either provide authorization
>>>> proof with the 2nd DEFINER (presented when querying the TABLE), or
>>>> re-authorize the entire chain.
>>>>
>>>> Without carrying client-side trust between requests, having the full
>>>> (trusted) chain is the only way to authorize TABLE access (again requiring
>>>> correctness guarantees through other trust mechanisms). Therefore,
>>>> authorizing table access can only be seamlessly explained with the complete
>>>> chain. Providing this information explicitly is preferable to
>>>> reconstructing it from the TABLE metadata plus all prior authorization
>>>> requests in my opinion - if only for audit logging.
>>>>
>>>> Does that make my thoughts clear?
>>>>
>>>> [1]
>>>> https://docs.starburst.io/latest/object-storage/metastores.html#oauth-2-0-token-pass-through
>>>> [2]
>>>> https://docs.starrocks.io/docs/data_source/catalog/iceberg/iceberg_rest_security/#security-mechanisms
>>>>
>>>> Best,
>>>>
>>>> Christian
>>>>
>>>> On Wed, 4 Feb 2026 at 20:20, Prashant Singh <[email protected]>
>>>> wrote:
>>>>
>>>>> Thank you for the feedback, Christian!
>>>>> I agree that having the full context could help for audit purposes.
>>>>>
>>>>> However, I don't fully understand your feedback from the AuthZ point
>>>>> of view. Could you please elaborate?
>>>>> IIUC, in your example (1st DEFINER => 2nd DEFINER => TABLE) the user's
>>>>> access to the 1st DEFINER view would have been authorized before the
>>>>> query engine could learn that the 1st DEFINER references the 2nd
>>>>> DEFINER, assuming it succeeded in getting the view definition? All it
>>>>> needs to know when authorizing the loadTable is which view is
>>>>> referencing the table.
>>>>>
>>>>> Regarding the referenced-by in loadView, that's a good
>>>>> recommendation; let me think about it more.
>>>>>
>>>>> Best,
>>>>> Prashant Singh
>>>>>
>>>>>
>>>>> On Tue, Feb 3, 2026 at 11:28 AM Christian Thiel <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I prefer to keep the full chain.
>>>>>>
>>>>>> Consider this scenario:
>>>>>> 1st DEFINER => 2nd DEFINER => TABLE
>>>>>>
>>>>>> When a user has access only to the outer view and the load table
>>>>>> endpoint is called, the following authorization conditions must be
>>>>>> ensured:
>>>>>>
>>>>>>    1. Owners of the DEFINER views still have access to their
>>>>>>    referenced objects
>>>>>>    2. The querying User has access to his entrypoint - the 1st
>>>>>>    DEFINER View
>>>>>>
>>>>>> If the load table endpoint receives only the immediate parent in
>>>>>> referenced-by, we lose critical information for check (2). This
>>>>>> means the request data alone, even if trusted, is insufficient to make
>>>>>> a complete authorization decision unless the server internally
>>>>>> correlates the call to the 2nd DEFINER load with the load table
>>>>>> request, as we can't trace it back to the 1st DEFINER otherwise. To
>>>>>> make this work consistently we would also require referenced-by for
>>>>>> the load view endpoint.
>>>>>>
>>>>>> Additionally, knowing the user's entry point is valuable for auditing
>>>>>> purposes, particularly in DEFINER-heavy implementations.
>>>>>>
>>>>>> I kind of disagree that postgres DEFINER views don't require deeply
>>>>>> nested context.
>>>>>>
>>>>>> Postgres just handles this chain internally:
>>>>>> 1. User is allowed to query 1st DEFINER
>>>>>> 2. thus 2nd DEFINER may be used to respond to the query
>>>>>> 3. thus TABLE may be used to respond to the query
>>>>>>
>>>>>> But propagating this trust relationship in Iceberg REST is more
>>>>>> complex as objects are queried individually, so we can't just validate
>>>>>> the full plan, but instead need to be able to validate access to each
>>>>>> individual component it requires.
>>>>>>
>>>>>> Best,
>>>>>> Christian
>>>>>>
>>>>>> On Mon, 2 Feb 2026 at 19:44, Russell Spitzer <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Just to re-up my comments from the discussion.
>>>>>>>
>>>>>>> I'm in favor of Immediate Parent only. Full chain seems to be for
>>>>>>> situations where we want to be able to "override" the security
>>>>>>> definition of an inner nested view. For users who want to
>>>>>>> do this, I would encourage them to just make a brand new definer
>>>>>>> view without referencing the "invoker" view.
>>>>>>>
>>>>>>> For example
>>>>>>>
>>>>>>> DEFINER => INVOKER => TABLE
>>>>>>>
>>>>>>> The "definer" should not be able to remove the "invoked" nature of
>>>>>>> access to the table. If a user really
>>>>>>> wants that behavior they should construct
>>>>>>>
>>>>>>> DEFINER (Combined with INVOKER SQL) => TABLE
>>>>>>>
>>>>>>> I'd rather we didn't encourage more complicated constructions
>>>>>>>
>>>>>>> On Mon, Feb 2, 2026 at 12:34 PM Prashant Singh <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> I’m currently working on passing additional context via the
>>>>>>>> referenced-by parameter in loadTable calls. This is a foundational
>>>>>>>> step toward enabling catalogs to make authorization decisions based on
>>>>>>>> query execution context.
>>>>>>>>
>>>>>>>> While the broader trust relationships and AuthZ constructs are
>>>>>>>> outside the scope of IRC, I’d like to align on the level of detail we
>>>>>>>> should provide. Specifically: *Should we send the entire view
>>>>>>>> reference chain, or only the immediate parent view on nested views?*
>>>>>>>>
>>>>>>>> The following are trade-offs:
>>>>>>>>
>>>>>>>>    - *Full Chain:* Provides maximum flexibility for the server to
>>>>>>>>      make complex AuthZ decisions but increases client-side overhead
>>>>>>>>      for tracking nested references.
>>>>>>>>    - *Immediate Parent:* Simpler for the client to implement but
>>>>>>>>      provides limited context for sophisticated authorization
>>>>>>>>      policies.
>>>>>>>>
>>>>>>>> *Prior Art & Research:* As noted in this discussion
>>>>>>>> <https://github.com/apache/iceberg/pull/13810#discussion_r2747121401>
>>>>>>>> (thanks Ryan and Russell), Postgres handles this via DEFINER
>>>>>>>> (owner permissions) and INVOKER (query permissions) without
>>>>>>>> requiring deeply nested context. My research into other engines hasn't
>>>>>>>> yielded a standard "gold level" approach yet, as some platforms simply
>>>>>>>> restrict nested view complexity.
>>>>>>>>
>>>>>>>> I’d love to hear your thoughts on which approach aligns better.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Prashant Singh
>>>>>>>>
>>>>>>>
