Thank you for the feedback, everyone! I also agree that we don't need the entire reference chain in order to be secure.
That said, I can understand how having the whole reference chain in the catalog can help with AuthZ, since the security models and guarantees a catalog provides can be very complex. I believe this is what Christian and Ryan are suggesting, and I agree that we should send the complete reference chain from the client to the server to support these use cases. I also checked offline with Russell (as he hinted in his latest message, he doesn't have strong feelings either way), and we are good! I believe we have consensus in this thread to keep the complete chain, so it would be nice to move forward with the vote!

Best,
Prashant Singh

On Wed, Feb 4, 2026 at 2:14 PM Russell Spitzer wrote:
> To me
>
> > Otherwise, B would not have been provided to the engine. Are there cases where an engine might load B but not intend to allow access to the tables it references?
>
> This sounds like the definition of an invoker view. A user is able to load the view definition, but the table load itself is authorized on a per-user basis, so we don't really have DEFINER behavior, IMHO.
>
> I honestly don't have strong feelings either way here. If we want to move forward with the full chain, that's fine with me, since I feel like catalogs will get to make these decisions based on what their particular permission structures allow. Personally, I wouldn't want to give someone permission to modify a view that runs as another user if they don't themselves have that user's access to the underlying tables ;)
>
> On Wed, Feb 4, 2026 at 3:49 PM Ryan Blue wrote:
>
>> The DEFINER view referenced by a DEFINER view is a good case to think about, but I don't think that it requires the entire reference chain in order to be secure.
>>
>> Using the object names from Russell's response, when view B is loaded and referenced-by is A, the catalog must trust that the engine is setting referenced-by correctly.
>> It trusts that the engine will not lie and say that B is referenced from A instead of another view, and it trusts that projections, filters, etc. from A will be applied to data from B.
>>
>> I think the question here is whether the first guarantee, that A was loaded and referenced B, is sufficient when deciding whether the query has access to B and the tables it references. The catalog *could* assume that because B is the referenced-by for C from a trusted engine, the query must have access to B. Otherwise, B would not have been provided to the engine. Are there cases where an engine might load B but not intend to allow access to the tables it references?
>>
>> I think there's a fair argument that those cases exist. When tables or views are loaded, no intent is included. The catalog doesn't know whether a view was loaded for a SHOW HISTORY command or because it is being updated or run. So a view could be loaded because a user has some other permission, like MODIFY, but not SELECT, or perhaps permission to audit the view but not to see its data. If the catalog allows those cases, then being able to load B doesn't necessarily mean the query has access to the data that B produces. In that case, you would need to check the permissions that A has on B to determine whether to load C and vend credentials for it.
>>
>> In writing this email, I think I've been convinced that Christian is correct and that it is best to keep the reference chain. Russell and Prashant, what do you think?
>>
>> Ryan
>>
>> On Wed, Feb 4, 2026 at 1:12 PM Russell Spitzer wrote:
>>
>>> I understand the logging concern but not the correctness one.
>>>
>>> Are you saying we have to re-check to make sure nothing has changed since we started?
>>>
>>> I would assume that in this auth chain we could get by with a referenced_by in the view request as well?
>>>
>>> A (View) => B (View) => C (Table)
>>> LoadView(A): gets the first view
>>> LoadView(B, referenced_by A): gets the second view, with the first view as "referenced_by"
>>> LoadTable(C, referenced_by B): finally requests the table, with the second view as referenced_by
>>>
>>> Do we need the full chain in this case?
>>>
>>> I'm somewhat convinced by the logging argument, though, since that would be useful information to have, although I'm not sure the catalog couldn't piece this back together. It would definitely be simpler to have it always present.
>>>
>>> On Wed, Feb 4, 2026 at 2:34 PM Christian Thiel wrote:
>>>
>>>> Your assumption is correct: the 1st DEFINER view is authorized before the query engine retrieves its content and learns that it references the 2nd DEFINER.
>>>>
>>>> Let me clarify the setup I had in mind: query engines increasingly support passing user tokens to the catalog for authorization. Examples include Starburst's OAuth2 Token Passthrough [1] and StarRocks' JWT authentication [2].
>>>>
>>>> In such setups, the second request, for the 2nd DEFINER view, becomes problematic: the catalog receives a request from a user / invoker who lacks direct access. Using the hypothetical "referenced-by" field, and assuming a trust relationship with the engine that guarantees correctness, we must validate both:
>>>>
>>>> 1. The authorization decision for the 1st DEFINER still holds
>>>> 2. The 1st DEFINER's owner has access to the 2nd
>>>>
>>>> While catalogs could issue a short-lived authorization proof when returning the 1st DEFINER, re-authorizing is equally valid and arguably preferable, as the information is more current.
>>>>
>>>> Extending this to the TABLE level: we can either provide an authorization proof with the 2nd DEFINER (presented when querying the TABLE) or re-authorize the entire chain.
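To make the 1st DEFINER => 2nd DEFINER => TABLE argument concrete, here is a minimal Python sketch of how a catalog might evaluate loadTable(C) under each option. Everything in it is hypothetical: the `View` type, the `VIEWS` and `GRANTS` maps, and the three `authorize_*` functions are illustrative names, not part of the Iceberg REST spec. The sketch assumes only DEFINER views and an engine that is trusted to report referenced-by correctly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    # Hypothetical DEFINER view metadata: the view runs with its owner's permissions.
    name: str
    owner: str


# Hypothetical catalog state for 1st DEFINER (A) => 2nd DEFINER (B) => TABLE (C).
VIEWS = {"A": View("A", "owner_a"), "B": View("B", "owner_b")}
GRANTS = {
    "alice": {"A"},    # the querying user can read only the outer view
    "owner_a": {"B"},  # A's owner can read B
    "owner_b": {"C"},  # B's owner can read C
}


def can_read(grants, principal, obj):
    return obj in grants.get(principal, set())


def authorize_parent_trusting(grants, user, table, parent):
    # Immediate parent only: check the parent's owner and trust that the engine
    # loaded the parent for an authorized user (`user` is not consulted).
    return can_read(grants, VIEWS[parent].owner, table)


def authorize_parent_strict(grants, user, table, parent):
    # Immediate parent only, but also require direct user access to the parent.
    return can_read(grants, user, parent) and can_read(grants, VIEWS[parent].owner, table)


def authorize_full_chain(grants, user, table, chain):
    # chain is ordered outermost first, e.g. ["A", "B"] for loadTable(C).
    if not can_read(grants, user, chain[0]):  # entry-point check for the user
        return False
    targets = chain[1:] + [table]
    # Owner checks: every DEFINER owner can still read the next object in the chain.
    return all(can_read(grants, VIEWS[v].owner, t) for v, t in zip(chain, targets))


revoked = {**GRANTS, "alice": set()}  # alice loses access to A mid-query
for grants in (GRANTS, revoked):
    print(
        authorize_parent_trusting(grants, "alice", "C", "B"),
        authorize_parent_strict(grants, "alice", "C", "B"),
        authorize_full_chain(grants, "alice", "C", ["A", "B"]),
    )
# True False True
# True False False
```

In this sketch, the trusting variant keeps allowing the load after the revocation. The strict variant rejects even the legitimate query, because alice has no direct access to B. Only the full-chain variant distinguishes the two cases from the request alone. A short-lived authorization proof, as mentioned above, would be another way to close the same gap.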
>>>>
>>>> Without carrying client-side trust between requests, having the full (trusted) chain is the only way to authorize TABLE access (again requiring correctness guarantees through other trust mechanisms). Therefore, only the complete chain lets the catalog explain a table-access authorization decision seamlessly. In my opinion, providing this information explicitly is preferable to reconstructing it from the TABLE metadata plus all prior authorization requests, if only for audit logging.
>>>>
>>>> Does that make my thoughts clear?
>>>>
>>>> [1] https://docs.starburst.io/latest/object-storage/metastores.html#oauth-2-0-token-pass-through
>>>> [2] https://docs.starrocks.io/docs/data_source/catalog/iceberg/iceberg_rest_security/#security-mechanisms
>>>>
>>>> Best,
>>>>
>>>> Christian
>>>>
>>>> On Wed, 4 Feb 2026 at 20:20, Prashant Singh wrote:
>>>>
>>>>> Thank you for the feedback, Christian!
>>>>> I agree that having the full context could help for audit purposes.
>>>>>
>>>>> However, I am not able to fully understand your feedback from an AuthZ point of view. Can you please elaborate?
>>>>> IIUC, in your example 1st DEFINER => 2nd DEFINER => TABLE, the user's access to the 1st DEFINER view would have been authorized before the query engine could learn that the 1st DEFINER references the 2nd DEFINER. I am assuming the engine succeeded in getting the view definition? When authorizing loadTable, all the catalog needs to know is which view references the table.
>>>>>
>>>>> Regarding referenced-by in loadView: that's a good recommendation, let me think more about it.
>>>>>
>>>>> Best,
>>>>> Prashant Singh
>>>>>
>>>>> On Tue, Feb 3, 2026 at 11:28 AM Christian Thiel wrote:
>>>>>
>>>>>> I prefer to keep the full chain.
>>>>>>
>>>>>> Consider this scenario:
>>>>>> 1st DEFINER => 2nd DEFINER => TABLE
>>>>>>
>>>>>> When a user has access only to the outer view and the load table endpoint is called, the following authorization conditions must be ensured:
>>>>>>
>>>>>> 1. The owners of the DEFINER views still have access to their referenced objects
>>>>>> 2. The querying user has access to their entry point, the 1st DEFINER view
>>>>>>
>>>>>> If the load table endpoint receives only the immediate parent in referenced-by, we lose critical information for check (2). This means the request data alone, even if trusted, is insufficient to make a complete authorization decision unless the server internally correlates the earlier load of the 2nd DEFINER with the load table request, since we otherwise can't trace it back to the 1st DEFINER. To make this work consistently, we would also require referenced-by for the load view endpoint.
>>>>>>
>>>>>> Additionally, knowing the user's entry point is valuable for auditing purposes, particularly in DEFINER-heavy implementations.
>>>>>>
>>>>>> I somewhat disagree that Postgres DEFINER views don't require deeply nested context.
>>>>>>
>>>>>> Postgres just handles this chain internally:
>>>>>> 1. The user is allowed to query the 1st DEFINER
>>>>>> 2. Thus, the 2nd DEFINER may be used to respond to the query
>>>>>> 3. Thus, the TABLE may be used to respond to the query
>>>>>>
>>>>>> Propagating this trust relationship in Iceberg REST is more complex, however: objects are queried individually, so we can't just validate the full plan and instead need to be able to validate access to each individual component it requires.
>>>>>>
>>>>>> Best,
>>>>>> Christian
>>>>>>
>>>>>> On Mon, 2 Feb 2026 at 19:44, Russell Spitzer wrote:
>>>>>>
>>>>>>> Just to re-up my comments from the discussion.
>>>>>>>
>>>>>>> I'm in favor of Immediate Parent only.
>>>>>>> Full chain seems to be for situations where we want to be able to "override" the security definition of an inner nested view. For users who want to do this, I would encourage them to just make a brand-new definer view that doesn't reference the "invoker" view.
>>>>>>>
>>>>>>> For example:
>>>>>>>
>>>>>>> DEFINER => INVOKER => TABLE
>>>>>>>
>>>>>>> The "definer" should not be able to remove the "invoked" nature of access to the table. If a user really wants that behavior, they should construct:
>>>>>>>
>>>>>>> DEFINER (combined with INVOKER SQL) => TABLE
>>>>>>>
>>>>>>> I'd rather we didn't encourage more complicated constructions.
>>>>>>>
>>>>>>> On Mon, Feb 2, 2026 at 12:34 PM Prashant Singh wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> I'm currently working on passing additional context via the referenced-by parameter in loadTable calls. This is a foundational step toward enabling catalogs to make authorization decisions based on query execution context.
>>>>>>>>
>>>>>>>> While the broader trust relationships and AuthZ constructs are outside the scope of IRC, I'd like to align on the level of detail we should provide. Specifically: *Should we send the entire view reference chain, or only the immediate parent view, for nested views?*
>>>>>>>>
>>>>>>>> The trade-offs are:
>>>>>>>>
>>>>>>>> - *Full Chain:* Provides maximum flexibility for the server to make complex AuthZ decisions, but increases client-side overhead for tracking nested references.
>>>>>>>> - *Immediate Parent:* Simpler for the client to implement, but provides limited context for sophisticated authorization policies.
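For the trade-off above, here is a minimal Python sketch of the client side. A hypothetical engine expands a nested view tree and records the referenced-by value it would attach to each load, using the A => B => C example from this thread. It assumes, as discussed in the thread, that referenced-by is sent on loadView as well as loadTable. The `DEFINITIONS` map and `plan_loads` function are illustrative names, not an Iceberg API.

```python
# Hypothetical view definitions: view name -> objects it references.
# Names not present in the map are treated as tables.
DEFINITIONS = {
    "A": ["B"],
    "B": ["C"],
}


def plan_loads(name, mode, stack=()):
    """Return (endpoint, object, referenced_by) tuples in load order.

    mode is "full" (send the whole chain) or "parent" (immediate parent only).
    stack holds the views currently being expanded, outermost first.
    """
    referenced_by = list(stack) if mode == "full" else list(stack[-1:])
    if name not in DEFINITIONS:
        return [("loadTable", name, referenced_by)]
    loads = [("loadView", name, referenced_by)]
    for child in DEFINITIONS[name]:
        loads.extend(plan_loads(child, mode, stack + (name,)))
    return loads


for mode in ("parent", "full"):
    for load in plan_loads("A", mode):
        print(mode, load)
# parent ('loadView', 'A', [])
# parent ('loadView', 'B', ['A'])
# parent ('loadTable', 'C', ['B'])
# full ('loadView', 'A', [])
# full ('loadView', 'B', ['A'])
# full ('loadTable', 'C', ['A', 'B'])
```

In this sketch, the two options differ only in whether the client sends the whole expansion stack or just its top element. A real engine's planner may track nesting differently.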
>>>>>>>>
>>>>>>>> *Prior Art & Research:* As noted in this discussion <https://github.com/apache/iceberg/pull/13810#discussion_r2747121401> (thanks, Ryan and Russell), Postgres handles this via DEFINER (owner permissions) and INVOKER (query permissions) without requiring deeply nested context. My research into other engines hasn't yet yielded a "gold standard" approach, as some platforms simply restrict nested view complexity.
>>>>>>>>
>>>>>>>> I'd love to hear your thoughts on which approach aligns better.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Prashant Singh
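As a rough illustration of the DEFINER (owner permissions) / INVOKER (query permissions) split, here is a hypothetical sketch. It shows how a catalog that receives only the immediate parent might choose the principal whose permissions it checks for a load. It encodes Russell's rule above that an INVOKER view keeps checking as the querying user even when nested under a DEFINER view (DEFINER => INVOKER => TABLE). The `ViewInfo` type, the `effective_principal` function, and the view and principal names are all made up for illustration; none of them are Iceberg REST APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ViewInfo:
    # Hypothetical view metadata as a catalog might store it.
    name: str
    owner: str
    security: str  # "DEFINER" or "INVOKER"


def effective_principal(user: str, parent: Optional[ViewInfo]) -> str:
    """Pick whose permissions to check for a load, given only the immediate parent."""
    if parent is None:
        # Direct access: the querying user must hold the permission.
        return user
    if parent.security == "DEFINER":
        # DEFINER: the view runs with its owner's permissions.
        return parent.owner
    # INVOKER: keep checking as the querying user, even when this view is
    # itself nested under a DEFINER view.
    return user


outer = ViewInfo("outer_definer", "team_owner", "DEFINER")
inner = ViewInfo("inner_invoker", "eng_owner", "INVOKER")

print(effective_principal("alice", outer))  # loading inner_invoker: team_owner
print(effective_principal("alice", inner))  # loading the TABLE: alice
```

Under this rule, the immediate parent is enough to pick the principal. It does not, on its own, let the catalog re-check the user's access to the outermost view, which is the gap raised earlier in the thread.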
