Hey JB,

Thank you so much for the feedback. I would like to convince you by explaining the thought process behind this proposal:

> not do query engine work, but more interact with any query engines for ex: TMS

I agree with this in principle, and we should especially not run any compute (for example, finding or deleting orphan files) in the same JVM as the catalog. The intent of this proposal complements that: we want to stop engines from doing what the catalog should be doing instead, i.e. resolving identity, authN and authZ. If we look at this as a way to authorize, then it is something we can let the catalog do, leaving the engine to do only query execution and not identity resolution.

> not be opinionated on SQL dialect

Definitely, we don't want to be opinionated based on SQL dialect as much as possible. But IMHO, at least for a case like this, where we want the catalog to resolve identity, we can accommodate it considering the value it brings: it gives the catalog more authority over identity resolution, which is a very big problem, and if we scatter this across the engines we might lose that control as a catalog. For example, consider is_principal_role('ANALYST'): if I leave this to the engine, what SQL will it evaluate? It needs to come to the catalog anyway and ask, "hey, is ANALYST among my authenticated principal's roles?" There is no guarantee that engines will do so, and even if we define such a contract, where the catalog evaluates a function on behalf of the engine, it can take ages to get adopted given how scattered the engines are.

> I agree that Polaris would need to do "enforcement" but supporting any query engines/SQL dialect is very difficult

Definitely agree. Hence, even if this means restricting the feature to something very narrow, such as requiring a `WHERE .....( clause)`, I am fine with that; I just want to build in as much enforcement as possible.

> I think we should explore "abstraction" like Substrait or Coral to be agnostic

Definitely, but since the view itself doesn't have an IR, I don't think this would be easily achievable, though I totally see where you are coming from.

I think an even more fundamental question is who owns the SQL-to-IR conversion, and beyond that, whether all engines can read directly from the IR. I agree that we need a clear boundary between engine and catalog, and this is where I am coming from as well: authZ just can't be an engine-only concern when things like identity are involved. We need to do this at the catalog level to have uniform enforcement.

Please let me know your further thoughts.

Best,
Prashant Singh

On Mon, May 19, 2025 at 12:21 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Prashant
>
> Thanks for the proposal.
>
> I understand the purpose (about FGAC which is something we plan to
> work on), but I'm not sure if it's a good approach with this kind of
> SQL functions.
> Polaris, as a catalog, should:
> 1. not do query engine work, but more interact with any query engines
> (same discussion we had about TMS)
> 2. not be opinionated on SQL dialect
>
> I agree that Polaris would need to do "enforcement" but supporting any
> query engines/SQL dialect is very difficult. I think we should explore
> "abstraction" like Substrait or Coral to be agnostic.
> I think Polaris should "integrate" query engines, with a clear
> boundary between what's query engine and catalog responsibility.
>
> I think the proposal has great value, but I'm not yet convinced by the
> impl approach.
>
> Regards
> JB
>
> On Mon, May 19, 2025 at 7:26 PM Prashant Singh
> <prashant.si...@snowflake.com.invalid> wrote:
> >
> > Hi everyone,
> >
> > I’d like to propose adding *context-aware functions* to Apache Polaris so
> > that view definitions can resolve security context on the Polaris side (aka
> > catalog end without depending on engines).
> >
> > *Proposed functions*
> >
> > 1. *is_principal('<principal_name>')* – returns TRUE if the authenticated
> >    principal matches <principal_name>, otherwise FALSE.
> > 2. *is_principal_role('<principal_role_name>')* – returns TRUE when
> >    <principal_role_name> appears in the principal’s role set.
> > 3. *is_catalog_role('<catalog_role_name>')* – analogous check at the
> >    catalog-role level.
> >
> > *Why it matters*
> >
> > These predicates make views dynamic. Example:
> >
> > CREATE VIEW dynamic_vw AS
> > SELECT *
> > FROM ns1.layer1_table
> > WHERE is_principal_role('ANALYST');
> >
> > When a user whose one of principal roles include *ANALYST* calls LOAD
> > VIEW, Polaris rewrites the view to
> >
> > - SELECT * FROM ns1.layer1_table WHERE TRUE;
> >
> > For everyone else the view becomes
> >
> > - SELECT * FROM ns1.layer1_table WHERE FALSE;
> >
> > The result is better and consistent control of the identity resolution
> > without relying on the engine side changes and giving polaris more
> > authority in enforcing things like FGAC (WIP by me).
> > Note the same can be extrapolated to any Polaris stored entity.
> >
> > *Proof of concept*
> >
> > I’ve put together a quick POC branch:
> > https://github.com/apache/polaris/compare/main...singhpk234:polaris:dyanmic/view
> >
> > *Prior art*
> >
> > Snowflake context functions:
> > https://docs.snowflake.com/en/sql-reference/functions-context
> > Databricks Unity Catalog offers a similar mechanism called *dynamic views*:
> > https://docs.databricks.com/aws/en/views/dynamic
> >
> > *Next steps*
> >
> > If the community is interested, we can discuss API surface, engine
> > implications, and a roadmap for merging.
> >
> > Eager to hear your feedback!
> >
> > Best,
> > Prashant Singh
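
For illustration, the LOAD VIEW rewrite described in the original proposal could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the POC's implementation: the `SecurityContext` class and the `resolve_context_functions` helper are hypothetical names, and a regular expression stands in for real SQL parsing (so it would also rewrite a match that happens to sit inside a string literal).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityContext:
    """Hypothetical holder for the identity the catalog has already authenticated."""
    principal: str
    principal_roles: frozenset = frozenset()
    catalog_roles: frozenset = frozenset()


# Matches is_principal('x'), is_principal_role('x') and is_catalog_role('x').
_CONTEXT_FN = re.compile(
    r"\b(is_principal_role|is_catalog_role|is_principal)\s*\(\s*'([^']*)'\s*\)",
    re.IGNORECASE,
)


def resolve_context_functions(view_sql: str, ctx: SecurityContext) -> str:
    """Replace each context function call with TRUE or FALSE for this caller."""

    def evaluate(match: re.Match) -> str:
        name = match.group(1).lower()
        arg = match.group(2)
        if name == "is_principal":
            result = arg == ctx.principal
        elif name == "is_principal_role":
            result = arg in ctx.principal_roles
        else:  # is_catalog_role
            result = arg in ctx.catalog_roles
        return "TRUE" if result else "FALSE"

    return _CONTEXT_FN.sub(evaluate, view_sql)


view = "SELECT * FROM ns1.layer1_table WHERE is_principal_role('ANALYST')"
analyst = SecurityContext("alice", frozenset({"ANALYST"}))
other = SecurityContext("bob", frozenset({"ENGINEER"}))

print(resolve_context_functions(view, analyst))
# SELECT * FROM ns1.layer1_table WHERE TRUE
print(resolve_context_functions(view, other))
# SELECT * FROM ns1.layer1_table WHERE FALSE
```

The point of the sketch is that the engine only ever receives a plain boolean literal, so it needs no knowledge of Polaris principals or roles; identity resolution stays entirely on the catalog side.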