Re: Context-Aware Functions for Apache Polaris

Prashant Singh Mon, 19 May 2025 14:12:01 -0700

Hey JB,

Thank you so much for the feedback. I would like to walk you through my
thought process behind this proposal:

> not do query engine work, but more interact with any query engines, for
> ex: TMS

I agree with this in principle, and we should especially not run any
compute (for example finding orphan files, deleting them, etc.) in the same
JVM as the catalog.

But the intent of this proposal complements that: we are trying to avoid
engines doing what the catalog should be doing instead, i.e. resolving
identity, authN and authZ.
If we look at this as a way to authorize, then this is something we can let
the catalog do, and have the engine just do query execution and not identity
resolution.

> not be opinionated on SQL dialect

Definitely, we don't want to be opinionated based on SQL dialect as much
as possible, but IMHO, at least for a case like this, where we want the
catalog to resolve identity, we can accommodate it considering the value it
brings. It gives the catalog more authority over identity resolution, which
is a very big problem, and if we scatter this around the engines we might
lose that control as a catalog.
For example, consider is_principal_role('ANALYST'). If I leave this to the
engine, what SQL will it evaluate? It needs to come back to the catalog
anyway, asking whether ANALYST is among the authenticated principal's roles.
There is no guarantee engines will do that, and even if they do, a contract
where we want the catalog to evaluate a function can take ages to land
because of how scattered the engines are.
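
To make the catalog-side evaluation concrete, here is a rough sketch in
Python. It is not the POC code: CallerContext, resolve_view_sql and the
regex-based matching are hypothetical names and choices made up for
illustration, and a real implementation would likely need a proper SQL
parser instead of a regex (a regex would also match inside string literals
or comments).

```python
import re
from dataclasses import dataclass, field


# Hypothetical stand-in for the authenticated caller; not a Polaris type.
@dataclass(frozen=True)
class CallerContext:
    principal: str
    principal_roles: frozenset = field(default_factory=frozenset)


# Matches calls such as is_principal_role('ANALYST') with a quoted argument.
_ROLE_CALL = re.compile(r"is_principal_role\(\s*'([^']*)'\s*\)", re.IGNORECASE)


def resolve_view_sql(view_sql: str, caller: CallerContext) -> str:
    """Replace each is_principal_role(...) call with a TRUE/FALSE literal."""

    def substitute(match: re.Match) -> str:
        role_name = match.group(1)
        return "TRUE" if role_name in caller.principal_roles else "FALSE"

    return _ROLE_CALL.sub(substitute, view_sql)


view = "SELECT * FROM ns1.layer1_table WHERE is_principal_role('ANALYST')"
analyst = CallerContext("alice", frozenset({"ANALYST"}))
other = CallerContext("bob", frozenset({"ENGINEER"}))

print(resolve_view_sql(view, analyst))
print(resolve_view_sql(view, other))
```

The point of the sketch is only that the engine receives plain SQL with a
boolean literal, so it never has to know how principal roles are resolved.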

> I agree that Polaris would need to do "enforcement" but supporting any
> query engines/SQL dialect is very difficult

Definitely agree. Hence, even if this means restricting this to something
as narrow as requiring a `WHERE ...` clause, I am fine with doing that. I
just want to build in as much enforcement as possible.

> I think we should explore "abstraction" like Substrait or Coral to be
> agnostic

Definitely, but since the view itself doesn't have an IR, I don't think this
is easily achievable, though I totally see where you are coming from. I
think the even more fundamental questions are who owns the SQL-to-IR
conversion, and whether all engines can read directly from the IR.

I agree that we need a clear boundary between engine and catalog, and this
is where I am coming from as well. AuthZ just can't be engine-only when
things like identity are involved; we need to do this at the catalog level
to have uniform enforcement.

Please let me know your further thoughts.

Best,
Prashant Singh

On Mon, May 19, 2025 at 12:21 PM Jean-Baptiste Onofré <[email protected]>
wrote:

> Hi Prashant
>
> Thanks for the proposal.
>
> I understand the purpose (about FGAC which is something we plan to
> work on), but I'm not sure if it's a good approach with this kind of
> SQL functions.
> Polaris, as a catalog, should:
> 1. not do query engine work, but more interact with any query engines
> (same discussion we had about TMS)
> 2. not be opinionated on SQL dialect
>
> I agree that Polaris would need to do "enforcement" but supporting any
> query engines/SQL dialect is very difficult. I think we should explore
> "abstraction" like Substrait or Coral to be agnostic.
> I think Polaris should "integrate" query engines, with a clear
> boundary between what's query engine and catalog responsibility.
>
> I think the proposal has great value, but I'm not yet convinced by the
> impl approach.
>
> Regards
> JB
>
> On Mon, May 19, 2025 at 7:26 PM Prashant Singh
> <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > I’d like to propose adding *context-aware functions* to Apache Polaris so
> > that view definitions can resolve security context on the Polaris side
> > (aka catalog end without depending on engines).
> >
> > *Proposed functions*
> >
> >    1. *is_principal('<principal_name>')* – returns TRUE if the
> >       authenticated principal matches <principal_name>, otherwise FALSE.
> >    2. *is_principal_role('<principal_role_name>')* – returns TRUE when
> >       <principal_role_name> appears in the principal’s role set.
> >    3. *is_catalog_role('<catalog_role_name>')* – analogous check at the
> >       catalog-role level.
> >
> > *Why it matters*
> >
> > These predicates make views dynamic. Example:
> >
> > CREATE VIEW dynamic_vw AS
> > SELECT *
> > FROM ns1.layer1_table
> > WHERE is_principal_role('ANALYST');
> >
> > When a user whose principal roles include *ANALYST* calls LOAD
> > VIEW, Polaris rewrites the view to
> >
> >    SELECT * FROM ns1.layer1_table WHERE TRUE;
> >
> > For everyone else the view becomes
> >
> >    SELECT * FROM ns1.layer1_table WHERE FALSE;
> >
> > The result is better and more consistent control of identity resolution
> > without relying on engine-side changes, giving Polaris more authority in
> > enforcing things like FGAC (WIP by me).
> > Note the same can be extrapolated to any Polaris stored entity.
> >
> > *Proof of concept*
> >
> > I’ve put together a quick POC branch:
> >
> > https://github.com/apache/polaris/compare/main...singhpk234:polaris:dyanmic/view
> >
> > *Prior art*
> >
> > Snowflake context functions:
> > https://docs.snowflake.com/en/sql-reference/functions-context
> > Databricks Unity Catalog offers a similar mechanism called *dynamic views*:
> > https://docs.databricks.com/aws/en/views/dynamic
> >
> > *Next steps*
> >
> > If the community is interested, we can discuss API surface, engine
> > implications, and a roadmap for merging.
> >
> > Eager to hear your feedback!
> >
> > Best,
> > Prashant Singh
>
