singhpk234 commented on PR #2048: URL: https://github.com/apache/polaris/pull/2048#issuecomment-3084708025
@snazy @adutra thank you for sharing your feedback. I want to walk you all through my thought process.

**Why Iceberg Expressions and not SQL**

Iceberg expressions are portable, dialect-agnostic, and first-class citizens of the Iceberg world, and IMHO they are a must for interop. Note that almost all engines already have an **_engine-specific SQL -> engine expression -> Iceberg expression -> Iceberg SDK (manifest filtering)_** pipeline.

We did explore dialect-agnostic SQL such as SQL-92, and for the most part we kept coming back to dialect-specific requirements: if the policy were SQL, engines would want to store their dialect-specific constructs directly in the policy, and hence in persistence, making the policy usable only by engines that speak the definer's dialect. What would the behaviour be if an engine doesn't understand the dialect?

What I see is that we can expand Iceberg expressions to contain UDF references, and via UDFs you can model all your dialect-specific logic, with the bottom line being that the policy definition itself operates only in Iceberg expressions. If we want dialect-specific behaviour, we can still put everything behind a UDF. Here is one common example: SHA-256 in the Spark dialect is `sha2`, so why not model this as a `hash` UDF that maps to `sha2` for Spark and `sha256` for Trino?

Also, heads up: Iceberg expressions are going to be expanded soon due to the **_constraints_** work for v4 that Anton is driving (the uber-level idea was discussed in some of the syncs), essentially storing table constraints in Iceberg metadata so that they can be retrieved and enforced by the calling engine, with the storage still being Iceberg expressions. So Iceberg expressions are what they plan to use for interoperability.

Note: I checked in the last catalog community sync, and Iceberg expressions with UDFs seem like the right direction. I understand UDFs are not there yet and will take some time; meanwhile, using Iceberg expressions and storing RLS there seems like a good step IMHO.
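To make the `hash` UDF idea concrete, here is a toy sketch (not Iceberg's actual Expression API; `Ref`, `UdfCall`, `to_sql`, and the dialect table are all hypothetical): the policy stores one dialect-agnostic expression, and each engine renders the UDF reference into its own dialect function.

```python
# Hypothetical sketch: a policy stored as a dialect-agnostic expression
# tree whose only escape hatch is a named UDF reference; each engine maps
# the UDF name to its own dialect-specific function.
from dataclasses import dataclass

@dataclass
class Ref:
    name: str               # column reference

@dataclass
class UdfCall:
    udf: str                # dialect-agnostic UDF name stored in the policy
    args: tuple

@dataclass
class Equal:
    left: object
    right: object

# Per-engine rendering of the dialect-agnostic "hash" UDF: Spark spells
# SHA-256 as sha2(col, 256), Trino as sha256(...) (simplified; real Trino
# would also need a varbinary conversion).
DIALECT_UDFS = {
    "spark": {"hash": lambda col: f"sha2({col}, 256)"},
    "trino": {"hash": lambda col: f"sha256({col})"},
}

def to_sql(expr, dialect):
    """Translate the stored expression into engine-specific SQL."""
    if isinstance(expr, Ref):
        return expr.name
    if isinstance(expr, UdfCall):
        rendered = [to_sql(a, dialect) for a in expr.args]
        return DIALECT_UDFS[dialect][expr.udf](*rendered)
    if isinstance(expr, Equal):
        return f"{to_sql(expr.left, dialect)} = {to_sql(expr.right, dialect)}"
    raise TypeError(f"unsupported expression: {expr!r}")

# One policy definition, two dialect-specific renderings:
policy = Equal(UdfCall("hash", (Ref("ssn"),)), Ref("allowed_hash"))
print(to_sql(policy, "spark"))  # sha2(ssn, 256) = allowed_hash
print(to_sql(policy, "trino"))  # sha256(ssn) = allowed_hash
```

The point is that only the engine-side rendering is dialect-specific; the persisted policy never is, so any engine that understands the UDF contract can enforce it.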
I know of at least one cloud provider where this will for sure help: https://docs.aws.amazon.com/lake-formation/latest/dg/partiql-support.html

**Why not Substrait?**

I think if we want Substrait, we can make UDFs return Substrait directly, but in the Iceberg community there is at least no **consensus** on the IR, as similar discussions have been brought up in the community in the past for Iceberg views. I would request driving that in Iceberg first, and then we can incorporate it in the policy. Also please note that engines like Snowflake / Redshift ... don't support Substrait; unless it is established as a standard IR in Iceberg, IMHO we should not take a dependency on it.

> Spark datasets/frames are not SQL, but FGAC with those APIs is a different topic

Yes, non-SQL requires a more thorough discussion IMHO; for example, how do we model SQL written in one language being parsed by another?

I hope I was able to explain my thought process here! I really appreciate you taking a look.