bowenliang123 opened a new issue, #3186: URL: https://github.com/apache/incubator-kyuubi/issues/3186
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [X] I have searched in the [issues](https://github.com/apache/incubator-kyuubi/issues?q=is%3Aissue) and found no similar issues.

### Describe the feature

Support `DataSourceV2Relation` so that row-level filter policies are applied to Iceberg tables.

Recently we have been trying to apply Ranger's row-level filters to Iceberg tables via the AuthZ plugin in Spark 3.3. We realized that `RuleApplyRowFilterAndDataMasking` in AuthZ currently applies row-level filters correctly to both `HiveTableRelation` (for Hive tables) and `LogicalRelation`, but does not affect the execution plan for Iceberg tables. The main reason is that an Iceberg table is accessed via `DataSourceV2Relation`, which `RuleApplyRowFilterAndDataMasking` does not support.

### Motivation

With this feature implemented:
- Iceberg tables, and any other tables accessed via DataSourceV2, will benefit
- row-filter policies will be applied correctly, together with masking policies

### Describe the solution

The suggestion (and draft implementation) is to identify `DataSourceV2Relation` in `RuleApplyRowFilterAndDataMasking`, and to parse the database name and table name from `org.apache.spark.sql.connector.catalog.Table` (e.g. Iceberg's implementation in `org.apache.iceberg.spark.source`) for the next step of fetching Ranger policies.

Consider an Iceberg table `gftest.sampleice` with a row-level filter `id='1'`.
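The identification-and-parsing step described above can be sketched with a simplified, self-contained model. Note that the case classes below only mimic the shape of the real Spark plan nodes, and `tableIdentifier` is a hypothetical helper, not the actual rule's API; for a V2 relation the connector `Table`'s `name()` is assumed to be of the form `db.table`, as with Iceberg's `SparkTable`:

```scala
// Simplified stand-ins for the Spark classes (not the real ones).
sealed trait LogicalPlan
case class HiveTableRelation(db: String, table: String) extends LogicalPlan
case class LogicalRelation(db: String, table: String) extends LogicalPlan
// DataSourceV2Relation carries a connector Table; its name() is
// assumed to be "db.table" here, as with Iceberg's SparkTable.
case class Table(name: String)
case class DataSourceV2Relation(table: Table) extends LogicalPlan

// Extract (database, table) from any supported relation, so the rule
// can look up the matching Ranger row-filter policy.
def tableIdentifier(plan: LogicalPlan): Option[(String, String)] = plan match {
  case HiveTableRelation(db, t) => Some((db, t))
  case LogicalRelation(db, t)   => Some((db, t))
  case DataSourceV2Relation(tbl) =>
    tbl.name.split('.') match {
      case Array(db, t) => Some((db, t)) // "gftest.sampleice" -> (gftest, sampleice)
      case _            => None          // unqualified or multi-part name: skip
    }
}
```

The point of the sketch is that the existing `HiveTableRelation`/`LogicalRelation` branches stay as they are; only the new `DataSourceV2Relation` branch is added.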
SQL: `EXPLAIN EXTENDED select * from gftest.sampleice limit 5;`

Current execution plan (1.6.0-SNAPSHOT on master):

```
== Parsed Logical Plan ==
'GlobalLimit 5
+- 'LocalLimit 5
   +- 'Project [*]
      +- 'UnresolvedRelation [gftest, sampleice], [], false

== Analyzed Logical Plan ==
id: bigint, data: string
GlobalLimit 5
+- LocalLimit 5
   +- Project [id#13332L, data#13333]
      +- SubqueryAlias spark_catalog.gftest.sampleice
         +- RelationV2[id#13332L, data#13333] spark_catalog.gftest.sampleice

== Optimized Logical Plan ==
GlobalLimit 5
+- LocalLimit 5
   +- RelationV2[id#13332L, data#13333] spark_catalog.gftest.sampleice

== Physical Plan ==
CollectLimit 5
+- *(1) ColumnarToRow
   +- BatchScan[id#13332L, data#13333] spark_catalog.gftest.sampleice [filters=] RuntimeFilters: []
```

Expected execution plan with the draft implementation:

```
== Parsed Logical Plan ==
'GlobalLimit 5
+- 'LocalLimit 5
   +- 'Project [*]
      +- 'UnresolvedRelation [gftest, sampleice], [], false

== Analyzed Logical Plan ==
id: bigint, data: string
GlobalLimit 5
+- LocalLimit 5
   +- Project [id#142L, data#143]
      +- SubqueryAlias spark_catalog.gftest.sampleice
         +- Project [id#142L, data#143]
            +- Filter (id#142L = cast(1 as bigint))
               +- RowFilterAndDataMaskingMarker
                  +- RelationV2[id#142L, data#143] spark_catalog.gftest.sampleice

== Optimized Logical Plan ==
GlobalLimit 5
+- LocalLimit 5
   +- Filter (isnotnull(id#142L) AND (id#142L = 1))
      +- RelationV2[id#142L, data#143] spark_catalog.gftest.sampleice

== Physical Plan ==
CollectLimit 5
+- *(1) Filter (isnotnull(id#142L) AND (id#142L = 1))
   +- *(1) ColumnarToRow
      +- BatchScan[id#142L, data#143] spark_catalog.gftest.sampleice [filters=id IS NOT NULL, id = 1] RuntimeFilters: []
```

### Additional context

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!
