bowenliang123 opened a new issue, #3186: URL: https://github.com/apache/incubator-kyuubi/issues/3186
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [X] I have searched in the [issues](https://github.com/apache/incubator-kyuubi/issues?q=is%3Aissue) and found no similar issues.

### Describe the feature

Support `DataSourceV2Relation` so that row-level filter policies are applied to Iceberg tables.

Recently we have been trying to apply Ranger's row-level filters to Iceberg tables via the AuthZ plugin in Spark 3.3. We realized that `RuleApplyRowFilterAndDataMasking` in AuthZ currently applies row-level filters correctly to both `HiveTableRelation` (for Hive tables) and `LogicalRelation`, but does not affect the execution plan for Iceberg tables. The main reason is that an Iceberg table is accessed via `DataSourceV2Relation`, which `RuleApplyRowFilterAndDataMasking` does not support.

### Motivation

With this feature implemented:
- Iceberg tables, and any other tables accessed via DataSourceV2, will benefit
- row-filter policies will be applied correctly, together with masking policies

### Describe the solution

The suggestion (and draft implementation) is to identify `DataSourceV2Relation` in `RuleApplyRowFilterAndDataMasking`, and to parse the database name and table name from `org.apache.spark.sql.connector.catalog.Table` (e.g. Iceberg's implementation in `org.apache.iceberg.spark.source`) for the next step of fetching Ranger policies.

Consider an Iceberg table `gftest.sampleice` with a row-level filter `id='1'`.
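The identification-and-parsing step described above can be sketched with a simplified, self-contained model. Note that the case classes below only mimic the shape of the real Spark plan nodes, and `tableIdentifier` is a hypothetical helper, not the actual rule's API; for a V2 relation the connector `Table`'s `name()` is assumed to be of the form `db.table`, as with Iceberg's `SparkTable`:

```scala
// Simplified stand-ins for the Spark classes (not the real ones).
sealed trait LogicalPlan
case class HiveTableRelation(db: String, table: String) extends LogicalPlan
case class LogicalRelation(db: String, table: String) extends LogicalPlan
// DataSourceV2Relation carries a connector Table; its name() is
// assumed to be "db.table" here, as with Iceberg's SparkTable.
case class Table(name: String)
case class DataSourceV2Relation(table: Table) extends LogicalPlan

// Extract (database, table) from any supported relation, so the rule
// can look up the matching Ranger row-filter policy.
def tableIdentifier(plan: LogicalPlan): Option[(String, String)] = plan match {
  case HiveTableRelation(db, t) => Some((db, t))
  case LogicalRelation(db, t)   => Some((db, t))
  case DataSourceV2Relation(tbl) =>
    tbl.name.split('.') match {
      case Array(db, t) => Some((db, t)) // "gftest.sampleice" -> (gftest, sampleice)
      case _            => None          // unqualified or multi-part name: skip
    }
}
```

The point of the sketch is that the existing `HiveTableRelation`/`LogicalRelation` branches stay as they are; only the new `DataSourceV2Relation` branch is added.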
SQL: `EXPLAIN EXTENDED select * from gftest.sampleice limit 5;`

Current execution plan (1.6.0-SNAPSHOT on master):

```
== Parsed Logical Plan ==
'GlobalLimit 5
+- 'LocalLimit 5
   +- 'Project [*]
      +- 'UnresolvedRelation [gftest, sampleice], [], false

== Analyzed Logical Plan ==
id: bigint, data: string
GlobalLimit 5
+- LocalLimit 5
   +- Project [id#13332L, data#13333]
      +- SubqueryAlias spark_catalog.gftest.sampleice
         +- RelationV2[id#13332L, data#13333] spark_catalog.gftest.sampleice

== Optimized Logical Plan ==
GlobalLimit 5
+- LocalLimit 5
   +- RelationV2[id#13332L, data#13333] spark_catalog.gftest.sampleice

== Physical Plan ==
CollectLimit 5
+- *(1) ColumnarToRow
   +- BatchScan[id#13332L, data#13333] spark_catalog.gftest.sampleice [filters=] RuntimeFilters: []
```

Expected execution plan with the draft implementation:

```
== Parsed Logical Plan ==
'GlobalLimit 5
+- 'LocalLimit 5
   +- 'Project [*]
      +- 'UnresolvedRelation [gftest, sampleice], [], false

== Analyzed Logical Plan ==
id: bigint, data: string
GlobalLimit 5
+- LocalLimit 5
   +- Project [id#142L, data#143]
      +- SubqueryAlias spark_catalog.gftest.sampleice
         +- Project [id#142L, data#143]
            +- Filter (id#142L = cast(1 as bigint))
               +- RowFilterAndDataMaskingMarker
                  +- RelationV2[id#142L, data#143] spark_catalog.gftest.sampleice

== Optimized Logical Plan ==
GlobalLimit 5
+- LocalLimit 5
   +- Filter (isnotnull(id#142L) AND (id#142L = 1))
      +- RelationV2[id#142L, data#143] spark_catalog.gftest.sampleice

== Physical Plan ==
CollectLimit 5
+- *(1) Filter (isnotnull(id#142L) AND (id#142L = 1))
   +- *(1) ColumnarToRow
      +- BatchScan[id#142L, data#143] spark_catalog.gftest.sampleice [filters=id IS NOT NULL, id = 1] RuntimeFilters: []
```

### Additional context

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!
