[ 
https://issues.apache.org/jira/browse/SUBMARINE-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SUBMARINE-638:
-------------------------------------
    Labels: pull-request-available security  (was: security)

> Spark-security ranger plugin - Limit to be applied after masking projection
> ---------------------------------------------------------------------------
>
>                 Key: SUBMARINE-638
>                 URL: https://issues.apache.org/jira/browse/SUBMARINE-638
>             Project: Apache Submarine
>          Issue Type: Improvement
>          Components: Security
>            Reporter: Tenneti Venkata Sri Harsha
>            Priority: Major
>              Labels: pull-request-available, security
>
> Consider a query with a limit, such as the one below, where the value column has to be masked:
> {code:java}
> SELECT key, value from default.src limit 10{code}
> The resulting plan looks like this:
> {code:java}
> == Parsed Logical Plan ==
> 'GlobalLimit 10
> +- 'LocalLimit 10
>    +- 'Project ['key, 'value]
>       +- 'UnresolvedRelation `default`.`src`
> == Optimized Logical Plan ==
> Project [key#36, 
> HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1)
>  AS value#41]
> +- GlobalLimit 10
>    +- LocalLimit 10
>       +- SubmarineDataMasking
>          +- HiveTableRelation `default`.`src`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#36, value#37]
> == Physical Plan ==
> Project [key#36, 
> HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1)
>  AS value#41]
> +- *(2) GlobalLimit 10
>    +- Exchange SinglePartition
>       +- *(1) LocalLimit 10
>          +- *(1) HiveTableScan [key#36, value#37], HiveTableRelation 
> `default`.`src`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [key#36, 
> value#37]
> {code}
> The above plan reads all the files in the table, because the optimized logical 
> plan places the masking Project above the limit. If the optimized logical plan 
> instead applies the limit after the masking projection, the physical plan is 
> converted to use CollectLimit, and the collect then reads only one file:
> {code:java}
> == Parsed Logical Plan ==
> 'GlobalLimit 10
> +- 'LocalLimit 10
>    +- 'Project ['key, 'value]
>       +- 'UnresolvedRelation `default`.`src`
> == Optimized Logical Plan ==
> GlobalLimit 10
> +- LocalLimit 10
>    +- Project [key#36, 
> HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1)
>  AS value#41]
>       +- SubmarineDataMasking
>          +- HiveTableRelation `default`.`src`, 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#36, value#37]
> == Physical Plan ==
> CollectLimit 10
>    +- Project [key#36, 
> HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1)
>  AS value#41]
>       +- *(1) HiveTableScan [key#36, value#37], HiveTableRelation 
> `default`.`src`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [key#36, 
> value#37]{code}
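
The rewrite described above can be sketched on a toy plan tree. This is only an illustration under stated assumptions: the case classes below (Scan, Masking, Project, LocalLimit, GlobalLimit) and the LimitAfterMasking object are hypothetical stand-ins, not Spark Catalyst classes and not the plugin's actual rule. The sketch also assumes the masking projection is row-wise and deterministic, which is what makes it safe to swap with the limit.

```scala
// Toy logical plan model. Hypothetical types, NOT Spark's Catalyst classes.
sealed trait Plan
final case class Scan(table: String) extends Plan
// Stand-in for the SubmarineDataMasking marker node seen in the plans.
final case class Masking(child: Plan) extends Plan
final case class Project(exprs: Seq[String], child: Plan) extends Plan
final case class LocalLimit(n: Int, child: Plan) extends Plan
final case class GlobalLimit(n: Int, child: Plan) extends Plan

object LimitAfterMasking {
  // Rewrites Project(GlobalLimit(LocalLimit(x))) into
  // GlobalLimit(LocalLimit(Project(x))), so the limit ends up on top of
  // the masking projection. Assumes the projection is deterministic and
  // row-wise, so it commutes with the limit.
  def apply(plan: Plan): Plan = plan match {
    case Project(exprs, GlobalLimit(g, LocalLimit(l, child))) =>
      GlobalLimit(g, LocalLimit(l, Project(exprs, apply(child))))
    case Project(exprs, child) => Project(exprs, apply(child))
    case GlobalLimit(n, child) => GlobalLimit(n, apply(child))
    case LocalLimit(n, child)  => LocalLimit(n, apply(child))
    case Masking(child)        => Masking(apply(child))
    case s: Scan               => s
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Placeholder expression strings standing in for the masking UDF call.
    val exprs = Seq("key", "mask_show_last_n(value) AS value")

    // Shape of the first optimized plan: Project above the limits.
    val before =
      Project(exprs, GlobalLimit(10, LocalLimit(10, Masking(Scan("default.src")))))

    // Shape of the second optimized plan: limits above the Project.
    val expected =
      GlobalLimit(10, LocalLimit(10, Project(exprs, Masking(Scan("default.src")))))

    val after = LimitAfterMasking(before)
    assert(after == expected)
    println(after)
  }
}
```

In a real Catalyst rule the same pattern would be matched on Spark's own plan nodes, and the rule would need to check that the projected expressions are deterministic before moving the limit.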



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
