[ https://issues.apache.org/jira/browse/SUBMARINE-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SUBMARINE-638:
-------------------------------------
Labels: pull-request-available security (was: security)
> Spark-security ranger plugin - Limit to be applied after masking projection
> ---------------------------------------------------------------------------
>
> Key: SUBMARINE-638
> URL: https://issues.apache.org/jira/browse/SUBMARINE-638
> Project: Apache Submarine
> Issue Type: Improvement
> Components: Security
> Reporter: Tenneti Venkata Sri Harsha
> Priority: Major
> Labels: pull-request-available, security
>
> Suppose a query has a limit, as below, and the value column has to be masked:
> {code:java}
> SELECT key, value from default.src limit 10{code}
> Then the plan looks like below
> {code:java}
> == Parsed Logical Plan ==
> 'GlobalLimit 10
> +- 'LocalLimit 10
>    +- 'Project ['key, 'value]
>       +- 'UnresolvedRelation `default`.`src`
> == Optimized Logical Plan ==
> Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
> +- GlobalLimit 10
>    +- LocalLimit 10
>       +- SubmarineDataMasking
>          +- HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#36, value#37]
> == Physical Plan ==
> Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
> +- *(2) GlobalLimit 10
>    +- Exchange SinglePartition
>       +- *(1) LocalLimit 10
>          +- *(1) HiveTableScan [key#36, value#37], HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [key#36, value#37]
> {code}
> The plan above reads every file in the table, because the optimized logical
> plan places the masking Project above the limit. If the limit is instead
> applied after the masking projection (i.e. the limit is the root of the
> optimized logical plan), the physical plan uses CollectLimit, and the collect
> then reads only one file.
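A minimal sketch of the reordering asked for above, in Scala. It does not use Spark's real Catalyst classes: Plan, Relation, DataMasking, Project, GlobalLimit, LocalLimit and LimitAboveMasking are hypothetical stand-ins that only mirror the plan shapes in this issue. The rule matches a Project sitting directly on a GlobalLimit/LocalLimit pair and pushes the Project underneath it. That swap is only safe for row-by-row projections such as the masking UDF, where each input row yields exactly one output row.

```scala
// Hypothetical, simplified stand-ins for the plan nodes in this issue;
// these are NOT Spark's Catalyst classes.
sealed trait Plan
final case class Relation(name: String) extends Plan
final case class DataMasking(child: Plan) extends Plan
final case class Project(exprs: Seq[String], child: Plan) extends Plan
final case class GlobalLimit(n: Int, child: Plan) extends Plan
final case class LocalLimit(n: Int, child: Plan) extends Plan

object LimitAboveMasking {
  /** Rewrites Project(GlobalLimit(LocalLimit(child))) into
    * GlobalLimit(LocalLimit(Project(child))) and recurses into the rest of the tree.
    * Only valid when the projection is row-by-row (one output row per input row). */
  def apply(plan: Plan): Plan = plan match {
    case Project(exprs, GlobalLimit(g, LocalLimit(l, child))) =>
      GlobalLimit(g, LocalLimit(l, Project(exprs, apply(child))))
    case Project(exprs, child) => Project(exprs, apply(child))
    case GlobalLimit(n, child) => GlobalLimit(n, apply(child))
    case LocalLimit(n, child)  => LocalLimit(n, apply(child))
    case DataMasking(child)    => DataMasking(apply(child))
    case r: Relation           => r
  }
}
```

Applied to the shape of the first optimized plan, Project(masked columns, GlobalLimit(10, LocalLimit(10, DataMasking(Relation)))), the rule returns GlobalLimit(10, LocalLimit(10, Project(masked columns, DataMasking(Relation)))), i.e. the shape of the second optimized plan, with the limit at the root.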
> {code:java}
> == Parsed Logical Plan ==
> 'GlobalLimit 10
> +- 'LocalLimit 10
>    +- 'Project ['key, 'value]
>       +- 'UnresolvedRelation `default`.`src`
> == Optimized Logical Plan ==
> GlobalLimit 10
> +- LocalLimit 10
>    +- Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
>       +- SubmarineDataMasking
>          +- HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#36, value#37]
> == Physical Plan ==
> CollectLimit 10
> +- Project [key#36, HiveGenericUDF#org.apache.hadoop.hive.ql.udf.generic.GenericUDFMaskShowLastN(value#37,4,x,x,x,-1,1) AS value#41]
>    +- *(1) HiveTableScan [key#36, value#37], HiveTableRelation `default`.`src`, org.apache.hadoop.hive.serde2.OpenCSVSerde, [key#36, value#37]
> {code}
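Why CollectLimit can stop after a single file: a simplified Scala model, loosely based on Spark's SparkPlan.executeTake (which CollectLimit's collect path uses). It scans one partition first, grows the batch by a scale-up factor (Spark's spark.sql.limit.scaleUpFactor, default 4) and stops as soon as enough rows are collected. The GlobalLimit over Exchange SinglePartition plan instead runs LocalLimit on every partition, so every partition is scanned. Partitions are modelled as in-memory sequences (roughly, one per file split); IncrementalTake, take and localThenGlobal are hypothetical names, and the real executeTake uses a more elaborate estimate of how many partitions to try next.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified, hypothetical model: each inner Seq stands for one partition (file split).
object IncrementalTake {

  /** CollectLimit-style take: scan 1 partition, then scaleUpFactor times more
    * per round, stopping as soon as `limit` rows are collected.
    * Returns the collected rows and the number of partitions scanned. */
  def take[A](partitions: IndexedSeq[Seq[A]], limit: Int, scaleUpFactor: Int = 4): (Seq[A], Int) = {
    val buf = ArrayBuffer.empty[A]
    var scanned = 0
    var numToTry = 1
    while (buf.size < limit && scanned < partitions.size) {
      val upTo = math.min(scanned + numToTry, partitions.size)
      for (i <- scanned until upTo) buf ++= partitions(i).take(limit - buf.size)
      scanned = upTo
      numToTry *= scaleUpFactor
    }
    (buf.toSeq, scanned)
  }

  /** GlobalLimit over Exchange SinglePartition over LocalLimit: every partition
    * is limited locally first, so all of them are scanned. */
  def localThenGlobal[A](partitions: IndexedSeq[Seq[A]], limit: Int): (Seq[A], Int) = {
    val locallyLimited = partitions.map(_.take(limit))
    (locallyLimited.flatten.take(limit), partitions.size)
  }
}
```

With eight partitions of 100 rows each and a limit of 10, take scans one partition while localThenGlobal scans all eight; both return 10 rows.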
--
This message was sent by Atlassian Jira
(v8.3.4#803005)