[ 
https://issues.apache.org/jira/browse/HIVE-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janos Kovacs updated HIVE-27714:
--------------------------------
    Description: 
With the current Iceberg location authorization, one explicit Ranger policy is 
required for every table to prevent the metadata_location cross-reference 
exploit, because any wildcard-based policy covering a set of tables would allow 
cross-referencing locations between the tables covered by that wildcard.

Maintaining one policy per table is nearly impossible in a production 
environment. 

The proposal is to handle the Iceberg table RWStorage authorization differently 
when the table is created/altered with its default location, as in this case 
there is no attempt to cross-reference another table. The conditions and the 
two options for this are:

When?
 * If no custom metadata_location is set/given in the CREATE/ALTER calls
 * If the given metadata_location's path (i.e. without the metadata JSON file 
name) is the same as the current metadata_location's path in the ALTER calls
 * If the given metadata_location's path set/given in the CREATE/ALTER calls is 
the same as the default location for the table would be, based on the warehouse 
and/or database locations
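
The three conditions above could be sketched roughly as follows. This is only 
an illustration, not Hive code: the function names, the parameters and the 
idea that the caller passes in the table's default metadata directory are all 
assumptions made for the example.

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit


def metadata_dir(metadata_location: str) -> str:
    """Strip the metadata JSON file name and return the containing directory."""
    parts = urlsplit(metadata_location)
    directory = posixpath.dirname(parts.path)
    # Keep scheme and authority (if any) so different buckets never compare equal.
    prefix = f"{parts.scheme}://{parts.netloc}" if parts.scheme else ""
    return prefix + directory


def uses_default_location(
    requested: Optional[str],
    current: Optional[str] = None,
    default_metadata_dir: Optional[str] = None,
) -> bool:
    """Hypothetical check for the three 'When?' conditions.

    requested            -- metadata_location given in the CREATE/ALTER call
    current              -- the table's current metadata_location (ALTER only)
    default_metadata_dir -- directory derived from warehouse/database locations
    """
    # Condition 1: no custom metadata_location was given at all.
    if not requested:
        return True
    requested_dir = metadata_dir(requested)
    # Condition 2: same directory as the current metadata_location (ALTER).
    if current and requested_dir == metadata_dir(current):
        return True
    # Condition 3: same directory as the table's default location would be.
    return (
        default_metadata_dir is not None
        and requested_dir == default_metadata_dir.rstrip("/")
    )
```

The comparison is a plain string match on purpose: a path containing ".." or 
pointing at another table simply fails the check and falls back to the normal 
custom-location authorization.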

What?
 # Either do not call the RWStorage Authorizer for this case
 # Or set the location to a constant value that can be easily handled with a 
single access policy on the Authorizer side

Pros/Cons:
 * Option-1 would not call the authorizer, so it would not generate an audit 
event for RWStorage-level policies in these cases, but omitting the 
authorization step would make it more performant.
 * Option-2 would end up in the Authorizer, which means it would also generate 
an audit event. It also needs a pre-agreed constant for such cases that can be 
differentiated from normal custom-location-based authorizations.

If Option-2 is chosen:
 * The following policy syntax could be used for custom locations: 
{noformat}
iceberg://mydatabase/mytable/snapshot=/my/custom/location/whatever/*
{noformat}

 * The policy format based on the pre-agreed default-location constant could be:
{noformat}
iceberg://*/*/snapshot=default_location
{noformat}
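
As a rough sketch of Option-2, the resource string handed to the Authorizer 
could be built like this. The helper name and its parameters are hypothetical; 
only the iceberg://.../snapshot=... shape and the default_location constant 
come from the proposal above.

```python
# Pre-agreed constant from the proposal; it must not collide with a real path.
DEFAULT_LOCATION_CONSTANT = "default_location"


def rwstorage_resource(database: str, table: str, location: str, is_default: bool) -> str:
    """Build the RWStorage resource string to authorize (hypothetical helper).

    For default locations the real path is replaced by the constant, so a
    single wildcard policy can cover every table without exposing paths.
    """
    target = DEFAULT_LOCATION_CONSTANT if is_default else location
    return f"iceberg://{database}/{table}/snapshot={target}"
```

With this, a table altered in place would be checked against 
iceberg://mydatabase/mytable/snapshot=default_location, while a custom 
location would still be checked with its full path.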

 

There could even be a new property introduced to decide whether the 
authorization for default locations should be skipped altogether or not (in 
which case e.g. the snapshot=default_location constant is used). This way 
everyone can decide whether audit events or the performance gained by omitting 
the authorization step is preferred. 
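
A minimal sketch of how such a property could switch between the two options. 
The property name below is made up for illustration and does not exist in 
Hive; the authorizer is modelled as a plain callable that receives the 
resource string.

```python
from typing import Callable, Mapping, Optional

# Hypothetical property name, used only for this sketch.
SKIP_DEFAULT_LOCATION_AUTH = "iceberg.authorization.skip.default.location"


def authorize_location(
    conf: Mapping[str, str],
    database: str,
    table: str,
    location: str,
    is_default: bool,
    authorizer: Callable[[str], bool],
) -> Optional[bool]:
    """Return the authorizer's decision, or None if the call was skipped."""
    if is_default:
        if conf.get(SKIP_DEFAULT_LOCATION_AUTH, "false").lower() == "true":
            # Option-1: no authorizer call, hence no audit event either.
            return None
        # Option-2: authorize against the pre-agreed constant.
        resource = f"iceberg://{database}/{table}/snapshot=default_location"
    else:
        # Custom locations are always authorized with their real path.
        resource = f"iceberg://{database}/{table}/snapshot={location}"
    return authorizer(resource)
```

Defaulting the property to "false" keeps the audit trail unless an 
administrator explicitly opts for the faster path.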


> Iceberg: metadata location overrides can cause data breach - handling default 
> locations 
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-27714
>                 URL: https://issues.apache.org/jira/browse/HIVE-27714
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Authorization, Iceberg integration
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Janos Kovacs
>            Assignee: Ayush Saxena
>            Priority: Critical
>              Labels: check



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
