[
https://issues.apache.org/jira/browse/RANGER-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alberto Romero updated RANGER-2127:
-----------------------------------
Summary: Hive masking policy to allow querying actual data if masked cols
are on predicate (was: Hive Masking policy to allow querying actual data if
masked cols are on predicate)
> Hive masking policy to allow querying actual data if masked cols are on
> predicate
> ---------------------------------------------------------------------------------
>
> Key: RANGER-2127
> URL: https://issues.apache.org/jira/browse/RANGER-2127
> Project: Ranger
> Issue Type: Improvement
> Components: plugins, Ranger
> Affects Versions: 0.7.1
> Reporter: Alberto Romero
> Priority: Major
>
> Enable queries to operate on the actual values of masked columns, as long as
> those columns appear only in the predicate of the statement (i.e., the
> unmasked values are used to evaluate the query but are never returned).
> For example,
> Table #1: customers
> Cols: ID, COMPANY_NAME, SECTOR, *masked*[CUSTOMER_ID]
> Table #2: orders
> Cols: ID, WAREHOUSE_ID, QUANTITY, VALUE
>
> SELECT c.SECTOR, sum(o.QUANTITY)
> FROM customers c JOIN orders o
> ON (c.CUSTOMER_ID = o.ID)
> GROUP BY c.SECTOR;
>
> The query does not return any values from customers.CUSTOMER_ID, but the
> join is still evaluated on the actual (unmasked) values of that column.
>
> A suggested approach would be to create a simple query analyzer that
> determines, based on where each masked column appears in the query, whether
> any masked data would be returned, and allows the query to run on the actual
> values if none would be.
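
To make the suggested analyzer concrete, here is a minimal sketch of the
position check. It is not Ranger or Hive code: the function name
can_use_actual_values and the regex-based handling of the select list are
assumptions made purely for illustration. It only handles a single flat
SELECT ... FROM statement and compares column names without resolving table
aliases; subqueries, CTEs and views would need a real SQL parser.

```python
import re


def can_use_actual_values(sql, masked_columns):
    """Return True if no masked column appears in the SELECT list.

    Hypothetical helper, for illustration only. It is deliberately
    conservative: any mention of a masked column in the projection,
    even inside a function call, counts as returning masked data.
    """
    # Grab the text between the first SELECT and the first FROM.
    match = re.search(r"\bselect\b(.*?)\bfrom\b", sql,
                      flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("only simple SELECT ... FROM statements are handled")
    select_list = match.group(1)

    # A '*' may be a wildcard that expands to masked columns, so refuse.
    # (This also rejects multiplication, which is an accepted false negative.)
    if "*" in select_list:
        return False

    # Collect every identifier in the projection, case-insensitively.
    projected = {tok.upper() for tok in re.findall(r"[A-Za-z_]\w*", select_list)}
    return projected.isdisjoint(col.upper() for col in masked_columns)


# The example query from the description: CUSTOMER_ID is used only in the join.
QUERY = """
SELECT c.SECTOR, sum(o.QUANTITY)
FROM customers c JOIN orders o
ON (c.CUSTOMER_ID = o.ID)
GROUP BY c.SECTOR
"""

print(can_use_actual_values(QUERY, {"CUSTOMER_ID"}))  # True
```

With this check, the example query would be allowed to run on actual values,
while SELECT c.CUSTOMER_ID FROM customers c or SELECT * FROM customers would
still be subject to masking.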