woosuk.ro created IMPALA-13667:
----------------------------------

             Summary: Unexpected Nested mask_hash Functions When Using Views in Impala with Ranger
                 Key: IMPALA-13667
                 URL: https://issues.apache.org/jira/browse/IMPALA-13667
             Project: IMPALA
          Issue Type: Question
          Components: Frontend
            Reporter: woosuk.ro


h3. *Description*

When using Impala with Ranger for data masking, applying a {{mask_hash}} policy to the same column in both tables and views causes the {{mask_hash}} function to be nested once for every masked table or view in the query. This leads to redundant hashing operations. Is this the intended behavior?
h3. *Steps to Reproduce*
 # *Apply Masking Policies:*
 #* Apply a {{mask_hash}} policy to a specific column (e.g., {{account_number}}) across all tables in two databases, {{temp_db}} and {{private_db}}.
 # *Create a Base Table:*
CREATE TABLE private_db.base_table (
    account_number STRING,
    other_column STRING
);
 # *Create a View Referencing the Base Table:*
CREATE VIEW private_db.base_view AS
SELECT * FROM private_db.base_table;
 # *Create Another View Referencing the First View:*
CREATE VIEW temp_db.secondary_view AS
SELECT * FROM private_db.base_view;
 # *Execute a Query on the Second View:*
SELECT * FROM temp_db.secondary_view;

h3. *Expected Behavior*

The {{mask_hash}} function should be applied *once* to the {{account_number}} 
column, regardless of the number of view layers referencing the masked table or 
view.
----
h3. *Actual Behavior*

The {{mask_hash}} function is applied *three times* to the {{account_number}} column, once for each masked object in the view chain ({{temp_db.secondary_view}}, {{private_db.base_view}}, and {{private_db.base_table}}). The resulting layers of hashing are visible in both the query execution plan and the Ranger audit logs.

*Example Query Execution Plan:*
WARNING: The following tables are missing relevant table and/or column statistics.
private_db.base_table
Analyzed query: SELECT * FROM (SELECT mask_hash(account_number) account_number, my_account_number FROM temp_db.secondary_view)

F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=4.02MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
|  output exprs: *mask_hash(mask_hash(mask_hash(account_number)))*, my_account_number
|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
*Ranger Audit Logs:*
# 1. temp_db.secondary_view account_number column masking
\{
    "access": "mask_hash",
    "resource": "temp_db/secondary_view/account_number",
    "resType": "@column",
    "reqData": "SELECT * FROM temp_db.secondary_view"
}
# 2. private_db.base_view account_number column masking
\{
    "access": "mask_hash",
    "resource": "private_db/base_view/account_number",
    "resType": "@column",
    "reqData": "SELECT * FROM temp_db.secondary_view"
}
# 3. private_db.base_table account_number column masking
\{
    "access": "mask_hash",
    "resource": "private_db/base_table/account_number",
    "resType": "@column",
    "reqData": "SELECT * FROM temp_db.secondary_view"
}
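The nesting pattern can be reproduced with a small model of the view chain. The Python sketch below is a hypothetical illustration, not Impala's analyzer code: it assumes a policy that masks {{account_number}} in every table and view of {{temp_db}} and {{private_db}}, expands each view into the object it selects from, and wraps the column in {{mask_hash}} at every layer the policy matches. The names {{CATALOG}}, {{policy_matches}}, {{column_expr}}, and {{masked_value}} are invented for this sketch, and the stand-in {{mask_hash}} uses a SHA-256 hex digest, which is an assumption and may not match Impala's built-in.

```python
import hashlib

# Hypothetical model of the objects in the report: each view maps to the
# object it selects from; a base table maps to None.
CATALOG = {
    "temp_db.secondary_view": "private_db.base_view",
    "private_db.base_view": "private_db.base_table",
    "private_db.base_table": None,
}

# Assumed policy from step 1: mask_hash on account_number for every table
# and view in these databases (the audit logs show views being matched too).
POLICY_DATABASES = {"temp_db", "private_db"}
POLICY_COLUMN = "account_number"


def policy_matches(obj: str, column: str) -> bool:
    """Return True if the assumed policy masks `column` of `obj`."""
    database = obj.split(".", 1)[0]
    return database in POLICY_DATABASES and column == POLICY_COLUMN


def column_expr(obj: str, column: str) -> str:
    """Build the expression for `column` as seen through `obj`, wrapping it
    in mask_hash at every layer the policy matches."""
    source = CATALOG[obj]
    expr = column if source is None else column_expr(source, column)
    return f"mask_hash({expr})" if policy_matches(obj, column) else expr


def mask_hash(value: str) -> str:
    """Stand-in for Impala's mask_hash; SHA-256 hex is an assumption."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def masked_value(obj: str, column: str, value: str) -> str:
    """Apply the same per-layer masking to a concrete column value."""
    source = CATALOG[obj]
    result = value if source is None else masked_value(source, column, value)
    return mask_hash(result) if policy_matches(obj, column) else result


if __name__ == "__main__":
    # prints mask_hash(mask_hash(mask_hash(account_number)))
    print(column_expr("temp_db.secondary_view", "account_number"))
    through_views = masked_value("temp_db.secondary_view", "account_number", "12345")
    from_table = masked_value("private_db.base_table", "account_number", "12345")
    # prints False: the triple-hashed value differs from the single hash
    print(through_views == from_table)
```

Under these assumptions, {{column_expr}} reproduces the output expression from the execution plan, and {{masked_value}} shows a side effect of the nesting: the masked value read through {{temp_db.secondary_view}} differs from the masked value read directly from {{private_db.base_table}}.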

h3. *Environment*
- Impala: 4.4.0
- Ranger: 2.3.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
