[ 
https://issues.apache.org/jira/browse/IMPALA-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705714#comment-17705714
 ] 

ASF subversion and git services commented on IMPALA-9661:
---------------------------------------------------------

Commit 9bf8607ce58a1a2573c8c2b0ebdf9179a1840429 in impala's branch 
refs/heads/branch-4.1.2 from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=9bf8607ce ]

IMPALA-11744: Table mask view should preserve the original column order in Hive

Ranger provides column masking and row filtering policies to mask
sensitive data for specific users/groups. When a table should be masked
in a query, Impala replaces it with a table mask view that exposes the
columns with masked expressions.

After IMPALA-9661, only selected columns are exposed in the table mask
view. However, the columns of the view are exposed in the order that
they are registered. If the registering order differs from the column
order in the table, STAR expansions will mismatch the columns.

To be specific, let's say table 'tbl' with 3 columns a, b, c should be
masked in the following query:
  select b, * from tbl;
Ideally Impala should replace the TableRef of 'tbl' with a table mask
view as:
  select b, * from (
    select mask(a) a, mask(b) b, mask(c) c from tbl
  ) t;

Currently, the rewritten query is
  select b, * from (
    select mask(b) b, mask(a) a, mask(c) c from tbl
  ) t;
This incorrectly expands the STAR as "b, a, c" in the re-analyze phase.

The cause is that column 'b' is registered earlier than all other
columns. This patch fixes it by sorting the selected columns based on
their original order in the table.
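
The reordering described above can be sketched as follows. This is a minimal
Python illustration, not Impala's actual frontend code (which is Java): the
helper names order_mask_columns and build_mask_view are hypothetical, and
mask(...) is the same placeholder used in the queries above.

```python
def order_mask_columns(table_columns, registered_columns):
    """Sort the registered columns by their position in the table schema.

    table_columns: column names in the table's declared order.
    registered_columns: column names in the order the analyzer saw them.
    """
    position = {name: idx for idx, name in enumerate(table_columns)}
    return sorted(registered_columns, key=lambda name: position[name])


def build_mask_view(table, table_columns, registered_columns):
    """Build the (hypothetical) table mask view SQL with columns in table order."""
    ordered = order_mask_columns(table_columns, registered_columns)
    select_list = ", ".join(f"mask({col}) {col}" for col in ordered)
    return f"select {select_list} from {table}"


# 'b' is registered first because it appears first in "select b, * from tbl".
view_sql = build_mask_view("tbl", ["a", "b", "c"], ["b", "a", "c"])
print(view_sql)
```

With the sort in place, the view lists a, b, c regardless of registration
order, so the STAR in the outer query expands in table order.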

Tests:
 - Add tests for selecting STAR with normal columns on table and view.

Backport Note for 4.1.2:
Keep the import of Optional in Analyzer.java.
Removed some tests because the virtual column input__file__name is not supported.

Change-Id: Ic83d78312b19fa2c5ab88ac4f359bfabaeaabce6
Reviewed-on: http://gerrit.cloudera.org:8080/19279
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Avoid introducing unused columns in table masking view
> ------------------------------------------------------
>
>                 Key: IMPALA-9661
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9661
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>             Fix For: Impala 4.0.0
>
>
> If a table has column masking policies, we replace its unanalyzed TableRef 
> with an analyzed InlineViewRef (table masking view) in FromClause.analyze(). 
> However, we can't detect which columns are actually used in the original 
> query at this point. In fact, analyze() for SelectList, WhereClause, 
> GroupByClause and other clauses containing SlotRefs happens after 
> FromClause.analyze(). After the whole query block is analyzed, we can get the 
> exact set of required columns. We should do table masking there to avoid 
> introducing unused columns.
> To be specific, if table _tbl_(_id_ int, _name_ string, _address_ string) has 
> column masking policies for column _name_ and _address_ to mask them, the 
> following query
> {code:sql}
> select name from tbl where id > 10;
> {code}
> will be rewritten to
> {code:sql}
> select name from (
>   select id, mask(name) as name, mask(address) as address from tbl
> ) tbl where id > 10;
> {code}
> The rewritten query introduces a requirement for SELECT privilege on the 
> _address_ column which isn't required by the original query. We should either 
> fix this or IMPALA-9223.
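
The behaviour requested in the description above can be sketched as follows.
This is a minimal Python illustration under stated assumptions, not Impala's
implementation: build_pruned_mask_view is a hypothetical helper, the set of
required columns is assumed to be known once the whole query block has been
analyzed, and the resulting view text is inferred from the description rather
than quoted from it.

```python
def build_pruned_mask_view(table, table_columns, masked_columns, required_columns):
    """Build a table masking view that exposes only the required columns.

    table_columns: column names in the table's declared order.
    masked_columns: columns that have a column masking policy.
    required_columns: columns actually referenced by the original query.
    """
    select_items = []
    for col in table_columns:
        if col not in required_columns:
            # Unused columns are left out, so no privilege is needed on them.
            continue
        if col in masked_columns:
            select_items.append(f"mask({col}) as {col}")
        else:
            select_items.append(col)
    return f"select {', '.join(select_items)} from {table}"


# "select name from tbl where id > 10" references only 'name' and 'id'.
pruned_sql = build_pruned_mask_view(
    "tbl",
    ["id", "name", "address"],
    masked_columns={"name", "address"},
    required_columns={"name", "id"},
)
print(pruned_sql)
```

Because address never appears in the generated view, the rewritten query no
longer needs SELECT privilege on that column.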


