[ https://issues.apache.org/jira/browse/IMPALA-11744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643650#comment-17643650 ]
ASF subversion and git services commented on IMPALA-11744:
----------------------------------------------------------
Commit 367378438f8a780cb44cf904e3d449165fdc190e in impala's branch
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=367378438 ]
IMPALA-11744: Table mask view should preserve the original column order in Hive

Ranger provides column masking and row filtering policies to mask
sensitive data for specific users/groups. When a table should be masked
in a query, Impala replaces it with a table mask view that exposes the
columns with masked expressions.

After IMPALA-9661, only selected columns are exposed in the table mask
view. However, the columns of the view are exposed in the order that
they are registered. If the registering order differs from the column
order in the table, STAR expansions will mismatch the columns.

To be specific, let's say table 'tbl' with 3 columns a, b, c should be
masked in the following query:
  select b, * from tbl;
Ideally Impala should replace the TableRef of 'tbl' with a table mask
view as:
  select b, * from (
    select mask(a) a, mask(b) b, mask(c) c from tbl
  ) t;
Currently, the rewritten query is
  select b, * from (
    select mask(b) b, mask(a) a, mask(c) c from tbl
  ) t;
This incorrectly expands the STAR as "b, a, c" in the re-analyze phase.
The cause is that column 'b' is registered earlier than all other
columns. This patch fixes it by sorting the selected columns based on
their original order in the table.
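
The ordering fix described above can be illustrated with a minimal Python
sketch. This is not Impala's frontend code (which is written in Java); the
function name `order_mask_view_columns` and the list-based model of column
registration are hypothetical, chosen only to show why sorting by the
table's column positions makes the STAR expansion line up again.

```python
def order_mask_view_columns(table_columns, registered_columns):
    """Sort the registered (selected) columns by their position in the table.

    table_columns: column names in the table's declared order.
    registered_columns: column names in the order the analyzer saw them.
    Both arguments are a simplified, hypothetical model of the real state.
    """
    position = {name: index for index, name in enumerate(table_columns)}
    return sorted(registered_columns, key=lambda name: position[name])


# Model of "select b, * from tbl" on a table with columns a, b, c:
# column b is referenced explicitly first, then STAR registers a and c.
table_columns = ["a", "b", "c"]
registered = ["b", "a", "c"]

# Without sorting, the view exposes b, a, c and STAR expands in that order.
view_before_fix = list(registered)

# With sorting, the view exposes a, b, c, matching the table.
view_after_fix = order_mask_view_columns(table_columns, registered)

print(view_before_fix)
print(view_after_fix)
```

Because only selected columns are exposed, the sketch sorts a subset of the
table's columns rather than emitting every column of the table.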

Tests:
 - Add tests for selecting STAR with normal columns on table and view.

Change-Id: Ic83d78312b19fa2c5ab88ac4f359bfabaeaabce6
Reviewed-on: http://gerrit.cloudera.org:8080/19279
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Table mask view should preserve the original column order in Hive
> -----------------------------------------------------------------
>
> Key: IMPALA-11744
> URL: https://issues.apache.org/jira/browse/IMPALA-11744
> Project: IMPALA
> Issue Type: Bug
> Components: Security
> Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.1.1
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Blocker
>
> Ranger provides column masking and row filtering policies to mask sensitive
> data for specified users/groups. When a table should be masked in a query,
> Impala replaces it with a table mask view that exposes the columns with masked
> expressions.
> After IMPALA-9661, only selected columns are exposed in the table mask view.
> However, the columns are exposed in the order that they are registered, which
> can produce wrong results if the original statement contains STAR expressions.
> The following example shows the issue:
> {code:sql}
> create table mask_test_tbl (a string, b string, c string, d string);
> insert into mask_test_tbl values ("aaaa", "bbbb", "cccc", "dddd");
> -- Create a column masking policy on column c using Redact
> select * from mask_test_tbl;
> +------+------+------+------+
> | a    | b    | c    | d    |
> +------+------+------+------+
> | aaaa | bbbb | xxxx | dddd |
> +------+------+------+------+
> {code}
> The following query produces incorrect results:
> {code:sql}
> select b, * from mask_test_tbl;
> +------+------+------+------+------+
> | b    | a    | b    | c    | d    |
> +------+------+------+------+------+
> | bbbb | bbbb | aaaa | xxxx | dddd |
> +------+------+------+------+------+
> {code}
> Note that the values of the 2nd and 3rd columns are swapped.