[
https://issues.apache.org/jira/browse/HIVE-21419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789910#comment-16789910
]
Mert Hocanin commented on HIVE-21419:
-------------------------------------
I tried with two masking functions within Ranger, the first is regex_replace(),
and the second was just hardcoded string "XXXXXX". I was able to get the
behaviour I was looking for by creating a View with the table with masking, and
then using the view in the query. Ie:
CREATE VIEW analyst1.test1_view as
select l_commitdate, l_receiptdate, l_shipdate
from analyst1.lineitem_partitioned
where l_shipdate = '1992-01-02'
group by l_commitdate, l_receiptdate, l_shipdate;
insert overwrite table analyst1.test1 PARTITION (l_shipdate)
Select * from analyst1.test1_view;
Using "With" did not help.
> Partition Pruning not happening when using Apache Ranger masking
> ----------------------------------------------------------------
>
> Key: HIVE-21419
> URL: https://issues.apache.org/jira/browse/HIVE-21419
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer, Query Planning
> Affects Versions: 2.3.2
> Environment: I used an AWS Cloudformation script from AWS's big data
> blog[1]. The EMR AMI uses Hive 2.3.3 and Apache Ranger 1.0.0.
> Source Table:
> CREATE EXTERNAL TABLE analyst1.lineitem_partitioned (
> `l_orderkey` int,
> `l_partkey` int,
> `l_suppkey` int,
> `l_linenumber` int,
> `l_quantity` double,
> `l_extendedprice` double,
> `l_discount` double,
> `l_tax` double,
> `l_returnflag` string,
> `l_linestatus` string,
> `l_commitdate` string,
> `l_receiptdate` string,
> `l_shipinstruct` string,
> `l_shipmode` string,
> `l_comment` string
> ) PARTITIONED BY (`l_shipdate` string)
> STORED AS PARQUET
> LOCATION '/user/analyst1/tpch/sf100/lineitem';
> Destination Table:
> CREATE EXTERNAL TABLE analyst1.test1(
> l_commitdate string,
> l_receiptdate string
> ) PARTITIONED BY (`l_shipdate` string)
> STORED AS PARQUET
> LOCATION '/user/analyst1/tpch/sf100/lineitem_parq_partitioned';
> Query:
> insert overwrite table analyst1.test1 PARTITION (l_shipdate)
> select l_commitdate, l_receiptdate, l_shipdate
> from default.lineitem_parq_partitioned
> where l_shipdate = '1992-01-02';
> Ranger Masking Rule:
> Hive Database: analyst1
> Hive Table: lineitem_partitioned
> Mask Condition Option: Custom: "XXXXXX" (replace the column with a static
> string for simplicity, but our use case uses a complex UDF).
> [1]
> https://aws.amazon.com/blogs/big-data/implementing-authorization-and-auditing-using-apache-ranger-on-amazon-emr/
>
> Reporter: Mert Hocanin
> Priority: Major
> Attachments: Operators-in-debugger-with-masking.png,
> Operators-in-debugger-without-masking.png, hive-jira-schema-explain-plan.txt
>
>
> I have a partitioned table, which I have a Ranger masking policy on a
> non-partition column. When I am attempting to query the table that includes
> the column that has masking enabled, then partition pruning no longer occurs.
> To reproduce:
> Create two partitioned tables. I used TPC-H tables as they are publicly
> available and will provide the schemas and queries I used. Insert into the
> second table from the first table. For example:
> insert overwrite table analyst1.test1 PARTITION (l_shipdate)
> select l_commitdate, l_receiptdate, l_shipdate
> from analyst1.lineitem_partitioned
> where l_shipdate = '1992-01-02';
> I have attached the explain plan when a masking rule on l_commitdate is
> enabled and when not enabled.
> I have done a bit of deep dive and see that the pruning expression is not
> being set when the masking rule is enabled.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)