[
https://issues.apache.org/jira/browse/HIVE-21419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831024#comment-16831024
]
Mert Hocanin edited comment on HIVE-21419 at 5/1/19 2:33 PM:
-------------------------------------------------------------
This issue is fixed by issue HIVE-17639. When the Planner context is not reset
when masking occurs, the planners contexts type goes to UNEXPECTED, which
disallows the Calphite optimizer to optimize the query plan, which includes
partition pruning. Resolving this issue. Thanks for everyones help.
was (Author: mert_hoc):
This issue is fixed by the attached issue.
> Partition Pruning not happening when using Apache Ranger masking
> ----------------------------------------------------------------
>
> Key: HIVE-21419
> URL: https://issues.apache.org/jira/browse/HIVE-21419
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer, Query Planning
> Affects Versions: 2.3.2
> Environment: I used an AWS Cloudformation script from AWS's big data
> blog[1]. The EMR AMI uses Hive 2.3.3 and Apache Ranger 1.0.0.
> Source Table:
> CREATE EXTERNAL TABLE analyst1.lineitem_partitioned (
> `l_orderkey` int,
> `l_partkey` int,
> `l_suppkey` int,
> `l_linenumber` int,
> `l_quantity` double,
> `l_extendedprice` double,
> `l_discount` double,
> `l_tax` double,
> `l_returnflag` string,
> `l_linestatus` string,
> `l_commitdate` string,
> `l_receiptdate` string,
> `l_shipinstruct` string,
> `l_shipmode` string,
> `l_comment` string
> ) PARTITIONED BY (`l_shipdate` string)
> STORED AS PARQUET
> LOCATION '/user/analyst1/tpch/sf100/lineitem';
> Destination Table:
> CREATE EXTERNAL TABLE analyst1.test1(
> l_commitdate string,
> l_receiptdate string
> ) PARTITIONED BY (`l_shipdate` string)
> STORED AS PARQUET
> LOCATION '/user/analyst1/tpch/sf100/lineitem_parq_partitioned';
> Query:
> insert overwrite table analyst1.test1 PARTITION (l_shipdate)
> select l_commitdate, l_receiptdate, l_shipdate
> from default.lineitem_parq_partitioned
> where l_shipdate = '1992-01-02';
> Ranger Masking Rule:
> Hive Database: analyst1
> Hive Table: lineitem_partitioned
> Mask Condition Option: Custom: "XXXXXX" (replace the column with a static
> string for simplicity, but our use case uses a complex UDF).
> [1]
> https://aws.amazon.com/blogs/big-data/implementing-authorization-and-auditing-using-apache-ranger-on-amazon-emr/
>
> Reporter: Mert Hocanin
> Priority: Major
> Attachments: Operators-in-debugger-with-masking.png,
> Operators-in-debugger-without-masking.png, hive-jira-schema-explain-plan.txt
>
>
> I have a partitioned table, which I have a Ranger masking policy on a
> non-partition column. When I am attempting to query the table that includes
> the column that has masking enabled, then partition pruning no longer occurs.
> To reproduce:
> Create two partitioned tables. I used TPC-H tables as they are publicly
> available and will provide the schemas and queries I used. Insert into the
> second table from the first table. For example:
> insert overwrite table analyst1.test1 PARTITION (l_shipdate)
> select l_commitdate, l_receiptdate, l_shipdate
> from analyst1.lineitem_partitioned
> where l_shipdate = '1992-01-02';
> I have attached the explain plan when a masking rule on l_commitdate is
> enabled and when not enabled.
> I have done a bit of deep dive and see that the pruning expression is not
> being set when the masking rule is enabled.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)