[ https://issues.apache.org/jira/browse/HIVE-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879999#comment-15879999 ]
Jason Dere commented on HIVE-16022:
-----------------------------------
I noticed a couple of problems when running the semijoin optimization on a
MERGE statement:
- DynamicPartitionPruningOptimization.generateSemiJoinOperator(): parentOfRS
does not have to be a SelectOperator; in this case it is a TableScan (TS). As
a result, we skip some important checks on whether this table is appropriate
for the semijoin optimization.
- grandParent.getChildren().add(bloomFilterNode): this wrongly assumes that
grandParent is an AND. In this case there was no previous filterExpr, so
grandParent is the BETWEEN. Adding the child here incorrectly adds a new
parameter to BETWEEN, which is probably getting ignored. This is why
in_bloom_filter() is not in the EXPLAIN.
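
To illustrate the second problem, here is a minimal standalone sketch. It does not use Hive's classes: Expr, attachBlindly() and attachChecked() are hypothetical names invented for this example, and the BETWEEN argument layout is simplified. It only shows why appending to the top node is safe when that node is an AND, and why a BETWEEN needs to be wrapped in a new AND instead. This is one possible approach, not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BloomFilterAttachSketch {
    // Hypothetical stand-in for an expression tree node (not Hive's ExprNodeDesc).
    static final class Expr {
        final String name;
        final List<Expr> children = new ArrayList<>();

        Expr(String name, Expr... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }

        @Override
        public String toString() {
            if (children.isEmpty()) {
                return name;
            }
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(children.get(i));
            }
            return sb.append(')').toString();
        }
    }

    // Mirrors the problematic pattern: append to the top node without
    // checking what kind of node it is.
    static Expr attachBlindly(Expr top, Expr bloomFilterNode) {
        top.children.add(bloomFilterNode);
        return top;
    }

    // Assumed alternative: append only if the top node is already an AND,
    // otherwise wrap the existing predicate and the bloom filter in a new AND.
    static Expr attachChecked(Expr top, Expr bloomFilterNode) {
        if ("and".equals(top.name)) {
            top.children.add(bloomFilterNode);
            return top;
        }
        return new Expr("and", top, bloomFilterNode);
    }

    // Simplified min/max predicate: between(column, min, max).
    static Expr between() {
        return new Expr("between", new Expr("a"), new Expr("min"), new Expr("max"));
    }

    static Expr bloom() {
        return new Expr("in_bloom_filter", new Expr("a"), new Expr("bloom"));
    }

    public static void main(String[] args) {
        // The bloom filter ends up as an extra BETWEEN argument.
        System.out.println(attachBlindly(between(), bloom()));
        // The bloom filter ends up as a sibling of BETWEEN under AND.
        System.out.println(attachChecked(between(), bloom()));
    }
}
```

With the unchecked version the bloom filter node becomes a fourth child of the between node, which matches the symptom of in_bloom_filter() disappearing from the plan. With the checked version it becomes a sibling of the between node under an and node.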
> BloomFilter check not showing up in MERGE statement queries
> -----------------------------------------------------------
>
> Key: HIVE-16022
> URL: https://issues.apache.org/jira/browse/HIVE-16022
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Reporter: Jason Dere
> Assignee: Jason Dere
> Attachments: HIVE-16022.1.patch
>
>
> Running explain on a MERGE statement with runtime filtering enabled, I see
> the min/max being applied on the large table, but not the bloom filter check:
> {noformat}
> explain merge into acidTbl as t using nonAcidOrcTbl s ON t.a = s.a
> WHEN MATCHED AND s.a > 8 THEN DELETE
> WHEN MATCHED THEN UPDATE SET b = 7
> WHEN NOT MATCHED THEN INSERT VALUES(s.a, s.b)
> ...
> Map 1
> Map Operator Tree:
> TableScan
> alias: t
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Filter Operator
> predicate: a BETWEEN DynamicValue(RS_3_s_a_min) AND DynamicValue(RS_3_s_a_max) (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> {noformat}