Seonggon Namgung created HIVE-29121:
---------------------------------------
Summary: Restore HiveSubQueryRemoveRule to use InnerJoin instead of SemiJoin for uncorrelated IN/EXISTS subqueries with RelOptUtil.Logic.TRUE.
Key: HIVE-29121
URL: https://issues.apache.org/jira/browse/HIVE-29121
Project: Hive
Issue Type: Improvement
Environment: [^plan.example.txt]
Reporter: Seonggon Namgung
Assignee: Seonggon Namgung
Attachments: plan.example.txt
This JIRA is an addendum patch to HIVE-24685 and aims to restore the compiler
logic from HIVE-17767.
During the replacement of HiveSubQRemoveRelBuilder with Calcite's RelBuilder
in HIVE-24685, Hive was changed to always use SemiJoin when handling
uncorrelated IN/EXISTS subqueries with logic == RelOptUtil.Logic.TRUE. Since
SemiJoin is intended for correlated IN/EXISTS subqueries, where it is combined
with Aggregate (AGGR) removal (cf. HIVE-17767), we should avoid using SemiJoin
for the uncorrelated case. That case does not benefit from AGGR removal, and
using SemiJoin prevents the application of rules that cannot handle
HiveSemiJoin (e.g., join reordering).
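The semantic point behind the paragraph above can be illustrated with a toy in-memory model. Under RelOptUtil.Logic.TRUE only rows for which the IN predicate evaluates to TRUE are kept, so an uncorrelated `x IN (SELECT y ...)` can be rewritten either as an Aggregate (GROUP BY y) followed by an inner join, or as a semi-join. This is a minimal sketch, not Hive's implementation: HiveSubQueryRemoveRule rewrites Calcite RelNode trees, whereas here relations are assumed to be plain lists of dicts, and the function names are hypothetical. It shows why the Aggregate is required for the inner-join form but is redundant work underneath a SemiJoin.

```python
from typing import Dict, List, Optional

# A toy "relation": a list of rows, each row a column-name -> value mapping.
Row = Dict[str, Optional[int]]


def naive_inner_join(outer: List[Row], inner: List[Row], ocol: str, icol: str) -> List[Row]:
    """Inner join WITHOUT deduplicating the subquery side: duplicates leak through."""
    return [o for o in outer for i in inner
            if o[ocol] is not None and o[ocol] == i[icol]]


def aggregate_then_inner_join(outer: List[Row], inner: List[Row], ocol: str, icol: str) -> List[Row]:
    """IN-subquery with Logic.TRUE as Aggregate(GROUP BY icol) + inner join.

    Grouping the subquery side makes its keys unique, so each outer row
    matches at most once and the join cannot multiply outer rows.
    NULL keys are dropped because SQL equality never matches NULL.
    """
    keys = dict.fromkeys(i[icol] for i in inner if i[icol] is not None)  # order-preserving DISTINCT
    return [o for o in outer for k in keys if o[ocol] == k]


def semi_join(outer: List[Row], inner: List[Row], ocol: str, icol: str) -> List[Row]:
    """Semi-join: keep each outer row once if any inner row matches.

    Duplicate inner keys are harmless here, so an Aggregate placed below
    the semi-join only adds work without changing the result.
    """
    keys = {i[icol] for i in inner if i[icol] is not None}
    return [o for o in outer if o[ocol] is not None and o[ocol] in keys]


if __name__ == "__main__":
    # Models: SELECT * FROM outer_t WHERE x IN (SELECT y FROM inner_t)
    outer_t = [{"id": 1, "x": 10}, {"id": 2, "x": 20}, {"id": 3, "x": 30}, {"id": 4, "x": None}]
    inner_t = [{"y": 10}, {"y": 10}, {"y": 30}, {"y": None}]

    print([r["id"] for r in naive_inner_join(outer_t, inner_t, "x", "y")])           # [1, 1, 3]
    print([r["id"] for r in aggregate_then_inner_join(outer_t, inner_t, "x", "y")])  # [1, 3]
    print([r["id"] for r in semi_join(outer_t, inner_t, "x", "y")])                  # [1, 3]
```

Both rewrites return the same rows; which one is preferable in Hive depends on the downstream rules (e.g., join reordering), as the issue describes, not on correctness.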
For clarity, query plans for the following combinations are attached:
{Before HIVE-17767, After HIVE-17767, After HIVE-24685} x {Correlated,
Uncorrelated} x {Before subquery removal, After subquery removal, After
decorrelation}.
From the attached plans, we can observe that HIVE-24685 introduces a SemiJoin
without removing the HiveAggregate, unlike HIVE-17767.
We discovered this issue while investigating a performance regression in TPC-DS
Query 23.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)