[ https://issues.apache.org/jira/browse/HIVE-29121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Seonggon Namgung updated HIVE-29121:
------------------------------------
    Description: 
This JIRA is an addendum patch to HIVE-24685 and aims to restore the compiler 
logic from HIVE-17767.

When HiveSubQRemoveRelBuilder was replaced with Calcite's RelBuilder in
HIVE-24685, Hive was changed to always use SemiJoin when handling uncorrelated
IN/EXISTS subqueries with logic == RelOptUtil.Logic.TRUE. SemiJoin is intended
for correlated IN/EXISTS subqueries, where it is used together with AGGR removal
(cf. HIVE-17767). We should therefore avoid SemiJoin in the uncorrelated case:
there it gains nothing from AGGR removal, and it blocks rules that cannot
handle HiveSemiJoin (e.g., join reordering).
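A minimal sketch in Java of why the InnerJoin and SemiJoin plans return the
same rows for an uncorrelated IN subquery under Logic.TRUE. It assumes
single-column integer rows held in plain lists and ignores NULLs (only rows
whose IN predicate is TRUE are kept). It does not use Hive's or Calcite's
operators, and the class and method names are hypothetical. A semi join against
the subquery result matches an inner join against the deduplicated subquery
result, which is the role the aggregate plays in the InnerJoin form. It also
shows that deduplicating the right side does not change the result of a semi
join.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not Hive or Calcite code.
public class InSubqueryRewriteSketch {

    // Semi join: keep each left row once if its key appears on the right.
    static List<Integer> semiJoin(List<Integer> left, List<Integer> right) {
        Set<Integer> keys = new LinkedHashSet<>(right);
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            if (keys.contains(l)) {
                out.add(l);
            }
        }
        return out;
    }

    // Equi inner join projected to the left column:
    // emits one row per matching (left, right) pair.
    static List<Integer> innerJoin(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        for (Integer l : left) {
            for (Integer r : right) {
                if (l.equals(r)) {
                    out.add(l);
                }
            }
        }
        return out;
    }

    // Stand-in for the aggregate (GROUP BY key) on the subquery side.
    static List<Integer> distinct(List<Integer> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        List<Integer> outer = List.of(1, 2, 3, 3);     // outer query rows
        List<Integer> subquery = List.of(2, 3, 3, 4);  // uncorrelated subquery rows

        System.out.println(semiJoin(outer, subquery));            // [2, 3, 3]
        System.out.println(innerJoin(outer, distinct(subquery))); // [2, 3, 3]
        System.out.println(innerJoin(outer, subquery));           // [2, 3, 3, 3, 3]
        System.out.println(semiJoin(outer, distinct(subquery)));  // [2, 3, 3]
    }
}
```

The third line shows why the InnerJoin form needs the aggregate: without
deduplication, duplicate subquery keys multiply the outer rows.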

For clarity, query plans for the following combinations are attached
(plan.example.txt):

{Before HIVE-17767, After HIVE-17767, After HIVE-24685} X {Correlated,
Uncorrelated} X {Before subquery removal, After subquery removal, After
decorrelation}.

From the attached plans, we can observe that HIVE-24685 introduces a SemiJoin
without removing HiveAggregate, unlike HIVE-17767.

We discovered this issue while investigating a performance regression in TPC-DS 
Query 23.


> Restore HiveSubQueryRemoveRule to use InnerJoin instead of SemiJoin for 
> uncorrelated IN/EXISTS subqueries with RelOptUtil.Logic.TRUE.
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-29121
>                 URL: https://issues.apache.org/jira/browse/HIVE-29121
>             Project: Hive
>          Issue Type: Improvement
>         Environment: [^plan.example.txt]
>            Reporter: Seonggon Namgung
>            Assignee: Seonggon Namgung
>            Priority: Major
>         Attachments: plan.example.txt
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)