[
https://issues.apache.org/jira/browse/HIVE-24957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311521#comment-17311521
]
Stamatis Zampetakis edited comment on HIVE-24957 at 3/30/21, 1:21 PM:
----------------------------------------------------------------------
The problem lies in the query plan and more specifically in the
{{HiveRelDecorrelator}}.
{noformat}
2021-03-30T06:16:57,682 DEBUG [d8fca83a-1e2a-4864-8730-f496318a0e47 main]
rules.RelFieldTrimmer: Plan after trimming unused fields
HiveProject(b_title=[$0])
HiveFilter(condition=[EXISTS({
HiveProject(a_authorkey=[$0])
HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey),
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveProject(b_title=[$1], b_authorkey=[$2])
HiveTableScan(table=[[default, book]], table:alias=[b])
2021-03-30T06:16:57,682 DEBUG [d8fca83a-1e2a-4864-8730-f496318a0e47 main]
parse.CalcitePlanner: Plan before removing subquery:
HiveProject(b_title=[$1])
HiveFilter(condition=[EXISTS({
HiveProject(a_authorkey=[$0])
HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey),
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveTableScan(table=[[default, book]], table:alias=[b])
2021-03-30T06:16:57,690 DEBUG [d8fca83a-1e2a-4864-8730-f496318a0e47 main]
parse.CalcitePlanner: Plan just after removing subquery:
HiveProject(b_title=[$1])
LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{2}])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true])
HiveProject(a_authorkey=[$0])
HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey),
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
2021-03-30T06:16:57,796 DEBUG [d8fca83a-1e2a-4864-8730-f496318a0e47 main]
parse.CalcitePlanner: Plan after decorrelation:
HiveProject(b_title=[$1])
HiveSemiJoin(condition=[=($8, $2)], joinType=[semi])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true], b_authorkey=[$1])
HiveProject(a_authorkey=[$0], b_authorkey=[$6])
HiveJoin(condition=[=(CASE(IS NOT NULL($6), $6, 300), $0)],
joinType=[inner], algorithm=[none], cost=[not available])
HiveTableScan(table=[[default, author]], table:alias=[a])
HiveAggregate(group=[{0}])
HiveProject(b_authorkey=[$2])
HiveTableScan(table=[[default, book]], table:alias=[b])
{noformat}
The problem starts with the introduction of the {{HiveSemiJoin}}. Due to that
books with NULL {{b_authorkey}} are removed from the result set.
was (Author: zabetak):
The problem lies in the query plan and more specifically in the
{{HiveRelDecorrelator}}.
{noformat}
2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main]
rules.RelFieldTrimmer: Plan after trimming unused fields
HiveProject(b_title=[$0])
HiveFilter(condition=[EXISTS({
HiveProject(_o__c0=[1])
HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey),
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveProject(b_title=[$1], b_authorkey=[$2])
HiveTableScan(table=[[default, book]], table:alias=[b])
2021-03-30T06:07:50,279 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main]
parse.CalcitePlanner: Plan before removing subquery:
HiveProject(b_title=[$1])
HiveFilter(condition=[EXISTS({
HiveProject(_o__c0=[1])
HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey),
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
})])
HiveTableScan(table=[[default, book]], table:alias=[b])
2021-03-30T06:07:50,280 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main]
parse.CalcitePlanner: Plan just after removing subquery:
HiveProject(b_title=[$1])
LogicalCorrelate(correlation=[$cor0], joinType=[semi], requiredColumns=[{2}])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true])
HiveProject(_o__c0=[1])
HiveFilter(condition=[=(CASE(IS NOT NULL($cor0.b_authorkey),
$cor0.b_authorkey, 300), $0)])
HiveTableScan(table=[[default, author]], table:alias=[a])
2021-03-30T06:07:50,282 DEBUG [348e355c-ca0e-4fc6-b386-1852a35a7f29 main]
parse.CalcitePlanner: Plan after decorrelation:
HiveProject(b_title=[$1])
HiveSemiJoin(condition=[=($8, $2)], joinType=[semi])
HiveTableScan(table=[[default, book]], table:alias=[b])
HiveProject(literalTrue=[true], b_authorkey=[$1])
HiveProject(_o__c0=[1], b_authorkey=[$6])
HiveJoin(condition=[=(CASE(IS NOT NULL($6), $6, 300), $0)],
joinType=[inner], algorithm=[none], cost=[not available])
HiveTableScan(table=[[default, author]], table:alias=[a])
HiveAggregate(group=[{0}])
HiveProject(b_authorkey=[$2])
HiveTableScan(table=[[default, book]], table:alias=[b])
{noformat}
The problem starts with the introduction of the {{HiveSemiJoin}}. Due to that
books with NULL {{b_authorkey}} are removed from the result set.
> Wrong results when subquery has COALESCE in correlation predicate
> -----------------------------------------------------------------
>
> Key: HIVE-24957
> URL: https://issues.apache.org/jira/browse/HIVE-24957
> Project: Hive
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> Consider the following example:
> {code:sql}
> create table author (
> a_authorkey int,
> a_name varchar(50));
> create table book (
> b_bookkey int,
> b_title varchar(50),
> b_authorkey int);
> insert into author values (10, 'Victor Hugo');
> insert into author values (20, 'Alexandre Dumas');
> insert into author values (300, 'UNKNOWN');
> insert into book values (1, 'Les Miserables', 10);
> insert into book values (2, 'The Count of Monte Cristo', 20);
> insert into book values (3, 'Men Without Women', 30);
> insert into book values (4, 'Odyssey', null);
> select b.b_title
> from book b
> where exists
> (select a_authorkey
> from author a
> where coalesce(b.b_authorkey, 300) = a.a_authorkey);
> {code}
> *Expected results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> |Odyssey|
> *Actual results*
> ||B_TITLE||
> |Les Miserables|
> |The Count of Monte Cristo|
> {{Odyssey}} is missing from the result set and it shouldn't since with the
> application of COALESCE operator it should match with the UNKNOWN author.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)