[
https://issues.apache.org/jira/browse/HIVE-15758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088071#comment-16088071
]
Vineet Garg commented on HIVE-15758:
------------------------------------
[~pxiong] For the query mentioned above in this JIRA hive generates following
plan
{noformat}
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 2
File Output Operator [FS_24]
Select Operator [SEL_23] (rows=185 width=244)
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
Filter Operator [FIL_22] (rows=185 width=244)
predicate:(UDFToLong(_col5) <> CASE WHEN (_col10 is null) THEN (0)
ELSE (_col9) END)
Merge Join Operator [MERGEJOIN_29] (rows=185 width=244)
Conds:RS_19._col4=RS_20._col2(Left
Outer),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10"]
<-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_19]
PartitionCols:_col4
Select Operator [SEL_1] (rows=26 width=121)
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
TableScan [TS_0] (rows=26 width=121)
default@part,part,Tbl:COMPLETE,Col:NONE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
<-Reducer 5 [SIMPLE_EDGE]
SHUFFLE [RS_20]
PartitionCols:_col2
Select Operator [SEL_18] (rows=169 width=243)
Output:["_col0","_col1","_col2"]
Group By Operator [GBY_17] (rows=169 width=243)
Output:["_col0","_col1"],aggregations:["count(VALUE._col0)"],keys:KEY._col0
<-Reducer 4 [SIMPLE_EDGE]
SHUFFLE [RS_16]
PartitionCols:_col0
Group By Operator [GBY_15] (rows=338 width=243)
Output:["_col0","_col1"],aggregations:["count(_col1)"],keys:_col2
Select Operator [SEL_14] (rows=338 width=243)
Output:["_col1","_col2"]
Filter Operator [FIL_13] (rows=338 width=243)
predicate:(_col2 <> _col0)
Merge Join Operator [MERGEJOIN_28] (rows=338
width=243)
Conds:(Inner),Output:["_col0","_col1","_col2"]
<-Map 3 [CUSTOM_SIMPLE_EDGE]
PARTITION_ONLY_SHUFFLE [RS_10]
Select Operator [SEL_3] (rows=26 width=121)
Output:["_col0","_col1"]
TableScan [TS_2] (rows=26 width=121)
default@part,pp,Tbl:COMPLETE,Col:NONE,Output:["p_type","p_size"]
<-Reducer 7 [CUSTOM_SIMPLE_EDGE]
PARTITION_ONLY_SHUFFLE [RS_11]
Group By Operator [GBY_8] (rows=13 width=121)
Output:["_col0"],keys:KEY._col0
<-Map 6 [SIMPLE_EDGE]
SHUFFLE [RS_7]
PartitionCols:_col0
Group By Operator [GBY_6] (rows=26
width=121)
Output:["_col0"],keys:p_type
TableScan [TS_4] (rows=26 width=121)
default@part,part,Tbl:COMPLETE,Col:NONE,Output:["p_type"]
{noformat}
First outer table p is joined with inner table pp to come up with all possible
values where correlated condition {{pp.p_type <> p.p_type}} is true. Then group
by is done on this to do count(). This is then further joined (Left outer) with
outer table with a filter on top taking care of case when right side is null
(count is suppose to return 0 instead of null for empty rows) along with
whatever subquery condition there is.
Briefly we re-write such queries into left outer join with a filter on top.
Hope this makes sense.
> Allow correlated scalar subqueries with aggregates which has non-equi join
> predicates
> -------------------------------------------------------------------------------------
>
> Key: HIVE-15758
> URL: https://issues.apache.org/jira/browse/HIVE-15758
> Project: Hive
> Issue Type: Sub-task
> Components: Logical Optimizer
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Labels: sub-query
> Attachments: HIVE-15758.1.patch, HIVE-15758.2.patch
>
>
> Queries such as
> {code} select * from part where p_size <> (select count(p_size) from part pp
> where part.p_type <> pp.p_type); {code} are currently not allowed since HIVE
> doesn't know how to rewrite such queries to preserve the correctness for
> cases when there is zero row
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)