kasakrisz commented on code in PR #4864:
URL: https://github.com/apache/hive/pull/4864#discussion_r1400574210
##########
ql/src/test/queries/clientpositive/auto_sortmerge_join_17.q:
##########
@@ -0,0 +1,20 @@
+CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key)
INTO 2 BUCKETS;
+
+insert into tbl1_n5(key, value)
+values
+(0, 'val_0'),
+(2, 'val_2'),
+(9, 'val_9');
+
+explain
+SELECT t1.key from
+(SELECT key , row_number() over(partition by key order by value desc) as rk
from tbl1_n5) t1
+join
+( SELECT key,count(distinct value) as cp_count from tbl1_n5 group by key) t2
+on t1.key = t2.key where rk = 1;
+
+SELECT t1.key from
+(SELECT key , row_number() over(partition by key order by value desc) as rk
from tbl1_n5) t1
Review Comment:
In that case SMB conversion can not be applyed because parent RS operators
doesn't have sort keys:
```
POSTHOOK: query: explain
SELECT t1.rk from
(SELECT key rk FROM tbl1_n5 GROUP BY key) t1
join
( SELECT key,count(distinct value) as cp_count from tbl1_n5 group by key) t2
on t1.rk = t2.key where rk = '1'
POSTHOOK: type: QUERY
POSTHOOK: Input: default@tbl1_n5
#### A masked pattern was here ####
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (XPROD_EDGE), Reducer 5 (XPROD_EDGE)
Reducer 5 <- Map 1 (SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: tbl1_n5
filterExpr: (key = 1) (type: boolean)
Statistics: Num rows: 3 Data size: 279 Basic stats:
COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (key = 1) (type: boolean)
Statistics: Num rows: 1 Data size: 93 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
expressions: value (type: string)
outputColumnNames: value
Statistics: Num rows: 1 Data size: 93 Basic stats:
COMPLETE Column stats: COMPLETE
Group By Operator
keys: value (type: string)
minReductionHashAggr: 0.4
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 89 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 1 Data size: 89 Basic stats:
COMPLETE Column stats: COMPLETE
Select Operator
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Group By Operator
keys: true (type: boolean)
minReductionHashAggr: 0.4
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: boolean)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: boolean)
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Execution mode: vectorized, llap
LLAP IO: all inputs
Reducer 2
Execution mode: vectorized, llap
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 89 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
Statistics: Num rows: 1 Data size: 89 Basic stats:
COMPLETE Column stats: COMPLETE
Group By Operator
keys: true (type: boolean)
minReductionHashAggr: 0.4
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: boolean)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: boolean)
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
Reducer 3
Execution mode: vectorized, llap
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: boolean)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
null sort order:
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
Reducer 4
Execution mode: llap
Reduce Operator Tree:
Merge Join Operator
condition map:
Inner Join 0 to 1
keys:
0
1
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
expressions: 1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 4 Basic stats:
COMPLETE Column stats: COMPLETE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 5
Execution mode: vectorized, llap
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: boolean)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE
Column stats: COMPLETE
Select Operator
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
Column stats: COMPLETE
Reduce Output Operator
null sort order:
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats:
COMPLETE Column stats: COMPLETE
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
```
See Reducer 3 and 5 operator tree
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]