Re: [PR] HIVE-27788: Exception when join has 2 Group By operators in the same branch in the same reducer [hive]

via GitHub Tue, 21 Nov 2023 05:08:09 -0800


kasakrisz commented on code in PR #4864:
URL: https://github.com/apache/hive/pull/4864#discussion_r1400574210



##########
ql/src/test/queries/clientpositive/auto_sortmerge_join_17.q:
##########
@@ -0,0 +1,20 @@
+CREATE TABLE tbl1_n5(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
+
+insert into tbl1_n5(key, value)
+values
+(0, 'val_0'),
+(2, 'val_2'),
+(9, 'val_9');
+
+explain
+SELECT t1.key from
+(SELECT  key , row_number() over(partition by key order by value desc) as rk from tbl1_n5) t1
+join
+( SELECT key,count(distinct value) as cp_count from tbl1_n5 group by key) t2
+on t1.key = t2.key where rk = 1;
+
+SELECT t1.key from
+(SELECT  key , row_number() over(partition by key order by value desc) as rk from tbl1_n5) t1

Review Comment:
   In that case SMB conversion cannot be applied because the parent RS operators 
don't have sort keys:
   ```
   POSTHOOK: query: explain
   SELECT t1.rk from
   (SELECT key rk FROM tbl1_n5 GROUP BY key) t1
   join
   ( SELECT key,count(distinct value) as cp_count from tbl1_n5 group by key) t2
   on t1.rk = t2.key where rk = '1'
   POSTHOOK: type: QUERY
   POSTHOOK: Input: default@tbl1_n5
   #### A masked pattern was here ####
   STAGE DEPENDENCIES:
     Stage-1 is a root stage
     Stage-0 depends on stages: Stage-1
   
   STAGE PLANS:
     Stage: Stage-1
       Tez
   #### A masked pattern was here ####
         Edges:
           Reducer 2 <- Map 1 (SIMPLE_EDGE)
           Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
           Reducer 4 <- Reducer 3 (XPROD_EDGE), Reducer 5 (XPROD_EDGE)
           Reducer 5 <- Map 1 (SIMPLE_EDGE)
   #### A masked pattern was here ####
         Vertices:
           Map 1 
               Map Operator Tree:
                   TableScan
                     alias: tbl1_n5
                     filterExpr: (key = 1) (type: boolean)
                     Statistics: Num rows: 3 Data size: 279 Basic stats: COMPLETE Column stats: COMPLETE
                     Filter Operator
                       predicate: (key = 1) (type: boolean)
                       Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: COMPLETE
                       Select Operator
                         expressions: value (type: string)
                         outputColumnNames: value
                         Statistics: Num rows: 1 Data size: 93 Basic stats: COMPLETE Column stats: COMPLETE
                         Group By Operator
                           keys: value (type: string)
                           minReductionHashAggr: 0.4
                           mode: hash
                           outputColumnNames: _col0
                           Statistics: Num rows: 1 Data size: 89 Basic stats: COMPLETE Column stats: COMPLETE
                           Reduce Output Operator
                             key expressions: _col0 (type: string)
                             null sort order: z
                             sort order: +
                             Map-reduce partition columns: _col0 (type: string)
                             Statistics: Num rows: 1 Data size: 89 Basic stats: COMPLETE Column stats: COMPLETE
                       Select Operator
                         Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                         Group By Operator
                           keys: true (type: boolean)
                           minReductionHashAggr: 0.4
                           mode: hash
                           outputColumnNames: _col0
                           Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                           Reduce Output Operator
                             key expressions: _col0 (type: boolean)
                             null sort order: z
                             sort order: +
                             Map-reduce partition columns: _col0 (type: boolean)
                             Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
               Execution mode: vectorized, llap
               LLAP IO: all inputs
           Reducer 2 
               Execution mode: vectorized, llap
               Reduce Operator Tree:
                 Group By Operator
                   keys: KEY._col0 (type: string)
                   mode: mergepartial
                   outputColumnNames: _col0
                   Statistics: Num rows: 1 Data size: 89 Basic stats: COMPLETE Column stats: COMPLETE
                   Select Operator
                     Statistics: Num rows: 1 Data size: 89 Basic stats: COMPLETE Column stats: COMPLETE
                     Group By Operator
                       keys: true (type: boolean)
                       minReductionHashAggr: 0.4
                       mode: hash
                       outputColumnNames: _col0
                       Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                       Reduce Output Operator
                         key expressions: _col0 (type: boolean)
                         null sort order: z
                         sort order: +
                         Map-reduce partition columns: _col0 (type: boolean)
                         Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
           Reducer 3 
               Execution mode: vectorized, llap
               Reduce Operator Tree:
                 Group By Operator
                   keys: KEY._col0 (type: boolean)
                   mode: mergepartial
                   outputColumnNames: _col0
                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                   Select Operator
                     Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                     Reduce Output Operator
                       null sort order: 
                       sort order: 
                       Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
           Reducer 4 
               Execution mode: llap
               Reduce Operator Tree:
                 Merge Join Operator
                   condition map:
                        Inner Join 0 to 1
                   keys:
                     0 
                     1 
                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                   Select Operator
                     expressions: 1 (type: int)
                     outputColumnNames: _col0
                     Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                     File Output Operator
                       compressed: false
                       Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                       table:
                           input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                           output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                           serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
           Reducer 5 
               Execution mode: vectorized, llap
               Reduce Operator Tree:
                 Group By Operator
                   keys: KEY._col0 (type: boolean)
                   mode: mergepartial
                   outputColumnNames: _col0
                   Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
                   Select Operator
                     Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
                     Reduce Output Operator
                       null sort order: 
                       sort order: 
                       Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
   
     Stage: Stage-0
       Fetch Operator
         limit: -1
         Processor Tree:
           ListSink
   ```
   See the operator trees of Reducer 3 and Reducer 5: their Reduce Output Operators have an empty sort order.
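
The reasoning can be illustrated with a small sketch. This is not Hive's optimizer code: `ReduceSink`, `has_sort_keys` and `smb_candidate` are hypothetical names, and the model assumes only that a sort-merge conversion needs every parent Reduce Output Operator of the join to carry sort keys. The field values are taken from the EXPLAIN output in this comment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReduceSink:
    """Hypothetical model of one Reduce Output Operator from an EXPLAIN plan."""
    vertex: str
    key_expressions: List[str] = field(default_factory=list)
    sort_order: str = ""  # e.g. "+" for one ascending key; empty when no keys


def has_sort_keys(rs: ReduceSink) -> bool:
    # A parent qualifies only if it lists key expressions and a sort order.
    return bool(rs.key_expressions) and bool(rs.sort_order)


def smb_candidate(parents: List[ReduceSink]) -> bool:
    # Assumption: every parent of the join must be sorted on its keys.
    return len(parents) >= 2 and all(has_sort_keys(rs) for rs in parents)


# Reducer 3 and Reducer 5 print an empty "sort order:" and no key expressions.
reducer_3 = ReduceSink("Reducer 3")
reducer_5 = ReduceSink("Reducer 5")
# For contrast, a Reduce Output Operator in Map 1 has a key and a sort order.
map_1 = ReduceSink("Map 1", ["_col0"], "+")

print(smb_candidate([reducer_3, reducer_5]))
print(has_sort_keys(map_1))
```

In this model the two parents of the Merge Join in Reducer 4 fail the check, which matches the plan falling back to XPROD_EDGE inputs.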



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

