Hari Sankar Sivarama Subramaniyan created HIVE-12647:
--------------------------------------------------------

             Summary: hive.mapred.mode=strict throws an error even if the final plan does not have cartesian product in it.
                 Key: HIVE-12647
                 URL: https://issues.apache.org/jira/browse/HIVE-12647
             Project: Hive
          Issue Type: Bug
            Reporter: Hari Sankar Sivarama Subramaniyan


{code}
Vertex dependency in root stage
Reducer 10 <- Reducer 9 (SIMPLE_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 11 (SIMPLE_EDGE)
Reducer 3 <- Map 12 (SIMPLE_EDGE), Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Map 13 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
Reducer 6 <- Map 15 (SIMPLE_EDGE), Reducer 5 (SIMPLE_EDGE)
Reducer 7 <- Map 16 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE)
Reducer 8 <- Map 17 (SIMPLE_EDGE), Reducer 7 (SIMPLE_EDGE)
Reducer 9 <- Reducer 8 (SIMPLE_EDGE)

Stage-0
   Fetch Operator
      limit:100
      Stage-1
         Reducer 10
         File Output Operator [FS_63]
            compressed:false
            Statistics:Num rows: 100 Data size: 143600 Basic stats: COMPLETE 
Column stats: NONE
            table:{"input 
format:":"org.apache.hadoop.mapred.TextInputFormat","output 
format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
            Limit [LIM_62]
               Number of rows:100
               Statistics:Num rows: 100 Data size: 143600 Basic stats: COMPLETE 
Column stats: NONE
               Select Operator [SEL_61]
               |  
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14"]
               |  Statistics:Num rows: 127050 Data size: 182479129 Basic stats: 
COMPLETE Column stats: NONE
               |<-Reducer 9 [SIMPLE_EDGE]
                  Reduce Output Operator [RS_60]
                     key expressions:_col0 (type: string), _col1 (type: 
string), _col2 (type: string)
                     sort order:+++
                     Statistics:Num rows: 127050 Data size: 182479129 Basic 
stats: COMPLETE Column stats: NONE
                     value expressions:_col3 (type: bigint), _col4 (type: 
double), _col5 (type: double), _col6 (type: double), _col7 (type: bigint), 
_col8 (type: double), _col9 (type: double), _col10 (type: double), _col11 
(type: bigint), _col12 (type: double), _col13 (type: double)
                     Select Operator [SEL_58]
                        
outputColumnNames:["_col0","_col1","_col10","_col11","_col12","_col13","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
                        Statistics:Num rows: 127050 Data size: 182479129 Basic 
stats: COMPLETE Column stats: NONE
                        Group By Operator [GBY_57]
                        |  
aggregations:["count(VALUE._col0)","avg(VALUE._col1)","stddev_samp(VALUE._col2)","count(VALUE._col3)","avg(VALUE._col4)","stddev_samp(VALUE._col5)","count(VALUE._col6)","avg(VALUE._col7)","stddev_samp(VALUE._col8)"]
                        |  keys:KEY._col0 (type: string), KEY._col1 (type: 
string), KEY._col2 (type: string)
                        |  
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]
                        |  Statistics:Num rows: 127050 Data size: 182479129 
Basic stats: COMPLETE Column stats: NONE
                        |<-Reducer 8 [SIMPLE_EDGE]
                           Reduce Output Operator [RS_56]
                              key expressions:_col0 (type: string), _col1 
(type: string), _col2 (type: string)
                              Map-reduce partition columns:_col0 (type: 
string), _col1 (type: string), _col2 (type: string)
                              sort order:+++
                              Statistics:Num rows: 254100 Data size: 364958258 
Basic stats: COMPLETE Column stats: NONE
                              value expressions:_col3 (type: bigint), _col4 
(type: struct<count:bigint,sum:double,input:int>), _col5 (type: 
struct<count:bigint,sum:double,variance:double>), _col6 (type: bigint), _col7 
(type: struct<count:bigint,sum:double,input:int>), _col8 (type: 
struct<count:bigint,sum:double,variance:double>), _col9 (type: bigint), _col10 
(type: struct<count:bigint,sum:double,input:int>), _col11 (type: 
struct<count:bigint,sum:double,variance:double>)
                              Group By Operator [GBY_55]
                                 
aggregations:["count(_col5)","avg(_col5)","stddev_samp(_col5)","count(_col10)","avg(_col10)","stddev_samp(_col10)","count(_col14)","avg(_col14)","stddev_samp(_col14)"]
                                 keys:_col22 (type: string), _col24 (type: 
string), _col25 (type: string)
                                 
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]
                                 Statistics:Num rows: 254100 Data size: 
364958258 Basic stats: COMPLETE Column stats: NONE
                                 Select Operator [SEL_54]
                                    
outputColumnNames:["_col22","_col24","_col25","_col5","_col10","_col14"]
                                    Statistics:Num rows: 254100 Data size: 
364958258 Basic stats: COMPLETE Column stats: NONE
                                    Merge Join Operator [MERGEJOIN_113]
                                    |  condition map:[{"":"Inner Join 0 to 1"}]
                                    |  keys:{"0":"_col1 (type: int)","1":"_col0 
(type: int)"}
                                    |  
outputColumnNames:["_col5","_col10","_col14","_col22","_col24","_col25"]
                                    |  Statistics:Num rows: 254100 Data size: 
364958258 Basic stats: COMPLETE Column stats: NONE
                                    |<-Map 17 [SIMPLE_EDGE]
                                    |  Reduce Output Operator [RS_52]
                                    |     key expressions:_col0 (type: int)
                                    |     Map-reduce partition columns:_col0 
(type: int)
                                    |     sort order:+
                                    |     Statistics:Num rows: 231000 Data 
size: 331780228 Basic stats: COMPLETE Column stats: NONE
                                    |     value expressions:_col1 (type: 
string), _col2 (type: string)
                                    |     Select Operator [SEL_18]
                                    |        
outputColumnNames:["_col0","_col1","_col2"]
                                    |        Statistics:Num rows: 231000 Data 
size: 331780228 Basic stats: COMPLETE Column stats: NONE
                                    |        Filter Operator [FIL_106]
                                    |           predicate:i_item_sk is not null 
(type: boolean)
                                    |           Statistics:Num rows: 231000 
Data size: 331780228 Basic stats: COMPLETE Column stats: NONE
                                    |           TableScan [TS_17]
                                    |              alias:item
                                    |              Statistics:Num rows: 462000 
Data size: 663560457 Basic stats: COMPLETE Column stats: NONE
                                    |<-Reducer 7 [SIMPLE_EDGE]
                                       Reduce Output Operator [RS_50]
                                          key expressions:_col1 (type: int)
                                          Map-reduce partition columns:_col1 
(type: int)
                                          sort order:+
                                          Statistics:Num rows: 26735 Data size: 
29919145 Basic stats: COMPLETE Column stats: NONE
                                          value expressions:_col5 (type: int), 
_col10 (type: int), _col14 (type: int), _col22 (type: string)
                                          Merge Join Operator [MERGEJOIN_112]
                                          |  condition map:[{"":"Inner Join 0 
to 1"}]
                                          |  keys:{"0":"_col3 (type: 
int)","1":"_col0 (type: int)"}
                                          |  
outputColumnNames:["_col1","_col5","_col10","_col14","_col22"]
                                          |  Statistics:Num rows: 26735 Data 
size: 29919145 Basic stats: COMPLETE Column stats: NONE
                                          |<-Map 16 [SIMPLE_EDGE]
                                          |  Reduce Output Operator [RS_47]
                                          |     key expressions:_col0 (type: 
int)
                                          |     Map-reduce partition 
columns:_col0 (type: int)
                                          |     sort order:+
                                          |     Statistics:Num rows: 852 Data 
size: 1628138 Basic stats: COMPLETE Column stats: NONE
                                          |     value expressions:_col1 (type: 
string)
                                          |     Select Operator [SEL_16]
                                          |        
outputColumnNames:["_col0","_col1"]
                                          |        Statistics:Num rows: 852 
Data size: 1628138 Basic stats: COMPLETE Column stats: NONE
                                          |        Filter Operator [FIL_105]
                                          |           predicate:s_store_sk is 
not null (type: boolean)
                                          |           Statistics:Num rows: 852 
Data size: 1628138 Basic stats: COMPLETE Column stats: NONE
                                          |           TableScan [TS_15]
                                          |              alias:store
                                          |              Statistics:Num rows: 
1704 Data size: 3256276 Basic stats: COMPLETE Column stats: NONE
                                          |<-Reducer 6 [SIMPLE_EDGE]
                                             Reduce Output Operator [RS_45]
                                                key expressions:_col3 (type: 
int)
                                                Map-reduce partition 
columns:_col3 (type: int)
                                                sort order:+
                                                Statistics:Num rows: 24305 Data 
size: 27199223 Basic stats: COMPLETE Column stats: NONE
                                                value expressions:_col1 (type: 
int), _col5 (type: int), _col10 (type: int), _col14 (type: int)
                                                Merge Join Operator 
[MERGEJOIN_111]
                                                |  condition map:[{"":"Inner 
Join 0 to 1"}]
                                                |  keys:{"0":"_col11 (type: 
int)","1":"_col0 (type: int)"}
                                                |  
outputColumnNames:["_col1","_col3","_col5","_col10","_col14"]
                                                |  Statistics:Num rows: 24305 
Data size: 27199223 Basic stats: COMPLETE Column stats: NONE
                                                |<-Map 15 [SIMPLE_EDGE]
                                                |  Reduce Output Operator 
[RS_42]
                                                |     key expressions:_col0 
(type: int)
                                                |     Map-reduce partition 
columns:_col0 (type: int)
                                                |     sort order:+
                                                |     Statistics:Num rows: 
18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                |     Select Operator [SEL_14]
                                                |        
outputColumnNames:["_col0"]
                                                |        Statistics:Num rows: 
18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                |        Filter Operator 
[FIL_104]
                                                |           
predicate:((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk is 
not null) (type: boolean)
                                                |           Statistics:Num 
rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                |           TableScan [TS_12]
                                                |              alias:d1
                                                |              Statistics:Num 
rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                                                |<-Reducer 5 [SIMPLE_EDGE]
                                                   Reduce Output Operator 
[RS_40]
                                                      key expressions:_col11 
(type: int)
                                                      Map-reduce partition 
columns:_col11 (type: int)
                                                      sort order:+
                                                      Statistics:Num rows: 
22096 Data size: 24726566 Basic stats: COMPLETE Column stats: NONE
                                                      value expressions:_col1 
(type: int), _col3 (type: int), _col5 (type: int), _col10 (type: int), _col14 
(type: int)
                                                      Merge Join Operator 
[MERGEJOIN_110]
                                                      |  condition 
map:[{"":"Inner Join 0 to 1"}]
                                                      |  keys:{"0":"_col6 
(type: int)","1":"_col0 (type: int)"}
                                                      |  
outputColumnNames:["_col1","_col3","_col5","_col10","_col11","_col14"]
                                                      |  Statistics:Num rows: 
22096 Data size: 24726566 Basic stats: COMPLETE Column stats: NONE
                                                      |<-Map 14 [SIMPLE_EDGE]
                                                      |  Reduce Output Operator 
[RS_37]
                                                      |     key 
expressions:_col0 (type: int)
                                                      |     Map-reduce 
partition columns:_col0 (type: int)
                                                      |     sort order:+
                                                      |     Statistics:Num 
rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                      |     Select Operator 
[SEL_11]
                                                      |        
outputColumnNames:["_col0"]
                                                      |        Statistics:Num 
rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                      |        Filter Operator 
[FIL_103]
                                                      |           
predicate:((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk is 
not null) (type: boolean)
                                                      |           
Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column 
stats: NONE
                                                      |           TableScan 
[TS_9]
                                                      |              alias:d1
                                                      |              
Statistics:Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column 
stats: NONE
                                                      |<-Reducer 4 [SIMPLE_EDGE]
                                                         Reduce Output Operator 
[RS_35]
                                                            key 
expressions:_col6 (type: int)
                                                            Map-reduce 
partition columns:_col6 (type: int)
                                                            sort order:+
                                                            Statistics:Num 
rows: 20088 Data size: 22478696 Basic stats: COMPLETE Column stats: NONE
                                                            value 
expressions:_col1 (type: int), _col3 (type: int), _col5 (type: int), _col10 
(type: int), _col11 (type: int), _col14 (type: int)
                                                            Merge Join Operator 
[MERGEJOIN_109]
                                                            |  condition 
map:[{"":"Inner Join 0 to 1"}]
                                                            |  keys:{"0":"_col0 
(type: int)","1":"_col0 (type: int)"}
                                                            |  
outputColumnNames:["_col1","_col3","_col5","_col6","_col10","_col11","_col14"]
                                                            |  Statistics:Num 
rows: 20088 Data size: 22478696 Basic stats: COMPLETE Column stats: NONE
                                                            |<-Map 13 
[SIMPLE_EDGE]
                                                            |  Reduce Output 
Operator [RS_32]
                                                            |     key 
expressions:_col0 (type: int)
                                                            |     Map-reduce 
partition columns:_col0 (type: int)
                                                            |     sort order:+
                                                            |     
Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column 
stats: NONE
                                                            |     Select 
Operator [SEL_8]
                                                            |        
outputColumnNames:["_col0"]
                                                            |        
Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column 
stats: NONE
                                                            |        Filter 
Operator [FIL_102]
                                                            |           
predicate:((d_quarter_name = '2000Q1') and d_date_sk is not null) (type: 
boolean)
                                                            |           
Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column 
stats: NONE
                                                            |           
TableScan [TS_6]
                                                            |              
alias:d1
                                                            |              
Statistics:Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column 
stats: NONE
                                                            |<-Reducer 3 
[SIMPLE_EDGE]
                                                               Reduce Output 
Operator [RS_30]
                                                                  key 
expressions:_col0 (type: int)
                                                                  Map-reduce 
partition columns:_col0 (type: int)
                                                                  sort order:+
                                                                  
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  value 
expressions:_col1 (type: int), _col3 (type: int), _col5 (type: int), _col6 
(type: int), _col10 (type: int), _col11 (type: int), _col14 (type: int)
                                                                  Merge Join 
Operator [MERGEJOIN_108]
                                                                  |  condition 
map:[{"":"Inner Join 0 to 1"}]
                                                                  |  
keys:{"0":"_col8 (type: int), _col7 (type: int)","1":"_col1 (type: int), _col2 
(type: int)"}
                                                                  |  
outputColumnNames:["_col0","_col1","_col3","_col5","_col6","_col10","_col11","_col14"]
                                                                  |  
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |<-Map 12 
[SIMPLE_EDGE]
                                                                  |  Reduce 
Output Operator [RS_27]
                                                                  |     key 
expressions:_col1 (type: int), _col2 (type: int)
                                                                  |     
Map-reduce partition columns:_col1 (type: int), _col2 (type: int)
                                                                  |     sort 
order:++
                                                                  |     
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |     value 
expressions:_col0 (type: int), _col3 (type: int)
                                                                  |     Select 
Operator [SEL_5]
                                                                  |        
outputColumnNames:["_col0","_col1","_col2","_col3"]
                                                                  |        
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |        
Filter Operator [FIL_101]
                                                                  |           
predicate:((cs_bill_customer_sk is not null and cs_item_sk is not null) and 
cs_sold_date_sk is not null) (type: boolean)
                                                                  |           
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |           
TableScan [TS_4]
                                                                  |             
 alias:catalog_sales
                                                                  |             
 Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |<-Reducer 2 
[SIMPLE_EDGE]
                                                                     Reduce 
Output Operator [RS_25]
                                                                        key 
expressions:_col8 (type: int), _col7 (type: int)
                                                                        
Map-reduce partition columns:_col8 (type: int), _col7 (type: int)
                                                                        sort 
order:++
                                                                        
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        value 
expressions:_col0 (type: int), _col1 (type: int), _col3 (type: int), _col5 
(type: int), _col6 (type: int), _col10 (type: int)
                                                                        Merge 
Join Operator [MERGEJOIN_107]
                                                                        |  
condition map:[{"":"Inner Join 0 to 1"}]
                                                                        |  
keys:{"0":"_col2 (type: int), _col1 (type: int), _col4 (type: int)","1":"_col2 
(type: int), _col1 (type: int), _col3 (type: int)"}
                                                                        |  
outputColumnNames:["_col0","_col1","_col3","_col5","_col6","_col7","_col8","_col10"]
                                                                        |  
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |<-Map 
1 [SIMPLE_EDGE]
                                                                        |  
Reduce Output Operator [RS_20]
                                                                        |     
key expressions:_col2 (type: int), _col1 (type: int), _col4 (type: int)
                                                                        |     
Map-reduce partition columns:_col2 (type: int), _col1 (type: int), _col4 (type: 
int)
                                                                        |     
sort order:+++
                                                                        |     
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |     
value expressions:_col0 (type: int), _col3 (type: int), _col5 (type: int)
                                                                        |     
Select Operator [SEL_1]
                                                                        |       
 outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5"]
                                                                        |       
 Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |       
 Filter Operator [FIL_99]
                                                                        |       
    predicate:((((ss_customer_sk is not null and ss_item_sk is not null) and 
ss_ticket_number is not null) and ss_sold_date_sk is not null) and ss_store_sk 
is not null) (type: boolean)
                                                                        |       
    Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |       
    TableScan [TS_0]
                                                                        |       
       alias:store_sales
                                                                        |       
       Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: 
NONE
                                                                        |<-Map 
11 [SIMPLE_EDGE]
                                                                           
Reduce Output Operator [RS_22]
                                                                              
key expressions:_col2 (type: int), _col1 (type: int), _col3 (type: int)
                                                                              
Map-reduce partition columns:_col2 (type: int), _col1 (type: int), _col3 (type: 
int)
                                                                              
sort order:+++
                                                                              
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                              
value expressions:_col0 (type: int), _col4 (type: int)
                                                                              
Select Operator [SEL_3]
                                                                                
 outputColumnNames:["_col0","_col1","_col2","_col3","_col4"]
                                                                                
 Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                                
 Filter Operator [FIL_100]
                                                                                
    predicate:(((sr_customer_sk is not null and sr_item_sk is not null) and 
sr_ticket_number is not null) and sr_returned_date_sk is not null) (type: 
boolean)
                                                                                
    Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                                
    TableScan [TS_2]
                                                                                
       alias:store_returns
                                                                                
       Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: 
NONE
{code}

The query is:
{code}
explain
select i_item_desc, i_category, i_class, i_current_price, i_item_id,
       sum(ws_ext_sales_price) as itemrevenue,
       sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price))
           over (partition by i_class) as revenueratio
from web_sales, item, date_dim
where web_sales.ws_item_sk = item.i_item_sk
  and item.i_category in ('Jewelry', 'Sports', 'Books')
  and web_sales.ws_sold_date_sk = date_dim.d_date_sk
  and date_dim.d_date between '2001-01-12' and '2001-02-11'
group by i_item_id, i_item_desc, i_category, i_class, i_current_price
order by i_category, i_class, i_item_id, i_item_desc, revenueratio
limit 100;
{code}

It seems that SemanticAnalyzer.genJoinReduceSinkChild() looks for join 
predicates only in the 'ON' clause. If the join condition appears in the 
'WHERE' clause of the query instead, strict mode throws an exception right 
away, assuming the join is a cartesian product. We should defer this check 
until after the physical optimizer has run, when the plan is complete.
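
To make the difference between the two check points concrete, here is a toy 
model in Python. It is only a sketch and not Hive code: the Join class, the 
optimize() stand-in for predicate pushdown, and both check functions are 
hypothetical names made up for illustration. It assumes, as described above, 
that the early check consults only the 'ON' clause.

```python
from dataclasses import dataclass, field


class StrictModeError(Exception):
    """Raised when a join is judged to be a cartesian product."""


@dataclass
class Join:
    # Equi-join predicates written in the ON clause, e.g. ("a.id", "b.id").
    on_predicates: list = field(default_factory=list)
    # Equi-join predicates written in the WHERE clause.
    where_predicates: list = field(default_factory=list)


def eager_check(join):
    """Check at analysis time: only the ON clause is consulted."""
    if not join.on_predicates:
        raise StrictModeError("cartesian product in strict mode")


def optimize(join):
    """Hypothetical stand-in for pushdown: WHERE predicates become join keys."""
    return join.on_predicates + join.where_predicates


def deferred_check(join):
    """Check after optimization: inspect the join keys of the final plan."""
    keys = optimize(join)
    if not keys:
        raise StrictModeError("cartesian product in strict mode")
    return keys


def raises(check, join):
    """Return True if the given check rejects the join."""
    try:
        check(join)
    except StrictModeError:
        return True
    return False


# Implicit join: the predicate lives in WHERE, as in the query above.
implicit = Join(where_predicates=[("ws_item_sk", "i_item_sk")])
# A genuine cross join with no join predicate anywhere.
cross = Join()
```

In this model the early check rejects the implicit join even though its final 
plan has join keys, while the deferred check rejects only the genuine cross 
join.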



