qingfa zhou created HIVE-22181: ---------------------------------- Summary: Same query but different result occasionally Key: HIVE-22181 URL: https://issues.apache.org/jira/browse/HIVE-22181 Project: Hive Issue Type: Bug Components: Hive Affects Versions: 2.3.0 Environment: hadoop:2.7.3.2.6.1.0-129
hive:2.3.0 tez:0.8.4 java:1.8.0_212 Reporter: qingfa zhou Attachments: WX20190910-123604.png, WX20190910-123621.png, WX20190910-123641.png h3. 1.Phenomenon Running the same SQL(hive on tez) several times will result in inconsistent results . The following results are generated respectively 1) OK 3951864 808 2) OK 1822979 353 But the first result is true. h3. 2.SQL select count(1),count(distinct store_code) from ( select store_code, all_pay_id, payable_price, row_number()over(partition by store_code order by payable_price desc) as rw, count(1)over(partition by store_code) as store_user from ( select store_code, all_pay_id, sum(payable_price) as payable_price from data_promotion.mdw_user_promotion_shopping_behaivor_four_week_detail a where dt='20190904' group by store_code,all_pay_id ) a order by rw ) a where rw/store_user<=0.8 ; h3. 3.Troubleshooting I lookup task result from tez web ui,give the result as follows: the first: !image-2019-09-10-12-16-45-385.png! the second: !image-2019-09-10-12-18-08-769.png! so , The second RECORDS_OUT_INTERMEDIATE_Reducer_4's result is much lower than the first. This is detailed info of the second result from reducer_3 to reducer_4 !image-2019-09-10-12-26-41-681.png! This is execution plan of the sql: OK Vertex dependency in root stage Reducer 2 <- Map 1 (SIMPLE_EDGE) Reducer 3 <- Reducer 2 (SIMPLE_EDGE) Reducer 4 <- Reducer 3 (SIMPLE_EDGE) Reducer 5 <- Reducer 4 (SIMPLE_EDGE) Reducer 6 <- Reducer 5 (SIMPLE_EDGE) Stage-0 Fetch Operator limit:-1 Stage-1 Reducer 6 File Output Operator [FS_24] Group By Operator [GBY_22] (rows=1 width=128) Output:["_col0","_col1"],aggregations:["count(VALUE._col0)","count(DISTINCT KEY._col0:0._col0)"] <-Reducer 5 [SIMPLE_EDGE] SHUFFLE [RS_21] Group By Operator [GBY_20] (rows=5619870 width=2328) Output:["_col0","_col1","_col2"],aggregations:["count(1)","count(DISTINCT _col0)"],keys:_col0 Select Operator [SEL_17] (rows=5619870 width=2328) Output:["_col0"] <-Reducer 4 [SIMPLE_EDGE] SHUFFLE [RS_16] Select Operator [SEL_14] (rows=5619870 width=2328) Output:["_col0","_col3"] Filter Operator [FIL_25] (rows=5619870 width=2328) predicate:((_col0 / count_window_1) <= 0.8) PTF Operator [PTF_13] (rows=16859610 width=2328) Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col1 ASC NULLS FIRST","partition by:":"_col1"}] Select Operator [SEL_12] (rows=16859610 width=2328) Output:["_col0","_col1"] <-Reducer 3 [SIMPLE_EDGE] SHUFFLE [RS_11] PartitionCols:_col0 Select Operator [SEL_10] (rows=16859610 width=2328) Output:["_col0","row_number_window_0"] PTF Operator [PTF_9] (rows=16859610 width=2328) Function definitions:[{},\{"name:":"windowingtablefunction","order by:":"_col2 DESC NULLS LAST","partition by:":"_col0"}] Select Operator [SEL_8] (rows=16859610 width=2328) Output:["_col0","_col2"] <-Reducer 2 [SIMPLE_EDGE] SHUFFLE [RS_7] PartitionCols:_col0 Select Operator [SEL_6] (rows=16859610 width=2328) Output:["_col0","_col2"] Group By Operator [GBY_5] (rows=16859610 width=2328) Output:["_col0","_col1","_col2"],aggregations:["sum(VALUE._col0)"],keys:KEY._col0, KEY._col1 <-Map 1 [SIMPLE_EDGE] SHUFFLE [RS_4] PartitionCols:_col0, _col1 Group By Operator [GBY_3] (rows=33719220 width=2328) Output:["_col0","_col1","_col2"],aggregations:["sum(payable_price)"],keys:store_code, all_pay_id Select Operator [SEL_2] (rows=33719220 width=2328) Output:["store_code","all_pay_id","payable_price"] TableScan [TS_0] (rows=33719220 width=2328) data_promotion@mdw_user_promotion_shopping_behaivor_four_week_detail,a,Tbl:COMPLETE,Col:NONE,Output:["all_pay_id","store_code","payable_price"] Reducer 4 show PTF , so I don't know if this result has anything to do with PTF? Can you help me. -- This message was sent by Atlassian Jira (v8.3.2#803003)