[ https://issues.apache.org/jira/browse/HIVE-24712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
liuyan updated HIVE-24712:
--------------------------
    Description: 
When both parameters are set to false, the result appears to be incorrect: a query that should return 200 rows returns only 35 rows. This was tested on HDP 3.1.5.

set hive.map.aggr=false;
set hive.optimize.reducededuplication=false;
select cs_sold_date_sk, count(distinct cs_order_number)
from tpcds_orc.catalog_sales_orc
group by cs_sold_date_sk
order by cs_sold_date_sk
limit 200;

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      llap     SUCCEEDED     33         33        0        0       0       0
Reducer 2 ......      llap     SUCCEEDED      4          4        0        0       0       0
Reducer 3 ......      llap     SUCCEEDED      4          4        0        0       0       0
Reducer 4 ......      llap     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 38.23 s
----------------------------------------------------------------------------------------------
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1          38097.00              0              0     143,997,065           57,447
INFO  :  Reducer 2           9003.00              0              0          57,447           13,108
INFO  :  Reducer 3              0.00              0              0          13,108               35
INFO  :  Reducer 4              0.00              0              0              35                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  :
INFO  : LLAP IO Summary

set hive.map.aggr=true;
set hive.optimize.reducededuplication=false;
select cs_sold_date_sk, count(distinct cs_order_number)
from tpcds_orc.catalog_sales_orc
group by cs_sold_date_sk
order by cs_sold_date_sk
limit 200;
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      llap     SUCCEEDED     33         33        0        0       0       0
Reducer 2 ......      llap     SUCCEEDED      4          4        0        0       0       0
Reducer 3 ......      llap     SUCCEEDED      2          2        0        0       0       0
Reducer 4 ......      llap     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 36.24 s
----------------------------------------------------------------------------------------------
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1          25595.00              0              0     143,997,065       16,703,757
INFO  :  Reducer 2          18556.00              0              0      16,703,757              800
INFO  :  Reducer 3           8018.00              0              0             800              200
INFO  :  Reducer 4              0.00              0              0             200                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  :

  was:
When both parameters are set to false, the result appears to be incorrect: only 35 rows are returned. This was tested on HDP 3.1.5.

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      llap     SUCCEEDED     33         33        0        0       0       0
Reducer 2 ......      llap     SUCCEEDED      4          4        0        0       0       0
Reducer 3 ......      llap     SUCCEEDED      4          4        0        0       0       0
Reducer 4 ......      llap     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 38.23 s
----------------------------------------------------------------------------------------------
INFO  : Task Execution Summary
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1          38097.00              0              0     143,997,065           57,447
INFO  :  Reducer 2           9003.00              0              0          57,447           13,108
INFO  :  Reducer 3              0.00              0              0          13,108               35
INFO  :  Reducer 4              0.00              0              0              35                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  :
INFO  : LLAP IO Summary

set hive.map.aggr=true;
set hive.optimize.reducededuplication=false;
select cs_sold_date_sk, count(distinct cs_order_number)
from tpcds_orc.catalog_sales_orc
group by cs_sold_date_sk
order by cs_sold_date_sk
limit 200;

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 ..........      llap     SUCCEEDED     33         33        0        0       0       0
Reducer 2 ......      llap     SUCCEEDED      4          4        0        0       0       0
Reducer 3 ......      llap     SUCCEEDED      2          2        0        0       0       0
Reducer 4 ......      llap     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 36.24 s
----------------------------------------------------------------------------------------------
INFO  : ----------------------------------------------------------------------------------------------
INFO  :   VERTICES      DURATION(ms)   CPU_TIME(ms)    GC_TIME(ms)   INPUT_RECORDS   OUTPUT_RECORDS
INFO  : ----------------------------------------------------------------------------------------------
INFO  :      Map 1          25595.00              0              0     143,997,065       16,703,757
INFO  :  Reducer 2          18556.00              0              0      16,703,757              800
INFO  :  Reducer 3           8018.00              0              0             800              200
INFO  :  Reducer 4              0.00              0              0             200                0
INFO  : ----------------------------------------------------------------------------------------------
INFO  :


> hive.map.aggr=false and hive.optimize.reducededuplication=false provide
> incorrect result on order by with limit
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-24712
>                 URL: https://issues.apache.org/jira/browse/HIVE-24712
>             Project: Hive
>          Issue Type: Improvement
>          Components: CBO
>    Affects Versions: 3.1.0
>            Reporter: liuyan
>            Priority: Critical
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
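The report expects the query to return 200 rows. That expectation rests on one invariant: hive.map.aggr only decides whether a partial aggregation runs on the map side before the shuffle, so it should not change the query's result.

The toy Python model below illustrates that invariant. It is not Hive's operator code, and all of it is hypothetical and for illustration only:
- Map-side aggregation is modelled as deduplicating (key, value) pairs within each map task.
- Input rows are split round-robin across tasks.
- NULL keys and values are not modelled.
- hive.optimize.reducededuplication is ignored.
- The function names split_into_map_tasks, map_output and run_query are invented for this sketch.

```python
# Toy model (hypothetical, not Hive code) of:
#   SELECT k, COUNT(DISTINCT v) FROM t GROUP BY k ORDER BY k LIMIT n
# Assumption: hive.map.aggr=true is modelled as a map-side partial aggregation
# (deduplicating (k, v) pairs per map task); hive.map.aggr=false ships raw rows.
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[int, int]  # (group key, value counted distinctly); NULLs not modelled


def split_into_map_tasks(rows: List[Row], num_tasks: int) -> List[List[Row]]:
    """Distribute input rows round-robin across hypothetical map tasks."""
    tasks: List[List[Row]] = [[] for _ in range(num_tasks)]
    for i, row in enumerate(rows):
        tasks[i % num_tasks].append(row)
    return tasks


def map_output(task: List[Row], map_aggr: bool) -> Iterable[Row]:
    """Rows that one map task sends to the shuffle."""
    if map_aggr:
        # Partial aggregation: each distinct (key, value) pair leaves the task once.
        return set(task)
    # No map-side aggregation: every raw row is shuffled.
    return task


def run_query(rows: List[Row], limit: int, map_aggr: bool,
              num_tasks: int = 4) -> List[Tuple[int, int]]:
    # Reduce side: gather the distinct values of each key from all map tasks.
    distinct_values: Dict[int, Set[int]] = defaultdict(set)
    for task in split_into_map_tasks(rows, num_tasks):
        for key, value in map_output(task, map_aggr):
            distinct_values[key].add(value)
    # Final COUNT(DISTINCT), then ORDER BY key ascending and LIMIT.
    result = sorted((key, len(values)) for key, values in distinct_values.items())
    return result[:limit]


if __name__ == "__main__":
    # Synthetic data: 300 distinct keys, 20 distinct values per key.
    sample = [(k, (k * 7 + j) % 50) for k in range(300) for j in range(20)]
    with_aggr = run_query(sample, limit=200, map_aggr=True)
    without_aggr = run_query(sample, limit=200, map_aggr=False)
    print(len(with_aggr), len(without_aggr), with_aggr == without_aggr)  # prints: 200 200 True
```

In this model both strategies return the same 200 rows whenever at least 200 distinct keys exist. That matches the 200 rows reported for the hive.map.aggr=true run. Under the model, a 35-row result like the hive.map.aggr=false run should not occur, assuming catalog_sales_orc contains at least 200 distinct cs_sold_date_sk values.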