liuyan created HIVE-24712:
-----------------------------

             Summary: hive.map.aggr=false and 
hive.optimize.reducededuplication=false provide incorrect result on order by 
with limit
                 Key: HIVE-24712
                 URL: https://issues.apache.org/jira/browse/HIVE-24712
             Project: Hive
          Issue Type: Improvement
          Components: CBO
    Affects Versions: 3.1.0
            Reporter: liuyan


 When both parameters are set to false, the result appears to be incorrect: the 
query returns only 35 rows, whereas the same query with hive.map.aggr=true 
returns 200 rows. This was tested on HDP 3.1.5.

set hive.map.aggr=false;
set hive.optimize.reducededuplication=false;

select cs_sold_date_sk,count(distinct cs_order_number) from 
tpcds_orc.catalog_sales_orc group by cs_sold_date_sk order by cs_sold_date_sk 
limit 200;

----------------------------------------------------------------------------------------------
 VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED 
----------------------------------------------------------------------------------------------
Map 1 .......... llap SUCCEEDED 33 33 0 0 0 0 
Reducer 2 ...... llap SUCCEEDED 4 4 0 0 0 0 
Reducer 3 ...... llap SUCCEEDED 4 4 0 0 0 0 
Reducer 4 ...... llap SUCCEEDED 1 1 0 0 0 0 
----------------------------------------------------------------------------------------------
VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 38.23 s 
----------------------------------------------------------------------------------------------
INFO : 
INFO : Task Execution Summary
INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES    DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1           38097.00             0            0    143,997,065          57,447
INFO : Reducer 2        9003.00             0            0         57,447          13,108
INFO : Reducer 3           0.00             0            0         13,108              35
INFO : Reducer 4           0.00             0            0             35               0
INFO : ----------------------------------------------------------------------------------------------
INFO : 
INFO : LLAP IO Summary

 

 


set hive.map.aggr=true;
set hive.optimize.reducededuplication=false;

select cs_sold_date_sk,count(distinct cs_order_number) from 
tpcds_orc.catalog_sales_orc group by cs_sold_date_sk order by cs_sold_date_sk 
limit 200;
----------------------------------------------------------------------------------------------
 VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED 
----------------------------------------------------------------------------------------------
Map 1 .......... llap SUCCEEDED 33 33 0 0 0 0 
Reducer 2 ...... llap SUCCEEDED 4 4 0 0 0 0 
Reducer 3 ...... llap SUCCEEDED 2 2 0 0 0 0 
Reducer 4 ...... llap SUCCEEDED 1 1 0 0 0 0 
----------------------------------------------------------------------------------------------
VERTICES: 04/04 [==========================>>] 100% ELAPSED TIME: 36.24 s 
----------------------------------------------------------------------------------------------


INFO : ----------------------------------------------------------------------------------------------
INFO : VERTICES    DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
INFO : ----------------------------------------------------------------------------------------------
INFO : Map 1           25595.00             0            0    143,997,065      16,703,757
INFO : Reducer 2       18556.00             0            0     16,703,757             800
INFO : Reducer 3        8018.00             0            0            800             200
INFO : Reducer 4           0.00             0            0            200               0
INFO : ----------------------------------------------------------------------------------------------
INFO :
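
One way to confirm which of the two row counts is correct is to count the distinct grouping keys directly. The statements below are only a sketch and were not part of the original runs. They assume both settings are restored to true (their default values) so that the check does not go through the configuration under suspicion. If the count is 200 or more, the reported query should return exactly 200 rows, which would make the 35-row result wrong.

```sql
-- Assumption: restore both settings to their defaults before checking
set hive.map.aggr=true;
set hive.optimize.reducededuplication=true;

-- Number of distinct grouping keys in the table.
-- The reported query should return min(200, this count) rows.
select count(distinct cs_sold_date_sk) from tpcds_orc.catalog_sales_orc;
```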



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
