zhaolong created HIVE-29174:
-------------------------------

             Summary: count (distinct) from subquery DISTRIBUTE BY sort return error result
                 Key: HIVE-29174
                 URL: https://issues.apache.org/jira/browse/HIVE-29174
             Project: Hive
          Issue Type: Bug
    Affects Versions: 4.0.1, 4.1.0, 3.1.0
            Reporter: zhaolong
         Attachments: image-2025-09-02-19-55-01-845.png

create table zyj0715(shoujihaoma string, msisdn_2 string, user_name string, certificate_code string);
 
insert into zyj0715 values ('13920150169','10100000',null,null);
insert into zyj0715 values ('13920157788','10100000',null,null);
insert into zyj0715 values ('13920157788','10100000',null,null);
insert into zyj0715 values ('13920150169','10100000',null,null);
 
Expected Result:
2
 
Actual Results:
3
 
The ReduceSinkOp should sort on _col1, _col2, _col3, and _col0. In the actual 
plan, the sort key contains only _col1, _col2, and _col3. As a result, the rows 
are not sorted by _col0 on the reduce side, and count(distinct) returns an 
incorrect result.
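
The effect of the missing sort key can be illustrated with a small sketch. It is not Hive's code. It assumes, for illustration only, that the reduce side counts distinct values by comparing each row with the previous one, which is correct only when equal values arrive next to each other. The function name and the arrival order of the rows (taken here to be the insert order) are also assumptions.

```python
def count_distinct_sorted_stream(values):
    """Count distinct values, assuming equal values arrive adjacent to each other."""
    count = 0
    previous = object()  # sentinel that compares unequal to every input value
    for value in values:
        if value != previous:
            # A change from the previous row is treated as a new distinct value.
            count += 1
            previous = value
    return count


# shoujihaoma values from the four inserted rows, in insert order (assumed arrival order).
rows = ["13920150169", "13920157788", "13920157788", "13920150169"]

unsorted_count = count_distinct_sorted_stream(rows)        # input not sorted on the distinct column
sorted_count = count_distinct_sorted_stream(sorted(rows))  # input sorted on the distinct column

print(unsorted_count, sorted_count)
```

Under these assumptions, the unsorted input yields 3 and the sorted input yields 2, which matches the actual and expected results reported above.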
 
explain:
!image-2025-09-02-19-55-01-845.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
