zhaolong created HIVE-29174:
-------------------------------
Summary: count (distinct) from subquery DISTRIBUTE BY sort return
error result
Key: HIVE-29174
URL: https://issues.apache.org/jira/browse/HIVE-29174
Project: Hive
Issue Type: Bug
Affects Versions: 4.0.1, 4.1.0, 3.1.0
Reporter: zhaolong
Attachments: image-2025-09-02-19-55-01-845.png
create table zyj0715(shoujihaoma string, msisdn_2 string, user_name string, certificate_code string);
insert into zyj0715 values ('13920150169','10100000',null,null);
insert into zyj0715 values ('13920157788','10100000',null,null);
insert into zyj0715 values ('13920157788','10100000',null,null);
insert into zyj0715 values ('13920150169','10100000',null,null);
Expected Result:
2
Actual Results:
3
The ReduceSinkOp should sort on the key columns _col1, _col2, _col3, and _col0.
In the actual plan, the sort key contains only _col1, _col2, and _col3. As a
result, rows are not sorted on _col0 on the reduce side, and count(distinct)
returns an incorrect result.
explain:
!image-2025-09-02-19-55-01-845.png!
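
To illustrate why the missing sort column matters, here is a minimal Python sketch of a streaming distinct count that, like a sort-based reducer, only compares each row with the previous one. It is not Hive code. The function name, the assumption that _col0 carries the distinct column (shoujihaoma), and the assumption that rows reach the reducer in insertion order are all hypothetical, chosen only to show the mechanism.

```python
def count_distinct_streaming(values):
    """Count distinct values by comparing each value with the previous one.

    This is only correct when equal values are adjacent, i.e. when the
    input is sorted on the column being counted.
    """
    count = 0
    previous = object()  # sentinel that compares unequal to every real value
    for value in values:
        if value != previous:
            count += 1
            previous = value
    return count


# shoujihaoma values from the INSERT statements in the report.
# Assumption: this is the order in which the rows arrive at the reducer.
phone_numbers = ["13920150169", "13920157788", "13920157788", "13920150169"]

# Not sorted on the distinct column: the repeated value is counted again.
unsorted_count = count_distinct_streaming(phone_numbers)

# Sorted on the distinct column: duplicates are adjacent and collapse.
sorted_count = count_distinct_streaming(sorted(phone_numbers))

print(unsorted_count, sorted_count)
```

Under these assumptions the unsorted input overcounts, while sorting on the distinct column first gives the expected value. This is consistent with the report's claim that _col0 must be part of the ReduceSinkOp sort key.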
--
This message was sent by Atlassian Jira
(v8.20.10#820010)