[ 
https://issues.apache.org/jira/browse/HIVE-29174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Fingerman updated HIVE-29174:
-------------------------------------
    Labels: correctness  (was: )

> count (distinct) from subquery DISTRIBUTE BY sort return error result
> ---------------------------------------------------------------------
>
>                 Key: HIVE-29174
>                 URL: https://issues.apache.org/jira/browse/HIVE-29174
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 4.1.0, 4.0.1
>            Reporter: zhaolong
>            Priority: Critical
>              Labels: correctness
>         Attachments: image-2025-09-02-19-55-01-845.png
>
>
> create table zyj0715(shoujihaoma string, msisdn_2 string, user_name string, certificate_code string);
>  
> insert into zyj0715 values ('13920150169','10100000',null,null);
> insert into zyj0715 values ('13920157788','10100000',null,null);
> insert into zyj0715 values ('13920157788','10100000',null,null);
> insert into zyj0715 values ('13920150169','10100000',null,null);
> select count(distinct shoujihaoma)
> FROM (select * from zyj0715
>       DISTRIBUTE BY msisdn_2, user_name, certificate_code
>       SORT BY shoujihaoma asc) t
> GROUP BY msisdn_2, user_name, certificate_code;
>  
> Expected Result:
> 2
>  
> Actual Result:
> 3
>  
> The ReduceSinkOperator should sort on _col1, _col2, _col3, and _col0. 
> In fact, its sort key contains only _col1, _col2, and _col3. As a result, 
> rows are not sorted by _col0 on the reduce side, and count(distinct) 
> returns an incorrect result.
>  
> explain:
> !image-2025-09-02-19-55-01-845.png!
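
To illustrate the explanation quoted above, here is a minimal Python sketch of a streaming distinct count that relies on sorted input. It is a simplified model, not Hive's actual operator code: the function name `count_distinct_streaming` is hypothetical, and the arrival order of the rows (assumed here to match the insert order) is also an assumption. It only shows how a counter that compares each value with the previous one overcounts when equal values are not adjacent.

```python
def count_distinct_streaming(values):
    """Count distinct values, assuming equal values arrive next to each other.

    Hypothetical helper for illustration only; it models a reduce-side
    aggregation that trusts the input to be sorted on the distinct column.
    """
    count = 0
    previous = object()  # sentinel that never equals a real value
    for value in values:
        if value != previous:
            count += 1
            previous = value
    return count


# The four shoujihaoma values from the report, in insert order (assumed
# arrival order when the sort key omits _col0).
rows = ["13920150169", "13920157788", "13920157788", "13920150169"]

unsorted_count = count_distinct_streaming(rows)        # values not adjacent
sorted_count = count_distinct_streaming(sorted(rows))  # sorted on the column

print(unsorted_count, sorted_count)
```

Under these assumptions the unsorted stream yields 3 and the sorted stream yields 2, which matches the actual and expected results in the report.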



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
