[jira] [Commented] (HIVE-29174) count (distinct) from subquery DISTRIBUTE BY sort return error result

Krisztian Kasa (Jira) Tue, 04 Nov 2025 02:41:47 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-29174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035294#comment-18035294
 ]


Krisztian Kasa commented on HIVE-29174:
---------------------------------------

I copied the repro steps to a q file and built Hive from the latest master 
branch. I was not able to reproduce the issue.
I also got the same plan as the one in the description. It shows that the RS 
already has the necessary sort keys:
{code}
                    Reduce Output Operator
                      key expressions: _col1 (type: string), _col2 (type: string), _col3 (type: string), _col0 (type: string)
                      null sort order: zzzz
                      sort order: ++++
                      Map-reduce partition columns: _col1 (type: string), _col2 (type: string), _col3 (type: string)
{code}

I also ran the repro steps using the Hive 4.1.0 docker image and was not able 
to reproduce the issue.
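
For reference, the failure mode described in the issue can be sketched outside 
Hive. The snippet below is only an illustration, assuming the reduce side counts 
distinct values by comparing each value with the previous one; the function and 
variable names are hypothetical and this is not Hive's actual implementation. 
It uses the four rows from the description and shows that such a count is only 
correct when the distinct column is part of the sort key.

```python
# Rows from the issue description:
# (shoujihaoma, msisdn_2, user_name, certificate_code)
rows = [
    ("13920150169", "10100000", None, None),
    ("13920157788", "10100000", None, None),
    ("13920157788", "10100000", None, None),
    ("13920150169", "10100000", None, None),
]


def streaming_count_distinct(values):
    """Count distinct values by comparing each value with the previous one.

    Hypothetical helper: it is only correct when equal values are adjacent,
    i.e. when the input is sorted on the distinct column.
    """
    count = 0
    previous = object()  # sentinel that never equals a real value
    for value in values:
        if value != previous:
            count += 1
            previous = value
    return count


# All rows share the same grouping columns, so they form a single group.
arrival_order = [row[0] for row in rows]  # distinct column not in the sort key
sorted_order = sorted(arrival_order)      # distinct column included in the sort key

unsorted_result = streaming_count_distinct(arrival_order)
sorted_result = streaming_count_distinct(sorted_order)
print(unsorted_result, sorted_result)
```

With the plan above, _col0 is the last key expression, which corresponds to the 
sorted case in this sketch.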


> count (distinct) from subquery DISTRIBUTE BY sort return error result
> ---------------------------------------------------------------------
>
>                 Key: HIVE-29174
>                 URL: https://issues.apache.org/jira/browse/HIVE-29174
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.0, 4.1.0, 4.0.1
>            Reporter: zhaolong
>            Priority: Critical
>              Labels: correctness
>         Attachments: image-2025-09-02-19-55-01-845.png
>
>
> create table zyj0715(shoujihaoma string, msisdn_2 string, user_name string, certificate_code string);
>  
> insert into zyj0715 values ('13920150169','10100000',null,null);
> insert into zyj0715 values ('13920157788','10100000',null,null);
> insert into zyj0715 values ('13920157788','10100000',null,null);
> insert into zyj0715 values ('13920150169','10100000',null,null);
> select count (distinct shoujihaoma)
> FROM (select * from zyj0715
>       DISTRIBUTE BY msisdn_2, user_name, certificate_code
>       SORT BY shoujihaoma asc) t
> GROUP BY msisdn_2, user_name, certificate_code;
>  
> Expected Result:
> 2
>  
> Actual Results:
> 3
>  
> The ReduceSinkOp should sort on the fields _col1, _col2, _col3, and _col0. 
> Actually, only _col1, _col2, and _col3 are included. As a result, the data is 
> not sorted on the reduce side, and count(distinct) returns an incorrect 
> result.
>  
> explain:
> !image-2025-09-02-19-55-01-845.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
