[
https://issues.apache.org/jira/browse/HIVE-25498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Krisztian Kasa resolved HIVE-25498.
-----------------------------------
Resolution: Fixed
Pushed to master. Thanks [~robbiezhang] for the fix and [~pgaref] for the review.
> Query with more than 31 count distinct functions returns wrong result
> ---------------------------------------------------------------------
>
> Key: HIVE-25498
> URL: https://issues.apache.org/jira/browse/HIVE-25498
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Robbie Zhang
> Assignee: Robbie Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> If a query contains more than 32 "COUNT(DISTINCT COL)" functions, some or
> even all of the COUNT functions in that query return 0 instead of the
> correct values.
> Here are the queries to reproduce this issue:
> {code:sql}
> set hive.cbo.enable=true;
> create table test_count (c0 string, c1 string, c2 string, c3 string, c4
> string, c5 string, c6 string, c7 string, c8 string, c9 string, c10 string,
> c11 string, c12 string, c13 string, c14 string, c15 string, c16 string, c17
> string, c18 string, c19 string, c20 string, c21 string, c22 string, c23
> string, c24 string, c25 string, c26 string, c27 string, c28 string, c29
> string, c30 string, c31 string, c32 string);
> INSERT INTO test_count values ('c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6',
> 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17',
> 'c18', 'c19', 'c20', 'c21', 'c22', 'c23', 'c24', 'c25', 'c26', 'c27', 'c28',
> 'c29', 'c30', 'c31', 'c32');
> select count (distinct c0), count(distinct c1), count(distinct c2),
> count(distinct c3), count(distinct c4), count(distinct c5), count(distinct
> c6), count(distinct c7), count(distinct c8), count(distinct c9),
> count(distinct c10), count(distinct c11), count(distinct c12), count(distinct
> c13), count(distinct c14), count(distinct c15), count(distinct c16),
> count(distinct c17), count(distinct c18), count(distinct c19), count(distinct
> c20), count(distinct c21), count(distinct c22), count(distinct c23),
> count(distinct c24), count(distinct c25), count(distinct c26), count(distinct
> c27), count(distinct c28), count(distinct c29), count(distinct c30),
> count(distinct c31), count(distinct c32) from test_count;
> {code}
> This bug is caused by HiveExpandDistinctAggregatesRule.getGroupingIdValue(),
> which uses the int type. When there are more than 32 groupings, the values
> overflow.
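
The overflow described above can be illustrated with a minimal, standalone Java sketch. It is not the Hive source: the class GroupingIdSketch and its helpers are hypothetical, and it assumes each grouping position contributes one bit to the grouping ID. It only shows why an int cannot hold more than 32 distinct single-bit values, and that a wider type such as long can.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; names are not taken from Hive.
class GroupingIdSketch {

    // One bit per grouping position, computed as an int.
    // Java masks the shift count to its low 5 bits for int operands,
    // so position 32 produces the same value as position 0.
    static int bitAsInt(int pos) {
        return 1 << pos;
    }

    // Same idea with a long: the shift count is masked to 6 bits,
    // so positions 0..63 all produce distinct values.
    static long bitAsLong(int pos) {
        return 1L << pos;
    }

    // Number of distinct int bit values across the first n positions.
    static int distinctIntIds(int n) {
        Set<Integer> ids = new HashSet<>();
        for (int pos = 0; pos < n; pos++) {
            ids.add(bitAsInt(pos));
        }
        return ids.size();
    }

    // Number of distinct long bit values across the first n positions.
    static int distinctLongIds(int n) {
        Set<Long> ids = new HashSet<>();
        for (int pos = 0; pos < n; pos++) {
            ids.add(bitAsLong(pos));
        }
        return ids.size();
    }

    public static void main(String[] args) {
        System.out.println(bitAsInt(31));        // -2147483648 (sign bit set)
        System.out.println(bitAsInt(32));        // 1 (collides with position 0)
        System.out.println(bitAsLong(32));       // 4294967296
        System.out.println(distinctIntIds(33));  // 32
        System.out.println(distinctLongIds(33)); // 33
    }
}
```

With 33 positions, as in the reproduction query, the int variant yields only 32 distinct values, so at least two groupings would no longer be distinguishable under this assumption.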