[
https://issues.apache.org/jira/browse/HIVE-28254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhihua Deng updated HIVE-28254:
-------------------------------
Labels: hive-4.0.1-merged hive-4.0.1-must pull-request-available (was:
hive-4.0.1-must pull-request-available)
> CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results
> -------------------------------------------------------------------
>
> Key: HIVE-28254
> URL: https://issues.apache.org/jira/browse/HIVE-28254
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Affects Versions: 4.0.0
> Reporter: Shohei Okumiya
> Assignee: Shohei Okumiya
> Priority: Major
> Labels: hive-4.0.1-merged, hive-4.0.1-must,
> pull-request-available
> Fix For: 4.1.0
>
>
> The CBO return path can build an incorrect GroupByOperator when a query
> contains multiple aggregations with DISTINCT.
> For example:
> {code:java}
> CREATE TABLE test (col1 INT, col2 INT);
> INSERT INTO test VALUES (1, 100), (2, 200), (2, 200), (3, 300);
> set hive.cbo.returnpath.hiveop=true;
> set hive.map.aggr=false;
> SELECT
>   SUM(DISTINCT col1),
>   COUNT(DISTINCT col1),
>   SUM(DISTINCT col2),
>   SUM(col2)
> FROM test;{code}
> The last column, SUM(col2), should be 800. However, the generated SUM refers
> to col1 instead of col2, so the actual result is 8 (1 + 2 + 2 + 3).
> {code:java}
> +------+------+------+------+
> | _c0  | _c1  | _c2  | _c3  |
> +------+------+------+------+
> | 6    | 3    | 600  | 8    |
> +------+------+------+------+ {code}
>
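The expected values can be cross-checked outside Hive. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in engine (an assumption made for illustration, since SQLite is not Hive and knows nothing about the CBO return path), and the helper name expected_and_buggy is hypothetical. It runs the same query on the same four rows, then evaluates SUM(col1) to show where the wrong value in the last column comes from.

```python
import sqlite3


def expected_and_buggy():
    """Return (expected_row, sum_of_col1) for the test data in this issue.

    SQLite is used purely as a reference engine; it is not Hive.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (col1 INT, col2 INT)")
    conn.executemany(
        "INSERT INTO test VALUES (?, ?)",
        [(1, 100), (2, 200), (2, 200), (3, 300)],
    )

    # The query from the issue, evaluated by a reference engine.
    expected_row = conn.execute(
        """
        SELECT
          SUM(DISTINCT col1),
          COUNT(DISTINCT col1),
          SUM(DISTINCT col2),
          SUM(col2)
        FROM test
        """
    ).fetchone()

    # What the last column becomes if SUM is applied to col1 instead of col2.
    sum_of_col1 = conn.execute("SELECT SUM(col1) FROM test").fetchone()[0]

    conn.close()
    return expected_row, sum_of_col1


if __name__ == "__main__":
    row, wrong_last = expected_and_buggy()
    print("expected row:", row)
    print("SUM(col1), matching the reported wrong value:", wrong_last)
```

The first three columns of the reference row agree with the Hive output quoted above, and only the last column differs. That is consistent with the description that the non-DISTINCT SUM is bound to the wrong input column.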
--
This message was sent by Atlassian Jira
(v8.20.10#820010)