[jira] [Resolved] (CALCITE-5158) count(1) with subquery count(distinct) gives wrong results with hive.optimize.distinct.rewrite=true and cbo on

Alessandro Solimando (Jira) Wed, 18 May 2022 01:34:07 -0700


     [ 
https://issues.apache.org/jira/browse/CALCITE-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alessandro Solimando resolved CALCITE-5158.
-------------------------------------------
    Resolution: Invalid

> count(1) with subquery count(distinct) gives wrong results with 
> hive.optimize.distinct.rewrite=true and cbo on
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5158
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5158
>             Project: Calcite
>          Issue Type: Bug
>    Affects Versions: 1.19.0
>            Reporter: honghui.Liu
>            Priority: Major
>
> {code:java}
> create table count_distinct(a int, b int);
> insert into table count_distinct values (1,2),(2,3);
> set hive.execution.engine=tez;
> set hive.cbo.enable=true;
> set hive.optimize.distinct.rewrite=true;
> select count(1) from ( 
>       select count(distinct a) from count_distinct
> ) tmp; {code}
> This gives a wrong result when hive.optimize.distinct.rewrite is true, which 
> is the default for all 3.x versions. The query returns 2; the expected result 
> is 1.
> Before CBO optimization, the RelNode tree is:
> {code:java}
> HiveProject(_o__c0=[$0])
>   HiveAggregate(group=[{}], agg#0=[count($0)])
>     HiveProject($f0=[1])
>       HiveProject(_o__c0=[$0])
>         HiveAggregate(group=[{}], agg#0=[count(DISTINCT $0)])
>           HiveProject($f0=[$0])
>             HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
> After HiveExpandDistinctAggregatesRule is applied, the RelNode tree is:
> {code:java}
> HiveProject(_o__c0=[$0])
>   HiveAggregate(group=[{}], agg#0=[count($0)])
>     HiveProject($f0=[1])
>       HiveProject(_o__c0=[$0])
>         HiveAggregate(group=[{}], agg#0=[count($0)])
>           HiveAggregate(group=[{0}])
>             HiveProject($f0=[$0])
>               HiveProject($f0=[$0])
>                 HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
> count(distinct xx) is converted to count(xx) from (select xx from table_name 
> group by xx).
> After projection pruning, the RelNode tree is:
> {code:java}
> HiveAggregate(group=[{}], agg#0=[count()])
>   HiveProject(DUMMY=[0])
>     HiveAggregate(group=[{}])
>       HiveAggregate(group=[{0}])
>         HiveProject(a=[$0])
>           HiveTableScan(table=[[default.count_distinct]], table:alias=[count_distinct]) {code}
> At this point the execution plan is wrong, which produces the incorrect 
> result.
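
The expected semantics described above can be checked outside Hive. The sketch
below is only an illustration: it uses Python's built-in sqlite3 module as a
stand-in engine (an assumption; it does not exercise Hive, CBO, or Tez), and
the "rewritten" query is a hand-written equivalent of the distinct rewrite,
not the output of HiveExpandDistinctAggregatesRule.

```python
import sqlite3

# Assumption: SQLite stands in for Hive here. It only illustrates the
# expected SQL semantics, not Hive's optimizer or execution engine.
conn = sqlite3.connect(":memory:")
conn.execute("create table count_distinct(a int, b int)")
conn.execute("insert into count_distinct values (1, 2), (2, 3)")

# Query from the report: a global aggregate yields exactly one row,
# so the outer count(1) is expected to be 1.
original = conn.execute(
    "select count(1) from (select count(distinct a) from count_distinct) tmp"
).fetchone()[0]

# Hand-written equivalent of the distinct rewrite:
# count(distinct a) -> count(a) over (select a ... group by a).
rewritten = conn.execute(
    "select count(1) from ("
    "  select count(a) from (select a from count_distinct group by a) g"
    ") tmp"
).fetchone()[0]

# Number of rows produced by the intermediate group-by on its own.
groups = conn.execute(
    "select count(1) from (select a from count_distinct group by a) g"
).fetchone()[0]

print(original, rewritten, groups)  # prints "1 1 2"
```

Both forms of the query return 1 here, so the rewrite itself preserves the
result. The group-by alone produces 2 rows, which equals the value reported
in the issue; reading that as the outer count being applied to the group-by
rows is an interpretation, not something the report confirms.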



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
