[jira] [Commented] (CALCITE-3830) The ‘approximate’ field should be considered when computing the digest of AggregateCall

Shuo Cheng (Jira) Thu, 27 Feb 2020 03:12:28 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046503#comment-17046503
 ]


Shuo Cheng commented on CALCITE-3830:
-------------------------------------

Hi [~julianhyde], I'm considering where the keyword APPROXIMATE should go in 
AggregateCall.toString(). I've come up with some options:

For APPROX_COUNT_DISTINCT(xx).toString()
 # APPROX COUNT(DISTINCT xx)
 # COUNT(APPROX:DISTINCT xx)
 # COUNT(DISTINCT xx) APPROX

What do you think?

Btw, APPROX_COUNT_DISTINCT is currently translated into COUNT + APPROX + DISTINCT 
during the sql-to-rel phase, and AFAIK approximate = true is only valid for 
APPROX_COUNT_DISTINCT, so the original PR may also be feasible?
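
To make the three candidate forms for APPROX_COUNT_DISTINCT(xx).toString() easier to compare, here is a minimal sketch that renders each of them. The class ApproxToStringSketch, the Style enum and the render method are hypothetical names used only for illustration; this is not Calcite's AggregateCall code.

```java
public class ApproxToStringSketch {
  /** Hypothetical labels for the three placements proposed in the comment. */
  enum Style { PREFIX, INLINE, SUFFIX }

  /** Renders COUNT(DISTINCT arg), adding the APPROX keyword per the chosen style. */
  static String render(Style style, String arg, boolean approximate) {
    String exact = "COUNT(DISTINCT " + arg + ")";
    if (!approximate) {
      return exact; // the exact call looks the same under every option
    }
    switch (style) {
      case PREFIX:
        return "APPROX " + exact;                  // option 1
      case INLINE:
        return "COUNT(APPROX:DISTINCT " + arg + ")"; // option 2
      case SUFFIX:
        return exact + " APPROX";                  // option 3
      default:
        throw new IllegalArgumentException("unknown style: " + style);
    }
  }

  public static void main(String[] args) {
    for (Style style : Style.values()) {
      System.out.println(render(style, "xx", true));
    }
  }
}
```

Whichever form is chosen, the approximate call no longer prints the same string as the exact COUNT(DISTINCT xx).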

> The ‘approximate’ field should be considered when computing the digest of 
> AggregateCall
> ---------------------------------------------------------------------------------------
>
>                 Key: CALCITE-3830
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3830
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Shuo Cheng
>            Assignee: Shuo Cheng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.22.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In planner optimization, the digest of an Aggregate node contains the digest of its 
> AggregateCall, i.e. AggregateCall.toString(), but currently the 'approximate' 
> field of AggregateCall is not considered in the toString() method, which may 
> lead to two different relNodes being considered identical in the 
> planner optimization phase. 
> Here is an example:
> {code:java}
> // SQL
> select * from (
>   select a, count(distinct b) from T group by a
>   union all
>   select a, approx_count_distinct(b) from T group by a
> )
> // after applying a rule, the plan is
> LogicalSink(name=[_DataStreamTable_1], fields=[a, EXPR$1], __id__=[96])
> +- LogicalProject(a=[$0], EXPR$1=[$1], __id__=[94])
>    +- LogicalUnion(all=[true], __id__=[92])
>       :- LogicalAggregate(group=[{0}], EXPR$1=[COUNT(DISTINCT $1)], __id__=[89])
>       :  +- LogicalTableScan(table=[[default, _DataStreamTable_2]], __id__=[100])
>       +- LogicalAggregate(group=[{0}], EXPR$1=[COUNT(DISTINCT $1)], __id__=[89])
>          +- LogicalTableScan(table=[[default, _DataStreamTable_2]], __id__=[100])
> {code}
> As shown in the example, after optimization these two Aggregates are 
> considered identical (both have 89 as the relNode ID).
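
The collision described in the issue can be reproduced in miniature. The sketch below assumes a registry that deduplicates nodes by digest string; the classes DigestCollisionSketch and Call and their methods are hypothetical stand-ins, not Calcite's planner or AggregateCall, and the "APPROX " prefix is just one arbitrary way of including the flag.

```java
import java.util.HashMap;
import java.util.Map;

public class DigestCollisionSketch {
  /** Hypothetical stand-in for an aggregate call; not Calcite's AggregateCall. */
  static final class Call {
    final String function;
    final boolean distinct;
    final boolean approximate;
    final int arg;

    Call(String function, boolean distinct, boolean approximate, int arg) {
      this.function = function;
      this.distinct = distinct;
      this.approximate = approximate;
      this.arg = arg;
    }

    /** Digest that ignores 'approximate', as in the reported behaviour. */
    String digestWithoutApprox() {
      return function + "(" + (distinct ? "DISTINCT " : "") + "$" + arg + ")";
    }

    /** Digest that also encodes 'approximate' (prefix form chosen arbitrarily). */
    String digestWithApprox() {
      return (approximate ? "APPROX " : "") + digestWithoutApprox();
    }
  }

  /** Returns how many nodes survive in a registry keyed by digest. */
  static int distinctNodes(boolean includeApprox, Call... calls) {
    Map<String, Call> registry = new HashMap<>();
    for (Call c : calls) {
      String digest = includeApprox ? c.digestWithApprox() : c.digestWithoutApprox();
      registry.putIfAbsent(digest, c); // an equal digest means "same node"
    }
    return registry.size();
  }

  public static void main(String[] args) {
    Call exact = new Call("COUNT", true, false, 1);  // count(distinct b)
    Call approx = new Call("COUNT", true, true, 1);  // approx_count_distinct(b)
    System.out.println(distinctNodes(false, exact, approx)); // prints 1
    System.out.println(distinctNodes(true, exact, approx));  // prints 2
  }
}
```

With the flag left out of the digest, the exact and approximate calls collapse into a single entry, which matches the two Aggregates sharing ID 89 in the plan above.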



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
