[ https://issues.apache.org/jira/browse/CALCITE-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046503#comment-17046503 ]
Shuo Cheng commented on CALCITE-3830:
-------------------------------------
Hi [~julianhyde], I'm considering where to put the keyword APPROXIMATE in
AggregateCall.toString(). I've come up with some options for
APPROX_COUNT_DISTINCT(xx).toString():
# APPROX COUNT(DISTINCT xx)
# COUNT(APPROX:DISTINCT xx)
# COUNT(DISTINCT xx) APPROX
What do you think?
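To make the three options concrete, here is a minimal sketch that renders a simplified aggregate call with the APPROX marker in each proposed position. It assumes a hypothetical stand-in class (ApproxRenderingSketch.AggCall with a render method); it is not Calcite's AggregateCall, whose fields and toString() differ.

```java
import java.util.List;

/**
 * Hypothetical sketch: renders a simplified aggregate call with the APPROX
 * keyword in each of the three proposed positions. Not Calcite's AggregateCall.
 */
public class ApproxRenderingSketch {

  /** Candidate placements: option 1 = before the call, 2 = inside the parens, 3 = after. */
  enum Style { PREFIX, INSIDE, SUFFIX }

  /** Minimal, hypothetical stand-in for an aggregate call. */
  static final class AggCall {
    final String function;
    final boolean distinct;
    final boolean approximate;
    final List<String> args;

    AggCall(String function, boolean distinct, boolean approximate, List<String> args) {
      this.function = function;
      this.distinct = distinct;
      this.approximate = approximate;
      this.args = args;
    }
  }

  static String render(AggCall call, Style style) {
    StringBuilder inner = new StringBuilder();
    // Option 2 puts the marker inside the parentheses, ahead of DISTINCT.
    if (call.approximate && style == Style.INSIDE) {
      inner.append("APPROX:");
    }
    if (call.distinct) {
      inner.append("DISTINCT ");
    }
    inner.append(String.join(", ", call.args));
    String body = call.function + "(" + inner + ")";
    // Exact (non-approximate) calls render the same way under every style.
    if (!call.approximate) {
      return body;
    }
    switch (style) {
      case PREFIX:
        return "APPROX " + body; // option 1
      case SUFFIX:
        return body + " APPROX"; // option 3
      default:
        return body; // option 2: marker was already added inside the parens
    }
  }

  public static void main(String[] args) {
    AggCall approxCountDistinct = new AggCall("COUNT", true, true, List.of("xx"));
    for (Style style : Style.values()) {
      System.out.println(render(approxCountDistinct, style));
    }
  }
}
```

Running main prints APPROX COUNT(DISTINCT xx), COUNT(APPROX:DISTINCT xx) and COUNT(DISTINCT xx) APPROX, in that order. Whichever placement is chosen, the point is only that exact and approximate calls no longer produce the same string.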
Btw, APPROX_COUNT_DISTINCT is currently translated into COUNT + APPROX + DISTINCT
during the sql-to-rel phase, and AFAIK approximate = true is only valid for
APPROX_COUNT_DISTINCT, so the original PR may also be feasible?
> The ‘approximate’ field should be considered when computing the digest of AggregateCall
> ---------------------------------------------------------------------------------------
>
> Key: CALCITE-3830
> URL: https://issues.apache.org/jira/browse/CALCITE-3830
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.21.0
> Reporter: Shuo Cheng
> Assignee: Shuo Cheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.22.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> In planner optimization, the digest of an Aggregate node contains the digest of
> its AggregateCall, i.e. AggregateCall.toString(), but currently the 'approximate'
> field of AggregateCall is not considered in the toString() method, which may
> lead to a situation where two different RelNodes are considered identical in
> the planner optimization phase.
> Here is an example:
> {code:java}
> // SQL
> select * from (
>   select a, count(distinct b) from T group by a
>   union all
>   select a, approx_count_distinct(b) from T group by a
> )
> // after applying a rule, the plan is
> LogicalSink(name=[_DataStreamTable_1], fields=[a, EXPR$1], __id__=[96])
> +- LogicalProject(a=[$0], EXPR$1=[$1], __id__=[94])
>    +- LogicalUnion(all=[true], __id__=[92])
>       :- LogicalAggregate(group=[{0}], EXPR$1=[COUNT(DISTINCT $1)], __id__=[89])
>       :  +- LogicalTableScan(table=[[default, _DataStreamTable_2]], __id__=[100])
>       +- LogicalAggregate(group=[{0}], EXPR$1=[COUNT(DISTINCT $1)], __id__=[89])
>          +- LogicalTableScan(table=[[default, _DataStreamTable_2]], __id__=[100])
> {code}
> As shown in the example, after optimization these two Aggregates are
> considered identical (both have 89 as their RelNode ID).
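The failure mode described in the issue can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Calcite's planner code: Registry, AggCall and aggregateDigest are hypothetical names, the registry simply hands out one ID per distinct digest string (standing in for the planner treating nodes with equal digests as the same node), and the "APPROX " prefix follows option 1 purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model of digest-based deduplication; not Calcite's planner. */
public class DigestCollisionSketch {

  /** Simplified COUNT call over column $arg (hypothetical). */
  static final class AggCall {
    final boolean distinct;
    final boolean approximate;
    final int arg;

    AggCall(boolean distinct, boolean approximate, int arg) {
      this.distinct = distinct;
      this.approximate = approximate;
      this.arg = arg;
    }

    /** Digest that ignores the approximate flag, as the issue reports. */
    String digestIgnoringApprox() {
      return "COUNT(" + (distinct ? "DISTINCT " : "") + "$" + arg + ")";
    }

    /** Digest that includes the flag, using the "APPROX " prefix style for illustration. */
    String digestWithApprox() {
      return (approximate ? "APPROX " : "") + digestIgnoringApprox();
    }
  }

  /** Hands out one ID per distinct digest string; equal digests share an ID. */
  static final class Registry {
    private final Map<String, Integer> ids = new HashMap<>();
    private int nextId = 0;

    int register(String digest) {
      return ids.computeIfAbsent(digest, d -> ++nextId);
    }
  }

  /** Builds a simplified Aggregate digest around the call's digest. */
  static String aggregateDigest(String callDigest) {
    return "LogicalAggregate(group=[{0}], EXPR$1=[" + callDigest + "])";
  }

  public static void main(String[] args) {
    AggCall exact = new AggCall(true, false, 1);  // count(distinct b)
    AggCall approx = new AggCall(true, true, 1);  // approx_count_distinct(b)

    Registry ignoring = new Registry();
    int a = ignoring.register(aggregateDigest(exact.digestIgnoringApprox()));
    int b = ignoring.register(aggregateDigest(approx.digestIgnoringApprox()));
    System.out.println("ignoring approximate: " + a + " vs " + b);

    Registry including = new Registry();
    int c = including.register(aggregateDigest(exact.digestWithApprox()));
    int d = including.register(aggregateDigest(approx.digestWithApprox()));
    System.out.println("including approximate: " + c + " vs " + d);
  }
}
```

With the flag omitted, both aggregates get ID 1, mirroring the shared ID 89 in the plan; with the flag included in the digest they get IDs 1 and 2 and stay separate.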
--
This message was sent by Atlassian Jira
(v8.3.4#803005)