[
https://issues.apache.org/jira/browse/CALCITE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867918#comment-17867918
]
Julian Hyde commented on CALCITE-6492:
--------------------------------------
There are two aspects of an aggregate function that are being conflated here.
One is its syntax (is the user allowed to write DISTINCT?) and the other is
its algebraic properties (does it always give the same results if the duplicate
values are eliminated?).
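
To make the algebraic question concrete, here is a minimal sketch in plain Python (toy functions, not Calcite's API; all names are hypothetical). It checks, on a few sample inputs, whether an aggregate gives the same result once duplicates are removed. Passing on samples is only evidence, not a proof of the property.

```python
def agg_max(values):
    """Toy MAX over a non-empty list."""
    return max(values)


def agg_sum(values):
    """Toy SUM over a non-empty list."""
    return sum(values)


def is_distinct_insensitive(agg, samples):
    """True if agg(s) equals agg(s without duplicates) for every sample s."""
    return all(agg(s) == agg(list(set(s))) for s in samples)


samples = [[1, 1, 2], [3, 3, 3], [5, 7, 7, 9]]

print(is_distinct_insensitive(agg_max, samples))  # True: MAX(DISTINCT x) = MAX(x)
print(is_distinct_insensitive(agg_sum, samples))  # False: SUM([1, 1, 2]) != SUM([1, 2])
```

Whether the user may write MAX(DISTINCT x) at all is a separate, purely syntactic decision.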
The syntax aspect drives the behavior of the validator; the algebraic
properties drive the optimizer. An effort to add new algebraic properties would
be useful. Some more examples of algebraic properties:
* [singleton|https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSingletonAggFunction.html]
is able to generate an expression for a single row
* [splittable|https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSplittableAggFunction.html]
can generate an expression to compensate for double-counting
* whether an aggregate function ignores NULL values (e.g. ARRAY_AGG keeps
nulls)
* whether order of input values is important (e.g. ARRAY_AGG, LISTAGG)
* whether the aggregate function returns NULL when input is empty (e.g. COUNT
returns 0)
* whether the aggregate function commutes, e.g. SUM({a, b, c, d, e}) =
SUM({SUM({a, b}), SUM({c, d, e})})
* if the function doesn't commute, does it have a roll-up function? E.g. COUNT
rolls up using SUM.
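
The last two properties can be sketched in plain Python (again toy functions with hypothetical names, not Calcite code). The sketch assumes the input has been split into two partitions that were aggregated separately, and shows that SUM can be re-applied to its own partial results whereas COUNT needs SUM as its roll-up function.

```python
def agg_sum(values):
    """Toy SUM."""
    return sum(values)


def agg_count(values):
    """Toy COUNT."""
    return len(values)


# Hypothetical input, split into two partitions that are aggregated separately.
partitions = [[1, 2], [3, 4, 5]]
all_rows = [v for p in partitions for v in p]

# SUM applied to the partial SUMs gives the same answer as SUM over all rows.
partial_sums = [agg_sum(p) for p in partitions]
print(agg_sum(partial_sums) == agg_sum(all_rows))      # True (15 == 15)

# COUNT applied to the partial COUNTs does not: it counts the partitions.
partial_counts = [agg_count(p) for p in partitions]
print(agg_count(partial_counts) == agg_count(all_rows))  # False (2 != 5)

# Rolling the partial COUNTs up with SUM gives the right answer.
print(agg_sum(partial_counts) == agg_count(all_rows))    # True (5 == 5)
```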
Another example of syntax vs algebra: the RESPECT NULLS and IGNORE NULLS
syntax only makes a difference for aggregate functions (such as ARRAY_AGG)
that do not ignore null values.
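
A small Python sketch of that point, using None for NULL (toy functions, not Calcite's API; the ignore_nulls parameter is a hypothetical stand-in for the IGNORE NULLS syntax):

```python
def sum_agg(values):
    """Toy SUM: always skips NULLs; returns NULL (None) if nothing is left."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


def array_agg(values, ignore_nulls=False):
    """Toy ARRAY_AGG: keeps NULLs unless ignore_nulls is set."""
    return [v for v in values if not (ignore_nulls and v is None)]


rows = [1, None, 2]

print(sum_agg(rows))                       # 3: the NULL never contributes
print(array_agg(rows))                     # [1, None, 2] (RESPECT NULLS)
print(array_agg(rows, ignore_nulls=True))  # [1, 2] (IGNORE NULLS)
```

For the SUM-like function there is nothing for the syntax to change, because NULLs are dropped either way.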
> Support aggregate functions which could process DISTINCT natively
> -----------------------------------------------------------------
>
> Key: CALCITE-6492
> URL: https://issues.apache.org/jira/browse/CALCITE-6492
> Project: Calcite
> Issue Type: Improvement
> Reporter: Zoltan Haindrich
> Assignee: Zoltan Haindrich
> Priority: Major
>
> This could be useful if the execution engine supports some distinct
> aggregations natively - no rewrite is necessary for these functions.
> Currently there is
> [SqlAggFunction#getDistinctOptionality|https://github.com/apache/calcite/blob/0deab6f7e0cb4ec63eae8b59477d6f0fadfd11e8/core/src/main/java/org/apache/calcite/sql/SqlAggFunction.java#L187-L189]
> - which overlaps with this - possibly the closest would be to set it to
> *IGNORED* if it's supported natively...however
> * that's a bit misleading as it's not IGNORED; but supported...
> * there is also
> [checkArgument|https://github.com/apache/calcite/blob/0deab6f7e0cb4ec63eae8b59477d6f0fadfd11e8/core/src/main/java/org/apache/calcite/rel/core/AggregateCall.java#L125]
> which ensures that *distinct* is not accepted in that case.
> More or less the end result would be to also enhance
> AggregateExpandDistinctAggregatesRule with the ability to ignore aggregates.
> note: In Druid
> * if approximationCountDistinct is disabled, that [enables a Calcite rule
> which rewrites *all* distinct
> aggregates|https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/CalciteRulesManager.java#L496-L503]
> * in the meantime there are also some aggregate functions which support
> *distinct* natively like
> [string_agg|https://github.com/apache/druid/blob/c9aae9d8e683c0cc9c4687e526b8270f744c57c2/sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/StringSqlAggregator.java#L154]
> - which doesn't need any rewrites