[jira] [Commented] (CALCITE-6492) Support aggregate functions which could process DISTINCT natively

Julian Hyde (Jira) Mon, 22 Jul 2024 18:47:05 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867918#comment-17867918
 ]


Julian Hyde commented on CALCITE-6492:
--------------------------------------

There are two aspects of an aggregate function that are being conflated here. 
One is its syntax (is the user allowed to write DISTINCT?) and the other is 
its algebraic properties (does it always give the same results if the duplicate 
values are eliminated?).
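
A minimal sketch of the algebraic question, assuming plain Java lists stand in for the input rows. The class and method names are hypothetical and are not Calcite API. A single sample can only show that a function is sensitive to duplicates; it cannot prove that a function is insensitive.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of Calcite. */
final class DistinctInsensitivity {
  /**
   * Returns whether agg gives the same result on the given values and on
   * a copy of them with duplicates eliminated.
   */
  static <R> boolean sameWithAndWithoutDuplicates(
      Function<List<Integer>, R> agg, List<Integer> values) {
    List<Integer> distinct =
        values.stream().distinct().collect(Collectors.toList());
    return agg.apply(values).equals(agg.apply(distinct));
  }

  /** MAX: unaffected by duplicates. */
  static Integer max(List<Integer> values) {
    return Collections.max(values);
  }

  /** SUM: each duplicate contributes again. */
  static Integer sum(List<Integer> values) {
    return values.stream().mapToInt(Integer::intValue).sum();
  }
}
```

For the sample input {1, 2, 2, 3}, the check holds for max and fails for sum, which is the kind of property the optimizer would care about regardless of what syntax the validator accepts.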

The syntax aspect drives the behavior of the validator; the algebraic 
properties drive the optimizer. An effort to add new algebraic properties would 
be useful. Some more examples of algebraic properties:
 * 
[singleton|https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSingletonAggFunction.html]
 is able to generate an expression for a single row
 * 
[splittable|https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/SqlSplittableAggFunction.html]
 can generate an expression to compensate for double-counting
 * whether an aggregate function ignores NULL values (e.g. ARRAY_AGG keeps 
nulls)
 * whether order of input values is important (e.g. ARRAY_AGG, LISTAGG)
 * whether the aggregate function returns NULL when input is empty (e.g. COUNT 
returns 0)
 * whether the aggregate function commutes, e.g. SUM({a, b, c, d, e}) = 
SUM({SUM({a, b}), SUM({c, d, e})})
 * if the function doesn't commute, does it have a roll-up function? E.g. COUNT 
rolls up using SUM.
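
The last two properties can be sketched as follows, assuming the input is split into two partitions that are aggregated separately and then combined. The class below is a hypothetical illustration in plain Java, not Calcite API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not Calcite API: partial aggregation plus roll-up. */
final class RollUpSketch {
  /** SUM over a list of longs. */
  static long sum(List<Long> values) {
    long total = 0;
    for (long v : values) {
      total += v;
    }
    return total;
  }

  /** COUNT over a list of any values. */
  static long count(List<?> values) {
    return values.size();
  }

  public static void main(String[] args) {
    List<Long> part1 = Arrays.asList(1L, 2L);
    List<Long> part2 = Arrays.asList(3L, 4L, 5L);
    List<Long> all = Arrays.asList(1L, 2L, 3L, 4L, 5L);

    // SUM commutes: summing the partial sums gives the overall sum.
    System.out.println(
        sum(all) == sum(Arrays.asList(sum(part1), sum(part2))));

    // COUNT rolls up using SUM over the partial counts...
    System.out.println(
        count(all) == sum(Arrays.asList(count(part1), count(part2))));

    // ...but not using COUNT, which would only count the partitions.
    System.out.println(
        count(all) == count(Arrays.asList(count(part1), count(part2))));
  }
}
```

Applying COUNT to the partial counts yields the number of partitions (2 here) rather than the number of rows (5), which is why COUNT needs a separate roll-up function.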

Another example of syntax vs. algebra: the RESPECT NULLS and IGNORE NULLS 
syntax only makes a difference for aggregate functions (such as ARRAY_AGG) that 
do not ignore null values.
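
A minimal sketch of that point, modeling only the algebra and not which syntax the validator would accept. The class, methods, and the boolean ignoreNulls flag are hypothetical stand-ins for the two syntaxes and are not Calcite API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not Calcite API: effect of null treatment. */
final class NullTreatmentSketch {
  /** ARRAY_AGG keeps nulls unless asked to ignore them. */
  static List<Integer> arrayAgg(List<Integer> values, boolean ignoreNulls) {
    List<Integer> out = new ArrayList<>();
    for (Integer v : values) {
      if (v != null || !ignoreNulls) {
        out.add(v);
      }
    }
    return out;
  }

  /**
   * SUM skips nulls whatever the flag says, so the flag has no effect.
   * Returns null when there is no non-null input.
   */
  static Integer sum(List<Integer> values, boolean ignoreNulls) {
    Integer total = null;
    for (Integer v : values) {
      if (v != null) {
        total = total == null ? v : total + v;
      }
    }
    return total;
  }
}
```

For the input {1, NULL, 2}, arrayAgg returns a different list depending on the flag, while sum returns the same value either way.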


> Support aggregate functions which could process DISTINCT natively
> -----------------------------------------------------------------
>
>                 Key: CALCITE-6492
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6492
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>
> This could be useful if the execution engine natively supports some distinct 
> aggregations - there is no rewrite necessary for these functions.
> Currently there is 
> [SqlAggFunction#getDistinctOptionality|https://github.com/apache/calcite/blob/0deab6f7e0cb4ec63eae8b59477d6f0fadfd11e8/core/src/main/java/org/apache/calcite/sql/SqlAggFunction.java#L187-L189]
>  - which overlaps with this - possibly the closest would be to set it to 
> *IGNORED* if it's supported natively... however
> * that's a bit misleading as it's not IGNORED, but supported...
> * there is also 
> [checkArgument|https://github.com/apache/calcite/blob/0deab6f7e0cb4ec63eae8b59477d6f0fadfd11e8/core/src/main/java/org/apache/calcite/rel/core/AggregateCall.java#L125]
>  which ensures that *distinct* is not accepted in that case.
> More or less the end result would be to also enhance 
> AggregateExpandDistinctAggregatesRule with the ability to ignore aggregates.
> Note: in Druid
> * if approximationCountDistinct is disabled, that [enables a Calcite rule 
> which rewrites *all* distinct 
> aggregates|https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/planner/CalciteRulesManager.java#L496-L503]
> * in the meantime there are also some aggregate functions which support 
> *distinct* natively like 
> [string_agg|https://github.com/apache/druid/blob/c9aae9d8e683c0cc9c4687e526b8270f744c57c2/sql/src/main/java/org/apache/druid/sql/calcite/aggregation/builtin/StringSqlAggregator.java#L154]
>  - which doesn't need any rewrites



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
