[jira] [Commented] (CALCITE-5801) Document that Aggregate grouping keys are emitted from lowest to highest index

Julian Hyde (Jira) Mon, 26 Jun 2023 18:45:27 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737410#comment-17737410
 ]


Julian Hyde commented on CALCITE-5801:
--------------------------------------

I agree that this is a rather subtle aspect of Calcite's behavior and would 
benefit from better documentation.

(It is a necessary for Calcite to work this way. "\{{GROUP BY a, b}}", 
"\{{GROUP BY b, a}}" and "\{{GROUP BY a, b, a}}" all have identical effect in 
SQL. Calcite's representation of these should be, and is, identical.)

For a while I have been working on CALCITE-5802, a feature request to make it 
easier to translate complex aggregate expressions such as "1 + SUM(x + 2)". It 
is somewhat related to your problem, in that it deals with expressions that are 
too complex to express in a single relational operation, and in particular 
deals with the {{SELECT}} and {{GROUP BY}} clauses simultaneously.

> Document that Aggregate grouping keys are emitted from lowest to highest index
> ------------------------------------------------------------------------------
>
>                 Key: CALCITE-5801
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5801
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.34.0
>            Reporter: Chris Rice
>            Priority: Minor
>
> Consider query:
> {code}
> SELECT b, a FROM t GROUP BY b, a
> {code}
> where the schema for table {{t}} emits {{a}} in column 0 and {{b}} in column 
> 1.
> You might be tempted to build this relation using:
> {code}
> relBuilder.scan("t");
> var groupKey = relBuilder.groupKey(1, 0);
> var aggregate = relBuilder.aggregate(groupKey).build();
> {code}
> but this will emit a plan which emits `a, b` rather than `b, a`:
> {code}
> LogicalAggregate(group=[{0, 1}])
>   LogicalTableScan(table=[[t]])
> {code}
> The reason being that {{Aggregate}} represents its {{groupSet}} as a 
> {{BitSet}} of columns which are implicitly always ordered from lowest to 
> highest _index_.  Or, in other words, the line {{relBuilder.groupKey(1, 0)}} 
> always creates the same grouping key as {{relBuilder.groupKey(0, 1)}}.
> It should be documented somewhere that Aggregates expect their incoming 
> grouping columns to be sorted in order of lowest to highest index in the 
> incoming schema.  Possible candidates for places for this documentation 
> include:
> * On {{RelBuilder#groupKey()}}
> * On {{RelBuilder#aggregate()}}
> * On {{Aggregate}} itself
> This is not just a hypothetical problem - the substrait-java project in fact 
> does not properly handle converting aggregates whose grouping key order is 
> not already sorted from lowest to highest index - See 
> [SubstraitRelNodeConverter#visit(Aggregate)|https://github.com/substrait-io/substrait-java/blob/main/isthmus/src/main/java/io/substrait/isthmus/SubstraitRelNodeConverter.java#L208].
>   Of course, this is a separate bug with substrait-java, but perhaps it could 
> have been avoided (and future such problems could still be avoided) with 
> slightly more documentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-5801) Document that Aggregate grouping keys are emitted from lowest to highest index

Reply via email to