[
https://issues.apache.org/jira/browse/CALCITE-5801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737410#comment-17737410
]
Julian Hyde commented on CALCITE-5801:
--------------------------------------
I agree that this is a rather subtle aspect of Calcite's behavior and would
benefit from better documentation.
(It is a necessary for Calcite to work this way. "\{{GROUP BY a, b}}",
"\{{GROUP BY b, a}}" and "\{{GROUP BY a, b, a}}" all have identical effect in
SQL. Calcite's representation of these should be, and is, identical.)
For a while I have been working on CALCITE-5802, a feature request to make it
easier to translate complex aggregate expressions such as "1 + SUM(x + 2)". It
is somewhat related to your problem, in that it deals with expressions that are
too complex to express in a single relational operation, and in particular
deals with the {{SELECT}} and {{GROUP BY}} clauses simultaneously.
> Document that Aggregate grouping keys are emitted from lowest to highest index
> ------------------------------------------------------------------------------
>
> Key: CALCITE-5801
> URL: https://issues.apache.org/jira/browse/CALCITE-5801
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.34.0
> Reporter: Chris Rice
> Priority: Minor
>
> Consider query:
> {code}
> SELECT b, a FROM t GROUP BY b, a
> {code}
> where the schema for table {{t}} emits {{a}} in column 0 and {{b}} in column
> 1.
> You might be tempted to build this relation using:
> {code}
> relBuilder.scan("t");
> var groupKey = relBuilder.groupKey(1, 0);
> var aggregate = relBuilder.aggregate(groupKey).build();
> {code}
> but this will emit a plan which emits `a, b` rather than `b, a`:
> {code}
> LogicalAggregate(group=[{0, 1}])
> LogicalTableScan(table=[[t]])
> {code}
> The reason being that {{Aggregate}} represents its {{groupSet}} as a
> {{BitSet}} of columns which are implicitly always ordered from lowest to
> highest _index_. Or, in other words, the line {{relBuilder.groupKey(1, 0)}}
> always creates the same grouping key as {{relBuilder.groupKey(0, 1)}}.
> It should be documented somewhere that Aggregates expect their incoming
> grouping columns to be sorted in order of lowest to highest index in the
> incoming schema. Possible candidates for places for this documentation
> include:
> * On {{RelBuilder#groupKey()}}
> * On {{RelBuilder#aggregate()}}
> * On {{Aggregate}} itself
> This is not just a hypothetical problem - the substrait-java project in fact
> does not properly handle converting aggregates whose grouping key order is
> not already sorted from lowest to highest index - See
> [SubstraitRelNodeConverter#visit(Aggregate)|https://github.com/substrait-io/substrait-java/blob/main/isthmus/src/main/java/io/substrait/isthmus/SubstraitRelNodeConverter.java#L208].
> Of course, this is a separate bug with substrait-java, but perhaps it could
> have been avoided (and future such problems could still be avoided) with
> slightly more documentation.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)