Will Noble created CALCITE-5808:
-----------------------------------
Summary: Rel-to-Sql conversion should better support grouping or
sorting by references or ordinals
Key: CALCITE-5808
URL: https://issues.apache.org/jira/browse/CALCITE-5808
Project: Calcite
Issue Type: Improvement
Reporter: Will Noble
While exploring solutions to CALCITE-5724 I produced [this draft
PR|https://github.com/apache/calcite/commit/d644ecee44ffd7927a62be96329cd5456f545de2]
to explore what the Rel-to-Sql conversion process would look like if it were
heavily biased toward using ordinals in the {{GROUP BY}} and {{ORDER BY}}
clauses. It works for most common queries, but I gave up with basically 2
problems yet to be solved:
# Grouping with {{ROLLUP}}, {{CUBE}}, or {{GROUPING SETS}} does not allow
ordinals (at least [according to this presto
ticket|https://github.com/prestodb/presto/issues/9522]; I didn't double-check
any standards or dialects), so my draft solution would have to distinguish
between these cases and a simple group-by, and only use ordinals in the simple
case.
# Window functions with an {{ORDER BY}} clause also use ordinals in my draft.
It seems likely that ordinals are also disallowed in this context, but I'm not
sure. Looking at the results in
{{RelToSqlConverterTest.testConvertWindowToSql()}}, it certainly looks like the
query is incorrect even if ordinals can hypothetically go in a window's {{ORDER
BY}}.
So, I've decided to stop working on this for now, but wanted to preserve my
draft change and start a discussion on this ticket. Here are some thoughts:
# It seems like we can reach a happy middle ground on sorting by using ordinals
whenever the dialect allows *and* the expression returned by {{field()}} is
anything other than a {{SqlIdentifier}} (i.e. a named column reference).
Existing behavior as of writing is to only use ordinals when the expression is
a {{SqlCall}}, but we should use it for literals as well. That would solve
CALCITE-5724, but still use named references for "simple" collations, which
seems to be the case for all window functions.
# When it comes to grouping, things are more complicated. As of writing,
Calcite tends to group by expressions, but some dialects (e.g. BigQuery) can
get easily confused by this, even when the expression in the {{SELECT}} list
perfectly matches that in the {{GROUP BY}} clause. With our Calcite
integration, we need customized cleanup logic to rewrite the {{GROUP BY}}
clauses in terms of aliases / ordinals in the {{SELECT}} list whenever
possible, as a band-aid for this problem. We want to get rid of this band-aid
and upstream a proper solution, and I thought using ordinals could be it, but
the problems with {{GROUPING SETS}} et al still needs to be solved, and Calcite
seems to have poor support for grouping by ordinals in general (see the changes
I had to make to {{RelToSqlConverter.generateGroupList}} in my draft) which
should be improved.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)