Will Noble created CALCITE-5775:
-----------------------------------
Summary: Null direction emulation broken for complex expressions
on some dialects
Key: CALCITE-5775
URL: https://issues.apache.org/jira/browse/CALCITE-5775
Project: Calcite
Issue Type: Bug
Components: core
Reporter: Will Noble
This is a problem in BigQuery, and may be a problem in other dialects as well.
Consider the following piece of valid BQ SQL:
{code:sql}
SELECT REPEAT(first_name, 2),
COUNT(id)
FROM looker_test.users
GROUP BY REPEAT(first_name, 2)
ORDER BY 1
{code}
Now consider a version where the {{ORDER BY}} clause is changed to this:
{code:sql}
ORDER BY REPEAT(first_name, 2)
{code}
This is logically the same query, because the expression in the {{ORDER}}
clause is the same as the one in the {{SELECT}} / {{GROUP}} clauses. BigQuery
is sophisticated enough to match the select to the group expression in both
queries, but cannot match either with the order expression. It gives this
error: _ORDER BY clause expression references column first_name which is
neither grouped nor aggregated_.
So, when sorting by complex expressions in BQ, Calcite relies on either:
* No null direction emulation required.
* sorting by alias or ordinal, which is a problem with current null direction
emulation because it adds an extra complex sort expression, or
* having a query that just happens to also have the underlying field in the
{{GROUP BY}} clause by itself, which seems to actually happen pretty often in
my testing, but obviously shouldn't be a constraint.
As I wrote that, I realized this may be easily fixable for BQ since it added
support for {{NULLS FIRST}} / {{LAST}} in 2020 and it seems Calcite has not
caught up yet. Consider this rel node:
{code}
LogicalSort(sort0=[$0], dir0=[ASC])
LogicalAggregate(group=[{0}], cent=[COUNT($1)])
LogicalProject($f0=[CASE(IS NULL($4), 0, 1)], MGR=[$3])
JdbcTableScan(table=[[JDBC_SCOTT, EMP]])
{code}
Calcite would convert it to this in BigQuery due to null direction emulation:
{code:sql}
SELECT CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END AS `$f0`, COUNT(MGR) AS cent
FROM SCOTT.EMP
GROUP BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END
ORDER BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END IS NULL, 1
{code}
Which of course triggers the problem described above. This may be a problem for
MSSQL as well since it doesn't support {{NULLS LAST}}. The fix for BQ, at
least, may be to just support {{NULLS LAST}} and sort by ordinal.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)