Will Noble created CALCITE-5775:
-----------------------------------

             Summary: Null direction emulation broken for complex expressions 
on some dialects
                 Key: CALCITE-5775
                 URL: https://issues.apache.org/jira/browse/CALCITE-5775
             Project: Calcite
          Issue Type: Bug
          Components: core
            Reporter: Will Noble


This is a problem in BigQuery, and may be a problem in other dialects as well. 
Consider the following piece of valid BQ SQL:

{code:sql}
SELECT REPEAT(first_name, 2),
       COUNT(id)
FROM looker_test.users
GROUP BY REPEAT(first_name, 2)
ORDER BY 1
{code}

Now consider a version where the {{ORDER BY}} clause is changed to this:

{code:sql}
ORDER BY REPEAT(first_name, 2)
{code}

This is logically the same query, because the expression in the {{ORDER}} 
clause is the same as the one in the {{SELECT}} / {{GROUP}} clauses. BigQuery 
is sophisticated enough to match the select to the group expression in both 
queries, but cannot match either with the order expression. It gives this 
error: _ORDER BY clause expression references column first_name which is 
neither grouped nor aggregated_.

So, when sorting by complex expressions in BQ, Calcite relies on either:
* No null direction emulation required. 
* sorting by alias or ordinal, which is a problem with current null direction 
emulation because it adds an extra complex sort expression, or
* having a query that just happens to also have the underlying field in the 
{{GROUP BY}} clause by itself, which seems to actually happen pretty often in 
my testing, but obviously shouldn't be a constraint.

As I wrote that, I realized this may be easily fixable for BQ since it added 
support for {{NULLS FIRST}} / {{LAST}} in 2020 and it seems Calcite has not 
caught up yet. Consider this rel node:

{code}
LogicalSort(sort0=[$0], dir0=[ASC])
  LogicalAggregate(group=[{0}], cent=[COUNT($1)])
    LogicalProject($f0=[CASE(IS NULL($4), 0, 1)], MGR=[$3])
      JdbcTableScan(table=[[JDBC_SCOTT, EMP]])
{code}

Calcite would convert it to this in BigQuery due to null direction emulation:

{code:sql}
SELECT CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END AS `$f0`, COUNT(MGR) AS cent
FROM SCOTT.EMP
GROUP BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END
ORDER BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END IS NULL, 1
{code}

Which of course triggers the problem described above. This may be a problem for 
MSSQL as well since it doesn't support {{NULLS LAST}}. The fix for BQ, at 
least, may be to just support {{NULLS LAST}} and sort by ordinal.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to