[ 
https://issues.apache.org/jira/browse/CALCITE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732239#comment-17732239
 ] 

Julian Hyde commented on CALCITE-5775:
--------------------------------------

Definitely make Calcite aware that BigQuery supports {{NULLS LAST}}, {{NULLS 
FIRST}}. That will save on a lot of complications.

Also make Calcite inclined to generate "GROUP BY 1" and "ORDER BY 1" for 
BigQuery when the expression occurs in the SELECT clause. BigQuery's 
recognition of common expressions isn't great.

In the previous bug (TODO link) you made some remarks about default null 
direction. FWIW, I think those concerns are red herrings. A user sending a 
query to Calcite can always add {{NULLS LAST}} or {{NULLS FIRST}} to achieve 
any desired effect, regardless of what Calcite's default null direction is. And 
by the way, you should probably try both directions in your tests.

> Null direction emulation broken for complex expressions on some dialects
> ------------------------------------------------------------------------
>
>                 Key: CALCITE-5775
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5775
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Will Noble
>            Priority: Minor
>
> This is a problem in BigQuery, and may be a problem in other dialects as 
> well. Consider the following piece of valid BQ SQL:
> {code:sql}
> SELECT REPEAT(first_name, 2),
>        COUNT(id)
> FROM looker_test.users
> GROUP BY REPEAT(first_name, 2)
> ORDER BY 1
> {code}
> Now consider a version where the {{ORDER BY}} clause is changed to this:
> {code:sql}
> ORDER BY REPEAT(first_name, 2)
> {code}
> This is logically the same query, because the expression in the {{ORDER}} 
> clause is the same as the one in the {{SELECT}} / {{GROUP}} clauses. BigQuery 
> is sophisticated enough to match the select to the group expression in both 
> queries, but cannot match either with the order expression. It gives this 
> error: _ORDER BY clause expression references column first_name which is 
> neither grouped nor aggregated_.
> So, when sorting by complex expressions in BQ, Calcite relies on either:
> * No null direction emulation required. 
> * sorting by alias or ordinal, which is a problem with current null direction 
> emulation because it adds an extra complex sort expression, or
> * having a query that just happens to also have the underlying field in the 
> {{GROUP BY}} clause by itself, which seems to actually happen pretty often in 
> my testing, but obviously shouldn't be a constraint.
> As I wrote that, I realized this may be easily fixable for BQ since it added 
> support for {{NULLS FIRST}} / {{LAST}} in 2020 and it seems Calcite has not 
> caught up yet. Consider this rel node:
> {code}
> LogicalSort(sort0=[$0], dir0=[ASC])
>   LogicalAggregate(group=[{0}], cent=[COUNT($1)])
>     LogicalProject($f0=[CASE(IS NULL($4), 0, 1)], MGR=[$3])
>       JdbcTableScan(table=[[JDBC_SCOTT, EMP]])
> {code}
> Calcite would convert it to this in BigQuery due to null direction emulation:
> {code:sql}
> SELECT CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END AS `$f0`, COUNT(MGR) AS 
> cent
> FROM SCOTT.EMP
> GROUP BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END
> ORDER BY CASE WHEN HIREDATE IS NULL THEN 0 ELSE 1 END IS NULL, 1
> {code}
> Which of course triggers the problem described above. This may be a problem 
> for MSSQL as well since it doesn't support {{NULLS LAST}}. The fix for BQ, at 
> least, may be to just support {{NULLS LAST}} and sort by ordinal.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to