[jira] [Commented] (CALCITE-4512) GROUP BY expression with argument name same with SELECT field and alias causes validation error

yanjing.wang (Jira) Mon, 30 Sep 2024 17:56:17 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886051#comment-17886051
 ]


yanjing.wang commented on CALCITE-4512:
---------------------------------------

Hello [~julianhyde] , as discussed before, I run 4 tests in different databases 
and get a result.

we defer the issue of "whether to prefer table column or alias when GROUP BY an 
identifier matching a table column and alias" to  CALCITE-5043 here.

Instead, we focus on the problem "group by function that has an argument 
matching an alias and a table column, the argument is substituted by the select 
expression that the alias represents, causing group by function(function) 
raised". 

For example, execute the following test case in calcite lenient conformance: 

 
{code:java}
select floor(empno/2) as empno from emp group by floor(empno/2){code}
 

current calcite behavior when validation

 
{code:java}
select floor(empno/2) as empno from emp group by floor(floor(empno/2)/2){code}
 

and throws

 
{code:java}
Expression 'EMP.EMPNO' is not being grouped  {code}
 

 

I also make some tests on other databases.

prepare emp table:

 
{code:java}
create table emp (
empno int, 
ename varchar(128), 
job varchar(128), 
mgr int, 
sal int, 
deptno int);

Insert into emp values 
(1,'wyj','engineer',1,90,100),
(2,'wb','manager',2,120,100),
(3,'zf','engineer',1,100,100),
(4,'fa','engineer',1,110,110),
(5,'yx','engineer',1,70,100); {code}
 

 

{*}SQL1{*}:

 
{code:java}
select floor(empno/2) as empno from emp group by floor(empno/2){code}
 

Result1:

 
||empno||
|1|
|0|
|2|

Result2:

throws exception. such as

 
{code:java}
Expression 'EMP.EMPNO' is not being grouped {code}
 

 

{*}SQL2{*}:
{code:java}
SELECT mod(empno, 3) as empno FROM emp GROUP BY mod(empno, 2){code}
Result1:

throws exception.

 
{code:java}
Expression 'EMPNO' is not being grouped {code}
 

 

{*}SQL3{*}:
{code:java}
select length(ename) as ename, count(1) as total from emp group by GROUPING 
SETS (1, (), ename, length(ename)){code}
 

Result1:

 
||ename||total||
|3|1|
|2|4|
|NULL|5|
|3|1|
|2|4|
|3|1|
|2|4|

ename in length(ename) isn't substituted .

1, ename has been substituted by length(ename)

BigQuery has the result.

 

 

Result2:

 
||ename ||total||
|NULL|5|
|NULL|5|
|3|1|
|3|1|
|2|4|
|2|4|

 

the redundant [NULL 5] row is incorrect because group by length(ename) has been 
substituted by group by length(length(ename)) in calcite.

 

Result3:

 
||name||total||
|2|4|
|3|1|
|NULL|5|
|NULL|5|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|

 
 
 

ename in length(ename) isn't substituted.

group by 1, ename isn't substituted by length(ename)

Oracle has the result.

 

Result4:

 
||name||total||
|NULL|5|
|3|1|
|2|4|
|3|1|
|2|4|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|

 

ename in length(ename) isn't substituted.

group by ename isn't substituted by length(ename).

PG and Oracle with group_by_position_enabled have the result.

 

Result5:

throws, Each GROUP BY expression must contain at least one column that is not 
an outer reference. MSSQL/Presto doesn't support group by 1. 

Presto has the result.

 

Result6:

 
||name||total||
|NULL|5|
|NULL|1|
|NULL|4|
|3|1|
|2|4|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|

ename in length(ename) isn't substituted.

spark miss the name for last length(ename).

group by ename isn't substituted by length(ename).

 

Result7:

Throws, argument type error of function length.

ClickHouse has the result.

 

{*}SQL4{*}:
{code:java}
select length(ename) as ename, count(1) as total from emp group by 
cube(length(ename));{code}
Result1:
||name||total||
|NULL|5|
|3|1|
|2|4|

Result2:
 
throws, because group by cube(length(ename)) is substituted by group by 
cube(length(length(ename))), the exception may be Expression 'ENAME' is not 
being grouped or argument type error of function length.

 

Result3:

GROUP BY expression must be a column reference: "length"(ename) 

 

various databases' behaviors:

 
||test 
case/db||Calcite||BigQuery||MySQL||Oracle||PostgresSQL||MSSQL||ClickHouse||Hive||Spark||Presto||
|SQL1|{color:#FF0000}Result2{color}{*}{*}|Result1{*}{*}|Result1|Result1|Result1|Result1|{color:#FF0000}Result2{color}|Result1|Result1|Result1|
|SQL2|Result1|Result1|Result1|Result1|Result1|Result1|Result1|Result1|Result1|Result1|
|SQL3|{color:#FF0000}Result2{color}|Result1|Not 
Support|Result3|Result4|Result5|{color:#FF0000}Result7{color}|Not 
Support|Result6|Result5|
|SQL4|{color:#FF0000}Result2{color}|Result1|Not 
Support|Result1|Result1|Result1|{color:#FF0000}Result2{color}|Result1|Result1|Result3|

 

We can conclude the following:
 # The alias in function within GROUP BY clause is substituted in Calcite and 
ClickHouse database only.
 #  length(ename)'s ename in group by GROUPING SETS (1, (), ename, 
length(ename)) is substituted in Calcite and ClickHouse database only. other 
database's behaviors differ in 1 and ename processing, it will be solved in  
CALCITE-5043 .
 # Oracle doesn't support GROUP BY ordinal by default, if we set 
group_by_position_enabled = true; SQL3 result will be  Result4.

 

How to solve this problem?

I think we can do it according to your hypothesis that the alias will be 
substituted only if the expression in GROUP BY , GROUPING SETS, CUBE, ROLLUP is 
an identifier or ordinal if the conformance supports group by alias or ordinal.

 

Do you agree?

 

 

> GROUP BY expression with argument name same with SELECT field and alias 
> causes validation error
> -----------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-4512
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4512
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.38.0
>         Environment: jvm: open-jdk11
>            Reporter: yanjing.wang
>            Assignee: yanjing.wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.38.0
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> {code:java}
> String sql = "select replace(name, 'a', 'b') as name from users group by 
> replace(name, 'a', 'b')";
> {code}
>  
> when group by expression has argument name same with select list item alias, 
> the group by argument will be expanded to select list item when sql 
> conformance supports 'groupByAlias', so the above sql will be expanded to 
> {code:java}
> String sql = "select replace(name, 'a', 'b') as name from users group by 
> replace(replace(name, 'a', 'b'), 'a', 'b')";
> {code}
>  
> this is unexpected.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-4512) GROUP BY expression with argument name same with SELECT field and alias causes validation error

Reply via email to