[
https://issues.apache.org/jira/browse/CALCITE-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886051#comment-17886051
]
yanjing.wang commented on CALCITE-4512:
---------------------------------------
Hello [~julianhyde] , as discussed before, I run 4 tests in different databases
and get a result.
we defer the issue of "whether to prefer table column or alias when GROUP BY an
identifier matching a table column and alias" to CALCITE-5043 here.
Instead, we focus on the problem "group by function that has an argument
matching an alias and a table column, the argument is substituted by the select
expression that the alias represents, causing group by function(function)
raised".
For example, execute the following test case in calcite lenient conformance:
{code:java}
select floor(empno/2) as empno from emp group by floor(empno/2){code}
current calcite behavior when validation
{code:java}
select floor(empno/2) as empno from emp group by floor(floor(empno/2)/2){code}
and throws
{code:java}
Expression 'EMP.EMPNO' is not being grouped {code}
I also make some tests on other databases.
prepare emp table:
{code:java}
create table emp (
empno int,
ename varchar(128),
job varchar(128),
mgr int,
sal int,
deptno int);
Insert into emp values
(1,'wyj','engineer',1,90,100),
(2,'wb','manager',2,120,100),
(3,'zf','engineer',1,100,100),
(4,'fa','engineer',1,110,110),
(5,'yx','engineer',1,70,100); {code}
{*}SQL1{*}:
{code:java}
select floor(empno/2) as empno from emp group by floor(empno/2){code}
Result1:
||empno||
|1|
|0|
|2|
Result2:
throws exception. such as
{code:java}
Expression 'EMP.EMPNO' is not being grouped {code}
{*}SQL2{*}:
{code:java}
SELECT mod(empno, 3) as empno FROM emp GROUP BY mod(empno, 2){code}
Result1:
throws exception.
{code:java}
Expression 'EMPNO' is not being grouped {code}
{*}SQL3{*}:
{code:java}
select length(ename) as ename, count(1) as total from emp group by GROUPING
SETS (1, (), ename, length(ename)){code}
Result1:
||ename||total||
|3|1|
|2|4|
|NULL|5|
|3|1|
|2|4|
|3|1|
|2|4|
ename in length(ename) isn't substituted .
1, ename has been substituted by length(ename)
BigQuery has the result.
Result2:
||ename ||total||
|NULL|5|
|NULL|5|
|3|1|
|3|1|
|2|4|
|2|4|
the redundant [NULL 5] row is incorrect because group by length(ename) has been
substituted by group by length(length(ename)) in calcite.
Result3:
||name||total||
|2|4|
|3|1|
|NULL|5|
|NULL|5|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|
ename in length(ename) isn't substituted.
group by 1, ename isn't substituted by length(ename)
Oracle has the result.
Result4:
||name||total||
|NULL|5|
|3|1|
|2|4|
|3|1|
|2|4|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|
ename in length(ename) isn't substituted.
group by ename isn't substituted by length(ename).
PG and Oracle with group_by_position_enabled have the result.
Result5:
throws, Each GROUP BY expression must contain at least one column that is not
an outer reference. MSSQL/Presto doesn't support group by 1.
Presto has the result.
Result6:
||name||total||
|NULL|5|
|NULL|1|
|NULL|4|
|3|1|
|2|4|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|
|NULL|1|
ename in length(ename) isn't substituted.
spark miss the name for last length(ename).
group by ename isn't substituted by length(ename).
Result7:
Throws, argument type error of function length.
ClickHouse has the result.
{*}SQL4{*}:
{code:java}
select length(ename) as ename, count(1) as total from emp group by
cube(length(ename));{code}
Result1:
||name||total||
|NULL|5|
|3|1|
|2|4|
Result2:
throws, because group by cube(length(ename)) is substituted by group by
cube(length(length(ename))), the exception may be Expression 'ENAME' is not
being grouped or argument type error of function length.
Result3:
GROUP BY expression must be a column reference: "length"(ename)
various databases' behaviors:
||test
case/db||Calcite||BigQuery||MySQL||Oracle||PostgresSQL||MSSQL||ClickHouse||Hive||Spark||Presto||
|SQL1|{color:#FF0000}Result2{color}{*}{*}|Result1{*}{*}|Result1|Result1|Result1|Result1|{color:#FF0000}Result2{color}|Result1|Result1|Result1|
|SQL2|Result1|Result1|Result1|Result1|Result1|Result1|Result1|Result1|Result1|Result1|
|SQL3|{color:#FF0000}Result2{color}|Result1|Not
Support|Result3|Result4|Result5|{color:#FF0000}Result7{color}|Not
Support|Result6|Result5|
|SQL4|{color:#FF0000}Result2{color}|Result1|Not
Support|Result1|Result1|Result1|{color:#FF0000}Result2{color}|Result1|Result1|Result3|
We can conclude the following:
# The alias in function within GROUP BY clause is substituted in Calcite and
ClickHouse database only.
# length(ename)'s ename in group by GROUPING SETS (1, (), ename,
length(ename)) is substituted in Calcite and ClickHouse database only. other
database's behaviors differ in 1 and ename processing, it will be solved in
CALCITE-5043 .
# Oracle doesn't support GROUP BY ordinal by default, if we set
group_by_position_enabled = true; SQL3 result will be Result4.
How to solve this problem?
I think we can do it according to your hypothesis that the alias will be
substituted only if the expression in GROUP BY , GROUPING SETS, CUBE, ROLLUP is
an identifier or ordinal if the conformance supports group by alias or ordinal.
Do you agree?
> GROUP BY expression with argument name same with SELECT field and alias
> causes validation error
> -----------------------------------------------------------------------------------------------
>
> Key: CALCITE-4512
> URL: https://issues.apache.org/jira/browse/CALCITE-4512
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.38.0
> Environment: jvm: open-jdk11
> Reporter: yanjing.wang
> Assignee: yanjing.wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.38.0
>
> Time Spent: 3h
> Remaining Estimate: 0h
>
> {code:java}
> String sql = "select replace(name, 'a', 'b') as name from users group by
> replace(name, 'a', 'b')";
> {code}
>
> when group by expression has argument name same with select list item alias,
> the group by argument will be expanded to select list item when sql
> conformance supports 'groupByAlias', so the above sql will be expanded to
> {code:java}
> String sql = "select replace(name, 'a', 'b') as name from users group by
> replace(replace(name, 'a', 'b'), 'a', 'b')";
> {code}
>
> this is unexpected.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)