Pinhan Zhao created CALCITE-5578:
------------------------------------
Summary: RelOptRulesTest testAggregateCaseToFilter optimized plan
not semantically equivalent to the original one after conversion
Key: CALCITE-5578
URL: https://issues.apache.org/jira/browse/CALCITE-5578
Project: Calcite
Issue Type: Bug
Components: core
Reporter: Pinhan Zhao
After converting the "plan before" and "plan after" of
testAggregateCaseToFilter
([https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L4140])
to PostgreSQL queries, we obtained two queries:
{code:java}
SELECT SUM(SAL) AS SUM_SAL, COUNT(DISTINCT CASE WHEN JOB = 'CLERK' THEN DEPTNO
ELSE NULL END) AS COUNT_DISTINCT_CLERK, SUM(CASE WHEN DEPTNO = 10 THEN SAL ELSE
NULL END) AS SUM_SAL_D10, SUM(CASE WHEN DEPTNO = 20 THEN SAL ELSE 0 END) AS
SUM_SAL_D20, SUM(CASE WHEN DEPTNO = 30 THEN 1 ELSE 0 END) AS COUNT_D30,
COUNT(CASE WHEN DEPTNO = 40 THEN 'x' ELSE NULL END) AS COUNT_D40, SUM(CASE WHEN
DEPTNO = 45 THEN 1 ELSE NULL END) AS COUNT_D45, SUM(CASE WHEN DEPTNO = 50 THEN
1 ELSE NULL END) AS COUNT_D50, SUM(CAST(NULL AS DECIMAL(19, 9))) AS
SUM_NULL_D60, SUM(CASE WHEN DEPTNO = 70 THEN NULL ELSE 1 END) AS SUM_NULL_D70,
COUNT(CASE WHEN DEPTNO = 20 THEN 1 ELSE NULL END) AS COUNT_D20 FROM EMP{code}
and
{code:java}
SELECT SUM(SAL) AS SUM_SAL, COUNT(DISTINCT DEPTNO) FILTER (WHERE JOB = 'CLERK')
AS COUNT_DISTINCT_CLERK, SUM(SAL) FILTER (WHERE DEPTNO = 10) AS SUM_SAL_D10,
SUM(SAL) FILTER (WHERE DEPTNO = 20) AS SUM_SAL_D20, CAST(COUNT(*) FILTER (WHERE
DEPTNO = 30) AS INTEGER) AS COUNT_D30, COUNT(*) FILTER (WHERE DEPTNO = 40) AS
COUNT_D40, SUM(1) FILTER (WHERE DEPTNO = 45) AS COUNT_D45, SUM(1) FILTER (WHERE
DEPTNO = 50) AS COUNT_D50, SUM(CAST(NULL AS DECIMAL(19, 9))) AS SUM_NULL_D60,
SUM(1) FILTER (WHERE DEPTNO <> 70) AS SUM_NULL_D70, COUNT(*) FILTER (WHERE
DEPTNO = 20) AS COUNT_D20 FROM EMP{code}
The code we used to convert the plans to PostgreSQL queries is:
{code:java}
converter.visitRoot(rel).asStatement().toSqlString(PostgresqlSqlDialect.DEFAULT).getSql().replace('\n',
'');{code}
However, these queries are not semantically equivalent. With a counterexample
below (which I believe is also satisfying the integrity constraints):
{code:java}
CREATE TABLE EMP (
EMPNO INTEGER,
DEPTNO INTEGER,
ENAME VARCHAR(20),
JOB VARCHAR(20),
MGR INTEGER,
HIREDATE DATE,
SAL INTEGER,
COMM INTEGER,
SLACKER INTEGER
);
INSERT INTO EMP VALUES (0, 70, '-2147483649', '-6721455509335307966', 0,
'1970-01-01', 0, 0, 1);
CREATE TABLE DEPT (
DEPTNO INTEGER,
NAME VARCHAR(20)
);
INSERT INTO DEPT VALUES (70, '-2147483649');
CREATE TABLE BONUS (
ENAME VARCHAR(20),
JOB VARCHAR(20),
SAL INTEGER,
COMM INTEGER
);
INSERT INTO BONUS VALUES ('2147483648', '2147483648', 0, 0);
CREATE TABLE EMPNULLABLES (
EMPNO INTEGER,
DEPTNO INTEGER,
ENAME VARCHAR(20),
JOB VARCHAR(20),
MGR INTEGER,
HIREDATE DATE,
SAL INTEGER,
COMM INTEGER,
SLACKER INTEGER
);
INSERT INTO EMPNULLABLES VALUES (0, 70, '2147483648', '2147483648', 0,
'1970-01-01', 0, 0, 1);
CREATE TABLE EMPNULLABLES_20 (
EMPNO INTEGER,
DEPTNO INTEGER,
ENAME VARCHAR(20),
JOB VARCHAR(20),
MGR INTEGER,
HIREDATE DATE,
SAL INTEGER,
COMM INTEGER,
SLACKER INTEGER
);
INSERT INTO EMPNULLABLES_20 VALUES (0, 70, '2147483648', '2147483648', 0,
'1970-01-01', 0, 0, 1);
CREATE TABLE EMP_B (
EMPNO INTEGER,
DEPTNO INTEGER,
ENAME VARCHAR(20),
JOB VARCHAR(20),
MGR INTEGER,
HIREDATE DATE,
SAL INTEGER,
COMM INTEGER,
SLACKER INTEGER,
BIRTHDATE DATE
);
INSERT INTO EMP_B VALUES (0, 70, '2147483648', '2147483648', 0, '1970-01-01',
0, 0, 1, '1970-01-01');{code}
we can have two queries produce different output:
{code:java}
sum_sal | count_distinct_clerk | sum_sal_d10 | sum_sal_d20 | count_d30 |
count_d40 | count_d45 | count_d50 | sum_null_d60 | sum_null_d70 | count_d20
---------+----------------------+-------------+-------------+-----------+-----------+-----------+-----------+--------------+--------------+-----------
0 | 0 | | 0 | 0 |
0 | | | | | 0
(1 row){code}
vs
{code:java}
sum_sal | count_distinct_clerk | sum_sal_d10 | sum_sal_d20 | count_d30 |
count_d40 | count_d45 | count_d50 | sum_null_d60 | sum_null_d70 | count_d20
---------+----------------------+-------------+-------------+-----------+-----------+-----------+-----------+--------------+--------------+-----------
0 | 0 | | | 0 |
0 | | | | | 0
(1 row){code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)