Pinhan Zhao created CALCITE-5578:
------------------------------------

             Summary: RelOptRulesTest testAggregateCaseToFilter optimized plan 
not semantically equivalent to the original one after conversion
                 Key: CALCITE-5578
                 URL: https://issues.apache.org/jira/browse/CALCITE-5578
             Project: Calcite
          Issue Type: Bug
          Components: core
            Reporter: Pinhan Zhao


After converting the "plan before" and "plan after" of 
testAggregateCaseToFilter 
([https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L4140])
 to PostgreSQL queries, we obtained two queries:
{code:java}
SELECT SUM(SAL) AS SUM_SAL, COUNT(DISTINCT CASE WHEN JOB = 'CLERK' THEN DEPTNO 
ELSE NULL END) AS COUNT_DISTINCT_CLERK, SUM(CASE WHEN DEPTNO = 10 THEN SAL ELSE 
NULL END) AS SUM_SAL_D10, SUM(CASE WHEN DEPTNO = 20 THEN SAL ELSE 0 END) AS 
SUM_SAL_D20, SUM(CASE WHEN DEPTNO = 30 THEN 1 ELSE 0 END) AS COUNT_D30, 
COUNT(CASE WHEN DEPTNO = 40 THEN 'x' ELSE NULL END) AS COUNT_D40, SUM(CASE WHEN 
DEPTNO = 45 THEN 1 ELSE NULL END) AS COUNT_D45, SUM(CASE WHEN DEPTNO = 50 THEN 
1 ELSE NULL END) AS COUNT_D50, SUM(CAST(NULL AS DECIMAL(19, 9))) AS 
SUM_NULL_D60, SUM(CASE WHEN DEPTNO = 70 THEN NULL ELSE 1 END) AS SUM_NULL_D70, 
COUNT(CASE WHEN DEPTNO = 20 THEN 1 ELSE NULL END) AS COUNT_D20 FROM EMP{code}
and
{code:java}
SELECT SUM(SAL) AS SUM_SAL, COUNT(DISTINCT DEPTNO) FILTER (WHERE JOB = 'CLERK') 
AS COUNT_DISTINCT_CLERK, SUM(SAL) FILTER (WHERE DEPTNO = 10) AS SUM_SAL_D10, 
SUM(SAL) FILTER (WHERE DEPTNO = 20) AS SUM_SAL_D20, CAST(COUNT(*) FILTER (WHERE 
DEPTNO = 30) AS INTEGER) AS COUNT_D30, COUNT(*) FILTER (WHERE DEPTNO = 40) AS 
COUNT_D40, SUM(1) FILTER (WHERE DEPTNO = 45) AS COUNT_D45, SUM(1) FILTER (WHERE 
DEPTNO = 50) AS COUNT_D50, SUM(CAST(NULL AS DECIMAL(19, 9))) AS SUM_NULL_D60, 
SUM(1) FILTER (WHERE DEPTNO <> 70) AS SUM_NULL_D70, COUNT(*) FILTER (WHERE 
DEPTNO = 20) AS COUNT_D20 FROM EMP{code}
 

The code we used to convert the plans to PostgreSQL queries is:
{code:java}
converter.visitRoot(rel).asStatement().toSqlString(PostgresqlSqlDialect.DEFAULT).getSql().replace('\n',
 '');{code}
 

However, these queries are not semantically equivalent.  With a counterexample 
below (which I believe is also satisfying the integrity constraints):
{code:java}
CREATE TABLE EMP (
        EMPNO INTEGER,
        DEPTNO INTEGER,
        ENAME VARCHAR(20),
        JOB VARCHAR(20),
        MGR INTEGER,
        HIREDATE DATE,
        SAL INTEGER,
        COMM INTEGER,
        SLACKER INTEGER
);
INSERT INTO EMP VALUES (0, 70, '-2147483649', '-6721455509335307966', 0, 
'1970-01-01', 0, 0, 1);

CREATE TABLE DEPT (
        DEPTNO INTEGER,
        NAME VARCHAR(20)
);
INSERT INTO DEPT VALUES (70, '-2147483649');

CREATE TABLE BONUS (
        ENAME VARCHAR(20),
        JOB VARCHAR(20),
        SAL INTEGER,
        COMM INTEGER
);
INSERT INTO BONUS VALUES ('2147483648', '2147483648', 0, 0);

CREATE TABLE EMPNULLABLES (
        EMPNO INTEGER,
        DEPTNO INTEGER,
        ENAME VARCHAR(20),
        JOB VARCHAR(20),
        MGR INTEGER,
        HIREDATE DATE,
        SAL INTEGER,
        COMM INTEGER,
        SLACKER INTEGER
);
INSERT INTO EMPNULLABLES VALUES (0, 70, '2147483648', '2147483648', 0, 
'1970-01-01', 0, 0, 1);

CREATE TABLE EMPNULLABLES_20 (
        EMPNO INTEGER,
        DEPTNO INTEGER,
        ENAME VARCHAR(20),
        JOB VARCHAR(20),
        MGR INTEGER,
        HIREDATE DATE,
        SAL INTEGER,
        COMM INTEGER,
        SLACKER INTEGER
);
INSERT INTO EMPNULLABLES_20 VALUES (0, 70, '2147483648', '2147483648', 0, 
'1970-01-01', 0, 0, 1);

CREATE TABLE EMP_B (
        EMPNO INTEGER,
        DEPTNO INTEGER,
        ENAME VARCHAR(20),
        JOB VARCHAR(20),
        MGR INTEGER,
        HIREDATE DATE,
        SAL INTEGER,
        COMM INTEGER,
        SLACKER INTEGER,
        BIRTHDATE DATE
);
INSERT INTO EMP_B VALUES (0, 70, '2147483648', '2147483648', 0, '1970-01-01', 
0, 0, 1, '1970-01-01');{code}
 

we can have two queries produce different output:
{code:java}
 sum_sal | count_distinct_clerk | sum_sal_d10 | sum_sal_d20 | count_d30 | 
count_d40 | count_d45 | count_d50 | sum_null_d60 | sum_null_d70 | count_d20 
---------+----------------------+-------------+-------------+-----------+-----------+-----------+-----------+--------------+--------------+-----------
       0 |                    0 |             |           0 |         0 |       
  0 |           |           |              |              |         0
(1 row){code}
vs
{code:java}
 sum_sal | count_distinct_clerk | sum_sal_d10 | sum_sal_d20 | count_d30 | 
count_d40 | count_d45 | count_d50 | sum_null_d60 | sum_null_d70 | count_d20 
---------+----------------------+-------------+-------------+-----------+-----------+-----------+-----------+--------------+--------------+-----------
       0 |                    0 |             |             |         0 |       
  0 |           |           |              |              |         0
(1 row){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to