[
https://issues.apache.org/jira/browse/CALCITE-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated CALCITE-5578:
------------------------------------
Labels: pull-request-available (was: )
> RelOptRulesTest testAggregateCaseToFilter optimized plan not semantically
> equivalent to the original one after conversion
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-5578
> URL: https://issues.apache.org/jira/browse/CALCITE-5578
> Project: Calcite
> Issue Type: Bug
> Components: core
> Reporter: Pinhan Zhao
> Priority: Blocker
> Labels: pull-request-available
>
> After converting the "plan before" and "plan after" of
> testAggregateCaseToFilter
> ([https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L4140])
> to PostgreSQL queries, we obtained two queries:
> {code:java}
> SELECT SUM(SAL) AS SUM_SAL, COUNT(DISTINCT CASE WHEN JOB = 'CLERK' THEN
> DEPTNO ELSE NULL END) AS COUNT_DISTINCT_CLERK, SUM(CASE WHEN DEPTNO = 10 THEN
> SAL ELSE NULL END) AS SUM_SAL_D10, SUM(CASE WHEN DEPTNO = 20 THEN SAL ELSE 0
> END) AS SUM_SAL_D20, SUM(CASE WHEN DEPTNO = 30 THEN 1 ELSE 0 END) AS
> COUNT_D30, COUNT(CASE WHEN DEPTNO = 40 THEN 'x' ELSE NULL END) AS COUNT_D40,
> SUM(CASE WHEN DEPTNO = 45 THEN 1 ELSE NULL END) AS COUNT_D45, SUM(CASE WHEN
> DEPTNO = 50 THEN 1 ELSE NULL END) AS COUNT_D50, SUM(CAST(NULL AS DECIMAL(19,
> 9))) AS SUM_NULL_D60, SUM(CASE WHEN DEPTNO = 70 THEN NULL ELSE 1 END) AS
> SUM_NULL_D70, COUNT(CASE WHEN DEPTNO = 20 THEN 1 ELSE NULL END) AS COUNT_D20
> FROM EMP{code}
> and
> {code:java}
> SELECT SUM(SAL) AS SUM_SAL, COUNT(DISTINCT DEPTNO) FILTER (WHERE JOB =
> 'CLERK') AS COUNT_DISTINCT_CLERK, SUM(SAL) FILTER (WHERE DEPTNO = 10) AS
> SUM_SAL_D10, SUM(SAL) FILTER (WHERE DEPTNO = 20) AS SUM_SAL_D20,
> CAST(COUNT(*) FILTER (WHERE DEPTNO = 30) AS INTEGER) AS COUNT_D30, COUNT(*)
> FILTER (WHERE DEPTNO = 40) AS COUNT_D40, SUM(1) FILTER (WHERE DEPTNO = 45) AS
> COUNT_D45, SUM(1) FILTER (WHERE DEPTNO = 50) AS COUNT_D50, SUM(CAST(NULL AS
> DECIMAL(19, 9))) AS SUM_NULL_D60, SUM(1) FILTER (WHERE DEPTNO <> 70) AS
> SUM_NULL_D70, COUNT(*) FILTER (WHERE DEPTNO = 20) AS COUNT_D20 FROM EMP{code}
>
> The code we used to convert the plans to PostgreSQL queries is:
> {code:java}
> converter.visitRoot(rel).asStatement().toSqlString(PostgresqlSqlDialect.DEFAULT).getSql().replace('\n',
> '');{code}
>
> However, these queries are not semantically equivalent. With a
> counterexample below (which I believe is also satisfying the integrity
> constraints):
> {code:java}
> CREATE TABLE EMP (
> EMPNO INTEGER,
> DEPTNO INTEGER,
> ENAME VARCHAR(20),
> JOB VARCHAR(20),
> MGR INTEGER,
> HIREDATE DATE,
> SAL INTEGER,
> COMM INTEGER,
> SLACKER INTEGER
> );
> INSERT INTO EMP VALUES (0, 70, '-2147483649', '-6721455509335307966', 0,
> '1970-01-01', 0, 0, 1);
> CREATE TABLE DEPT (
> DEPTNO INTEGER,
> NAME VARCHAR(20)
> );
> INSERT INTO DEPT VALUES (70, '-2147483649');
> CREATE TABLE BONUS (
> ENAME VARCHAR(20),
> JOB VARCHAR(20),
> SAL INTEGER,
> COMM INTEGER
> );
> INSERT INTO BONUS VALUES ('2147483648', '2147483648', 0, 0);
> CREATE TABLE EMPNULLABLES (
> EMPNO INTEGER,
> DEPTNO INTEGER,
> ENAME VARCHAR(20),
> JOB VARCHAR(20),
> MGR INTEGER,
> HIREDATE DATE,
> SAL INTEGER,
> COMM INTEGER,
> SLACKER INTEGER
> );
> INSERT INTO EMPNULLABLES VALUES (0, 70, '2147483648', '2147483648', 0,
> '1970-01-01', 0, 0, 1);
> CREATE TABLE EMPNULLABLES_20 (
> EMPNO INTEGER,
> DEPTNO INTEGER,
> ENAME VARCHAR(20),
> JOB VARCHAR(20),
> MGR INTEGER,
> HIREDATE DATE,
> SAL INTEGER,
> COMM INTEGER,
> SLACKER INTEGER
> );
> INSERT INTO EMPNULLABLES_20 VALUES (0, 70, '2147483648', '2147483648', 0,
> '1970-01-01', 0, 0, 1);
> CREATE TABLE EMP_B (
> EMPNO INTEGER,
> DEPTNO INTEGER,
> ENAME VARCHAR(20),
> JOB VARCHAR(20),
> MGR INTEGER,
> HIREDATE DATE,
> SAL INTEGER,
> COMM INTEGER,
> SLACKER INTEGER,
> BIRTHDATE DATE
> );
> INSERT INTO EMP_B VALUES (0, 70, '2147483648', '2147483648', 0, '1970-01-01',
> 0, 0, 1, '1970-01-01');{code}
>
> we can have two queries produce different output:
> {code:java}
> sum_sal | count_distinct_clerk | sum_sal_d10 | sum_sal_d20 | count_d30 |
> count_d40 | count_d45 | count_d50 | sum_null_d60 | sum_null_d70 | count_d20
> ---------+----------------------+-------------+-------------+-----------+-----------+-----------+-----------+--------------+--------------+-----------
> 0 | 0 | | 0 | 0 |
> 0 | | | | | 0
> (1 row){code}
> vs
> {code:java}
> sum_sal | count_distinct_clerk | sum_sal_d10 | sum_sal_d20 | count_d30 |
> count_d40 | count_d45 | count_d50 | sum_null_d60 | sum_null_d70 | count_d20
> ---------+----------------------+-------------+-------------+-----------+-----------+-----------+-----------+--------------+--------------+-----------
> 0 | 0 | | | 0 |
> 0 | | | | | 0
> (1 row){code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)