Ruilong Huo created HAWQ-830:
--------------------------------
Summary: Wrong result in CTE query due to CTE is treated as init
plan by planner and evaluated multiple times
Key: HAWQ-830
URL: https://issues.apache.org/jira/browse/HAWQ-830
Project: Apache HAWQ
Issue Type: Bug
Components: Optimizer
Reporter: Ruilong Huo
Assignee: Amr El-Helw
In CTE query, if the CTE itself is referenced multiple times, it should be
evaluated only once and then be used multiple time. However, it is treated as
init plan and evaluated multiple times in hawq 1.x and 2.0. This has two issues
here:
1. If the query in CTE is "volatile" (i.e., select volatile function) or has
side effect (create/drop object in database), it may generate wrong result
2. The performance of the query is not so efficient since the query in CTE is
evaluated multiple times.
Here is the steps to reproduce:
1) in hawq, CTE is treated as init plan and evaluated 2 times. Thus, the result
is incorrect
{noformat}
WITH r AS (SELECT random())
SELECT r1.*, r2.*
FROM r AS r1, r AS r2;
random | random
-------------------+-------------------
0.519145511090755 | 0.751198637764901
(1 row)
EXPLAIN
WITH r AS (SELECT random())
SELECT r1.*, r2.*
FROM r AS r1, r AS r2;
QUERY PLAN
--------------------------------------------------------------
Nested Loop (cost=0.04..0.77 rows=20 width=16)
-> Result (cost=0.01..0.02 rows=1 width=0)
InitPlan
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Materialize (cost=0.03..0.09 rows=6 width=8)
-> Result (cost=0.01..0.02 rows=1 width=0)
InitPlan
-> Result (cost=0.00..0.01 rows=1 width=0)
Settings: default_hash_table_bucket_number=6
Optimizer status: legacy query optimizer
(10 rows)
{noformat}
2) in postgres, CTE is treated as CTE scan and evaluated 1 time. Thus, the
result is i
{noformat}
WITH r AS (SELECT random())
SELECT r1.*, r2.*
FROM r AS r1, r AS r2;
random | random
-------------------+-------------------
0.989214501809329 | 0.989214501809329
(1 row)
EXPLAIN
WITH r AS (SELECT random())
SELECT r1.*, r2.*
FROM r AS r1, r AS r2;
QUERY PLAN
----------------------------------------------------------
Nested Loop (cost=0.01..0.06 rows=1 width=16)
CTE r
-> Result (cost=0.00..0.01 rows=1 width=0)
-> CTE Scan on r r1 (cost=0.00..0.02 rows=1 width=8)
-> CTE Scan on r r2 (cost=0.00..0.02 rows=1 width=8)
(5 rows){noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)