[
https://issues.apache.org/jira/browse/HAWQ-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lei Chang updated HAWQ-348:
---------------------------
Fix Version/s: backlog
> Optimizer (ORCA/Planner) should not preprocess (table) functions at planning
> phase
> ----------------------------------------------------------------------------------
>
> Key: HAWQ-348
> URL: https://issues.apache.org/jira/browse/HAWQ-348
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Optimizer
> Reporter: Ruilong Huo
> Assignee: Amr El-Helw
> Fix For: backlog
>
>
> Optimizer (ORCA/Planner) currently preprocesses (table) functions (either in
> target list or from clause) at planning phase. This introduces:
> 1. Much lower performance since the result of the (table) function is
> motioned to QD from QEs after the preprocessing, and it is further processed
> there at QD, especially the result is large. In this case, QD does heavy
> workload and becomes the bottleneck. It shows about 20x performance
> difference in below example.
> 2. Much more memory overhead at QD since it needs to hold the result of the
> (table) function. This is risky since the result might be unpredictably large.
> Here are the steps to reproduce this issue, as well as some initial analysis:
> Step 1: Prepare schema and data
> {noformat}
> CREATE TABLE t (id INT);
> CREATE TABLE
> INSERT INTO t SELECT generate_series(1, 100000000);
> INSERT 0 100000000
> CREATE OR REPLACE FUNCTION get_t()
> RETURNS SETOF t
> LANGUAGE SQL AS
> 'SELECT * FROM t'
> STABLE;
> CREATE FUNCTION
> {noformat}
> 2. With optimizer = OFF (Planner)
> {noformat}
> SET optimizer='OFF';
> SET
> select sum(id) from t;
> sum
> ------------------
> 5000000050000000
> (1 row)
> Time: 8801.577 ms
> select sum(id) from get_t();
> sum
> ------------------
> 5000000050000000
> (1 row)
> Time: 189992.273 ms
> EXPLAIN SELECT sum(id) FROM get_t();
> QUERY PLAN
> --------------------------------------------------------------------
> Aggregate (cost=32.50..32.51 rows=1 width=8)
> -> Function Scan on get_t (cost=0.00..12.50 rows=8000 width=4)
> Settings: default_segment_num=8; optimizer=off
> Optimizer status: legacy query optimizer
> (4 rows)
> {noformat}
> 3. With optimizer = ON (ORCA)
> {noformat}
> SET optimizer='ON';
> SET
> select sum(id) from t;
> sum
> ------------------
> 5000000050000000
> (1 row)
> Time: 10103.436 ms
> select sum(id) from get_t();
> sum
> ------------------
> 5000000050000000
> (1 row)
> Time: 195551.740 ms
> EXPLAIN SELECT sum(id) FROM get_t();
> QUERY PLAN
> --------------------------------------------------------------------
> Aggregate (cost=32.50..32.51 rows=1 width=8)
> -> Function Scan on get_t (cost=0.00..12.50 rows=8000 width=4)
> Settings: default_segment_num=8
> Optimizer status: legacy query optimizer
> (4 rows)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)