[ 
https://issues.apache.org/jira/browse/HAWQ-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chang updated HAWQ-348:
---------------------------
    Fix Version/s: backlog

> Optimizer (ORCA/Planner) should not preprocess (table) functions at planning 
> phase
> ----------------------------------------------------------------------------------
>
>                 Key: HAWQ-348
>                 URL: https://issues.apache.org/jira/browse/HAWQ-348
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Optimizer
>            Reporter: Ruilong Huo
>            Assignee: Amr El-Helw
>             Fix For: backlog
>
>
> Optimizer (ORCA/Planner) currently preprocesses (table) functions (either in 
> target list or from clause) at planning phase. This introduces:
> 1. Much lower performance since the result of the (table) function is 
> motioned to QD from QEs after the preprocessing, and it is further processed 
> there at QD, especially the result is large. In this case, QD does heavy 
> workload and becomes the bottleneck. It shows about 20x performance 
> difference in below example.
> 2. Much more memory overhead at QD since it needs to hold the result of the 
> (table) function. This is risky since the result might be unpredictably large.
> Here are the steps to reproduce this issue, as well as some initial analysis:
> Step 1: Prepare schema and data
> {noformat}
> CREATE TABLE t (id INT);
> CREATE TABLE
> INSERT INTO t SELECT generate_series(1, 100000000);
> INSERT 0 100000000
> CREATE OR REPLACE FUNCTION get_t()
> RETURNS SETOF t
> LANGUAGE SQL AS
> 'SELECT * FROM t'
> STABLE;
> CREATE FUNCTION
> {noformat}
> 2. With optimizer = OFF (Planner)
> {noformat}
> SET optimizer='OFF';
> SET
> select sum(id) from t;
>        sum
> ------------------
>  5000000050000000
> (1 row)
> Time: 8801.577 ms
> select sum(id) from get_t();
>        sum
> ------------------
>  5000000050000000
> (1 row)
> Time: 189992.273 ms
> EXPLAIN SELECT sum(id) FROM get_t();
>                              QUERY PLAN
> --------------------------------------------------------------------
>  Aggregate  (cost=32.50..32.51 rows=1 width=8)
>    ->  Function Scan on get_t  (cost=0.00..12.50 rows=8000 width=4)
>  Settings:  default_segment_num=8; optimizer=off
>  Optimizer status: legacy query optimizer
> (4 rows)
> {noformat}
> 3. With optimizer = ON (ORCA)
> {noformat}
> SET optimizer='ON';
> SET
> select sum(id) from t;
>        sum
> ------------------
>  5000000050000000
> (1 row)
> Time: 10103.436 ms
> select sum(id) from get_t();
>        sum
> ------------------
>  5000000050000000
> (1 row)
> Time: 195551.740 ms
> EXPLAIN SELECT sum(id) FROM get_t();
>                              QUERY PLAN
> --------------------------------------------------------------------
>  Aggregate  (cost=32.50..32.51 rows=1 width=8)
>    ->  Function Scan on get_t  (cost=0.00..12.50 rows=8000 width=4)
>  Settings:  default_segment_num=8
>  Optimizer status: legacy query optimizer
> (4 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to