[ 
https://issues.apache.org/jira/browse/HAWQ-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326423#comment-15326423
 ] 

Ruilong Huo edited comment on HAWQ-800 at 6/12/16 12:44 PM:
------------------------------------------------------------

Per investigation and discussion with Hubert and Ming, this is a bug introduced 
in prepared statements (prepared either in a UDF or in a SQL statement) in hawq 2.0.

The root cause is that a prepared statement has two phases: phase 1) the 
plan + metadata + datalocality are prepared and dispatched only once; phase 2) 
every time the plan is executed, it uses that same datalocality.

For example, suppose table t (id int) has 1 tuple at the beginning. If we run the 
for loop below with "insert into t select * from t" as a prepared statement, it 
gives the following result:
{noformat}
phase                                                iteration   content in table t
                                                                 hawq 2.0        hawq 1.x
------------------------------------------------------------------------------------------------------
beginning                                            i = 0       {1}             {1}
for i = 1 ~ 3: do "insert into t select * from t"    i = 1       {1, 1}          {1, 1}
                                                     i = 2       {1, 1, 1}       {1, 1, 1, 1}
                                                     i = 3       {1, 1, 1, 1}    {1, 1, 1, 1, 1, 1, 1, 1}
{noformat}
To be specific, every time hawq 2.0 executes the "insert into select", it uses the 
initial datalocality (i.e., the initial tuple {1} in table t); while in hawq 1.x, 
every segment reads all of its latest data and does the insert.
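
The same mis-behavior can be reproduced with an explicit prepared statement, 
without a UDF. Below is a minimal sketch of such a session (the statement name 
ins is arbitrary; the expected contents follow the hawq 2.0 column in the table 
above):
{noformat}
PREPARE ins AS INSERT INTO t SELECT * FROM t;
EXECUTE ins;  -- reuses the initial datalocality of {1}: t becomes {1, 1}
EXECUTE ins;  -- still sees only {1}: t becomes {1, 1, 1} instead of {1, 1, 1, 1}
EXECUTE ins;  -- t becomes {1, 1, 1, 1} instead of the expected 8 tuples
{noformat}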

To keep the prepared statement feature, we propose the fix below:
1. Separate the dispatching into two separate parts: 1) plan + metadata, which is 
the same as in 1.x; 2) datalocality + resource_negotiator.
2. Every time we dispatch the prepared plan + metadata in 1) for execution, we 
re-calculate the datalocality, re-do the resource negotiation, and then 
dispatch them along with 1).
3. If the resource (i.e., vseg number) changes in step 2 above, we re-prepare 
and then cache the new plan.

PS: note that for a prepared statement in a UDF, we use a fixed number of vsegs; 
while for a prepared statement in SQL, a data size change may incur a change of 
resource (i.e., vseg number) and thus of the plan.
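
With this fix, each EXECUTE of the prepared plan would pick up the latest 
datalocality and match the hawq 1.x column in the table above. A sketch of the 
expected behavior after the fix (same arbitrary statement name ins):
{noformat}
PREPARE ins AS INSERT INTO t SELECT * FROM t;
EXECUTE ins;  -- datalocality refreshed: t doubles to {1, 1}
EXECUTE ins;  -- refreshed again: t doubles to {1, 1, 1, 1}
EXECUTE ins;  -- t doubles to 8 tuples, as in hawq 1.x
{noformat}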



> Fewer tuples are inserted because data locality information is not refreshed 
> and dispatched in prepared statements
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-800
>                 URL: https://issues.apache.org/jira/browse/HAWQ-800
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Core, Dispatcher, Query Execution, Resource Manager
>            Reporter: Ruilong Huo
>            Assignee: Ruilong Huo
>         Attachments: proba.out, proba.sql, proba_execute.out, 
> proba_execute.sql
>
>
> In either an explicit (SQL) or an implicit (UDF) prepared statement, fewer 
> tuples are inserted than expected if we run a prepared "insert into t select * 
> from t" plan multiple times in a transaction.
> Below is a simple case to reproduce this issue. For a more complicated 
> example, you may refer to the attached proba_execute and proba.
> 1. There should be 8 tuples; however, there are only 4 in hawq 2.0
> {noformat}
> drop table if exists t;
> DROP TABLE
> create table t (id int);
> CREATE TABLE
> insert into t values (1);
> INSERT 0 1
> CREATE OR REPLACE FUNCTION f_load()
> RETURNS TEXT
> LANGUAGE plpgsql
> AS
> $body$
> DECLARE
>     l_rec RECORD;
>     l_itm RECORD;
> BEGIN
>     FOR l_rec IN ( SELECT generate_series(1, 3) AS id )
>     LOOP
>         INSERT INTO t SELECT * FROM t;
>     END LOOP;
>     RETURN 'done';
> END;
> $body$
> ;
> CREATE FUNCTION
> SELECT f_load();
>  f_load
> --------
>  done
> (1 row)
> SELECT * FROM t;
>  id
> ----
>   1
>   1
>   1
>   1
> (4 rows)
> {noformat}
> 2. There are 8 tuples as expected in hawq 1.x
> {noformat}
> drop table if exists t;
> DROP TABLE
> create table t (id int);
> psql:temp.sql:3: NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using 
> column named 'id' as the Greenplum Database data distribution key for this 
> table.
> HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make 
> sure column(s) chosen are the optimal data distribution key to minimize skew.
> CREATE TABLE
> insert into t values (1);
> INSERT 0 1
> CREATE OR REPLACE FUNCTION f_load()
> RETURNS TEXT
> LANGUAGE plpgsql
> AS
> $body$
> DECLARE
>     l_rec RECORD;
>     l_itm RECORD;
> BEGIN
>     FOR l_rec IN ( SELECT generate_series(1, 3) AS id )
>     LOOP
>         INSERT INTO t SELECT * FROM t;
>     END LOOP;
>     RETURN 'done';
> END;
> $body$
> ;
> CREATE FUNCTION
> SELECT f_load();
>  f_load
> --------
>  done
> (1 row)
> SELECT * FROM t;
>  id
> ----
>   1
>   1
>   1
>   1
>   1
>   1
>   1
>   1
> (8 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
