Paul Guo created HAWQ-1183:
------------------------------

             Summary: Writable external table with Hash distribution shows slow 
performance
                 Key: HAWQ-1183
                 URL: https://issues.apache.org/jira/browse/HAWQ-1183
             Project: Apache HAWQ
          Issue Type: Bug
            Reporter: Paul Guo
            Assignee: Lei Chang


Steps:

1. Create tables and populate them.
drop table tbl1;
drop external table ext_tbl1;
drop external table ext_tbl1_random;

CREATE TABLE tbl1 (a int, b text) DISTRIBUTED BY (a);
INSERT INTO tbl1 VALUES (generate_series(1,1000),'aaa');
INSERT INTO tbl1 VALUES (generate_series(1,10000),'bbb');
INSERT INTO tbl1 VALUES (generate_series(1,100000),'bbc');
INSERT INTO tbl1 VALUES (generate_series(1,1000000),'bdbc');
INSERT INTO tbl1 VALUES (generate_series(1,1000000),'bdddbc');

CREATE WRITABLE EXTERNAL TABLE ext_tbl1
( LIKE tbl1 )
LOCATION ('gpfdist://127.0.0.1/tbl1.csv')
FORMAT 'CSV' (DELIMITER ',')
DISTRIBUTED BY (a);

CREATE WRITABLE EXTERNAL TABLE ext_tbl1_random
( LIKE tbl1 )
LOCATION ('gpfdist://127.0.0.1/tbl1.random.csv')
FORMAT 'CSV' (DELIMITER ',')
DISTRIBUTED RANDOMLY;

2. Write the two external tables. We can find that the external table with hash 
distribution is slow with inserting, and plan shows that it has 1 workers only.

postgres=# explain analyze INSERT INTO ext_tbl1 SELECT * from tbl1;

                            QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------
 Insert  (cost=0.00..509.20 rows=1000 width=8)
   Rows out:  Avg 2111000.0 rows x 1 workers.  
Max/Last(seg0:host67/seg0:host67) 2111000/2111000 rows with 70/70 ms to first 
row, 20304/20304 ms to end, start offset by 30/30 ms.
   Executor memory:  1K bytes.
   ->  Result  (cost=0.00..431.07 rows=1000 width=20)
         Rows out:  Avg 2111000.0 rows x 1 workers.  
Max/Last(seg0:host67/seg0:host67) 2111000/2111000 rows with 61/61 ms to first 
row, 2034/2034 ms to end, start offset by 30/30 ms
.
         ->  Redistribute Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.05 
rows=1000 width=8)
               Hash Key: tbl1.a
               Rows out:  Avg 2111000.0 rows x 1 workers at destination.  
Max/Last(seg0:host67/seg0:host67) 2111000/2111000 rows with 61/61 ms to first 
row, 1370/1370 ms to end, sta
rt offset by 30/30 ms.
               ->  Table Scan on tbl1  (cost=0.00..431.01 rows=1000 width=8)
                     Rows out:  Avg 2111000.0 rows x 1 workers.  
Max/Last(seg0:host67/seg0:host67) 2111000/2111000 rows with 61/61 ms to first 
row, 566/566 ms to end, start offset b
y 30/30 ms.
 Slice statistics:
   (slice0)    Executor memory: 293K bytes (seg0:host67).
   (slice1)    Executor memory: 303K bytes (seg0:host67).
 Statement statistics:
   Memory used: 262144K bytes
 Optimizer status: PQO version 1.684
 Dispatcher statistics:
   executors used(total/cached/new connection): (2/0/2); dispatcher 
time(total/connection/dispatch data): (17.095 ms/16.477 ms/0.053 ms).
   dispatch data time(max/min/avg): (0.027 ms/0.025 ms/0.026 ms); consume 
executor data time(max/min/avg): (0.051 ms/0.043 ms/0.047 ms); free executor 
time(max/min/avg): (0.000 ms/0
.000 ms/0.000 ms).
 Data locality statistics:
   data locality ratio: 1.000; virtual segment number: 1; different host 
number: 1; virtual segment number per host(avg/min/max): (1/1/1); segment 
size(avg/min/max): (46023656.000 B
/46023656 B/46023656 B); segment size with penalty(avg/min/max): (46023656.000 
B/46023656 B/46023656 B); continuity(avg/min/max): (1.000/1.000/1.000); DFS 
metadatacache: 46.181 ms;
resource allocation: 1.837 ms; datalocality calculation: 1.180 ms.
 Total runtime: 20538.524 ms
(22 rows)


postgres=# explain analyze INSERT INTO ext_tbl1 SELECT * from tbl1;

                 QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------
 Insert  (cost=0.00..444.03 rows=167 width=8)
   Rows out:  Avg 351833.3 rows x 6 workers.  Max/Last(seg5:host67/seg0:host67) 
351916/351849 rows with 89/171 ms to first row, 4074/4209 ms to end, start 
offset by 46/45 ms.
   Executor memory:  1K bytes avg, 1K bytes max (seg5:host67).
   ->  Result  (cost=0.00..431.01 rows=167 width=20)
         Rows out:  Avg 351833.3 rows x 6 workers.  
Max/Last(seg5:host67/seg0:host67) 351916/351849 rows with 77/148 ms to first 
row, 292/392 ms to end, start offset by 46/45 ms.
         ->  Table Scan on tbl1  (cost=0.00..431.00 rows=167 width=8)
               Rows out:  Avg 351833.3 rows x 6 workers.  
Max/Last(seg5:host67/seg2:host67) 351916/351855 rows with 77/152 ms to first 
row, 158/257 ms to end, start offset by 46/42
ms.
 Slice statistics:
   (slice0)    Executor memory: 280K bytes avg x 6 workers, 280K bytes max 
(seg5:host67).
 Statement statistics:
   Memory used: 262144K bytes
 Optimizer status: PQO version 1.684
 Dispatcher statistics:
   executors used(total/cached/new connection): (6/0/6); dispatcher 
time(total/connection/dispatch data): (38.288 ms/37.708 ms/0.078 ms).
   dispatch data time(max/min/avg): (0.028 ms/0.004 ms/0.012 ms); consume 
executor data time(max/min/avg): (0.067 ms/0.014 ms/0.029 ms); free executor 
time(max/min/avg): (0.000 ms/0
.000 ms/0.000 ms).
 Data locality statistics:
   data locality ratio: 1.000; virtual segment number: 6; different host 
number: 1; virtual segment number per host(avg/min/max): (6/6/6); segment 
size(avg/min/max): (7670609.333 B/
7668464 B/7672344 B); segment size with penalty(avg/min/max): (0.000 B/0 B/0 
B); continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 28.855 ms; 
resource allocation: 12.
933 ms; datalocality calculation: 0.190 ms.
 Total runtime: 4333.663 ms
(18 rows)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to