Hi, HAWQ Developers: As my previous email hint, I run TPC-DS test on our development. Comparing with previous version 1.3.x, we can see the performance improvement on most of queries.
But the problem is performance reduction for *some* queries. For example, the query64, the running time increase from 10754.688 ms to 68884.731 ms . I am not sure if any changes were made that increase the running time? In order to discuss the detail about this issue, I would like use the query10. The running time increase from 1795.746 ms to 744919.251 ms. I also attache the sql about this query, and the query plan for this query. Thanks
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=749941403.24..749941406.76 rows=38 width=208)
-> Gather Motion 40:1 (slice10; segments: 40)
(cost=749941403.24..749941406.76 rows=38 width=208)
Merge Key: partial_aggregation.cd_gender,
partial_aggregation.cd_marital_status, partial_aggregation.cd_education_status,
partial_aggregation.cd_purchase_estimate, partial_aggregation.cd_credit_rating,
partial_aggregation.cd_dep_count, partial_aggregation.cd_dep_employed_count,
partial_aggregation.cd_dep_college_count
-> Limit (cost=749941403.24..749941406.00 rows=1 width=208)
-> GroupAggregate (cost=749941403.24..749941406.00 rows=1
width=208)
Group By: customer_demographics.cd_gender,
customer_demographics.cd_marital_status,
customer_demographics.cd_education_status,
customer_demographics.cd_purchase_estimate,
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count,
customer_demographics.cd_dep_employed_count,
customer_demographics.cd_dep_college_count
-> Sort (cost=749941403.24..749941403.34 rows=1
width=168)
Sort Key: customer_demographics.cd_gender,
customer_demographics.cd_marital_status,
customer_demographics.cd_education_status,
customer_demographics.cd_purchase_estimate,
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count,
customer_demographics.cd_dep_employed_count,
customer_demographics.cd_dep_college_count
-> Redistribute Motion 40:40 (slice9; segments:
40) (cost=749941399.09..749941402.25 rows=1 width=168)
Hash Key: customer_demographics.cd_gender,
customer_demographics.cd_marital_status,
customer_demographics.cd_education_status,
customer_demographics.cd_purchase_estimate,
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count,
customer_demographics.cd_dep_employed_count,
customer_demographics.cd_dep_college_count
-> GroupAggregate
(cost=749941399.09..749941401.49 rows=1 width=168)
Group By:
customer_demographics.cd_gender, customer_demographics.cd_marital_status,
customer_demographics.cd_education_status,
customer_demographics.cd_purchase_estimate,
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count,
customer_demographics.cd_dep_employed_count,
customer_demographics.cd_dep_college_count
-> Sort
(cost=749941399.09..749941399.18 rows=1 width=52)
Sort Key:
customer_demographics.cd_gender, customer_demographics.cd_marital_status,
customer_demographics.cd_education_status,
customer_demographics.cd_purchase_estimate,
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count,
customer_demographics.cd_dep_employed_count,
customer_demographics.cd_dep_college_count
-> Hash Join
(cost=51836.73..749941398.07 rows=1 width=52)
Hash Cond:
c.c_current_cdemo_sk = customer_demographics.cd_demo_sk
-> Redistribute Motion
40:40 (slice8; segments: 40) (cost=47445.93..749937006.69 rows=1 width=4)
Hash Key:
c.c_current_cdemo_sk
-> Hash Join
(cost=47445.93..749937005.89 rows=1 width=4)
Hash Cond:
c.c_current_addr_sk = ca.ca_address_sk
-> Redistribute
Motion 40:40 (slice7; segments: 40) (cost=46548.96..749936094.44 rows=140
width=8)
Hash Key:
c.c_current_addr_sk
-> Hash
EXISTS Join (cost=46548.96..749935982.60 rows=140 width=8)
Hash
Cond: c.c_customer_sk = store_sales.ss_customer_sk
->
Parquet table Scan on customer c (cost=0.00..749889176.24 rows=1875 width=12)
Filter: ((subplan)) OR ((subplan))
SubPlan 2
-> Hash Join (cost=1417.94..25303.89 rows=1 width=315)
Hash Cond: catalog_sales.cs_sold_date_sk = public.date_dim.d_date_sk
-> Result (cost=23885.37..23885.60 rows=1 width=197)
Filter: $0 = catalog_sales.cs_ship_customer_sk
-> Materialize (cost=23885.37..23885.60 rows=1 width=197)
-> Broadcast Motion 40:40 (slice3; segments: 40)
(cost=0.00..23885.35 rows=1 width=197)
-> Parquet table Scan on catalog_sales
(cost=0.00..23885.35 rows=1 width=197)
-> Hash (cost=1415.36..1415.36 rows=6 width=118)
-> Result (cost=1415.56..1417.63 rows=6 width=118)
-> Materialize (cost=1415.56..1417.63 rows=6 width=118)
-> Broadcast Motion 40:40 (slice4; segments: 40)
(cost=0.00..1415.36 rows=6 width=118)
-> Parquet table Scan on date_dim
(cost=0.00..1415.36 rows=6 width=118)
Filter: d_year = 2000 AND d_moy >= 2 AND
d_moy <= 5
SubPlan 1
-> Hash Join (cost=1417.94..13400.73 rows=1 width=315)
Hash Cond: web_sales.ws_sold_date_sk = public.date_dim.d_date_sk
-> Result (cost=11982.32..11982.50 rows=1 width=197)
Filter: $0 = web_sales.ws_bill_customer_sk
-> Materialize (cost=11982.32..11982.50 rows=1 width=197)
-> Broadcast Motion 40:40 (slice1; segments: 40)
(cost=0.00..11982.30 rows=1 width=197)
-> Parquet table Scan on web_sales
(cost=0.00..11982.30 rows=1 width=197)
-> Hash (cost=1415.36..1415.36 rows=6 width=118)
-> Result (cost=1415.56..1417.63 rows=6 width=118)
-> Materialize (cost=1415.56..1417.63 rows=6 width=118)
-> Broadcast Motion 40:40 (slice2; segments: 40)
(cost=0.00..1415.36 rows=6 width=118)
-> Parquet table Scan on date_dim
(cost=0.00..1415.36 rows=6 width=118)
Filter: d_year = 2000 AND d_moy >= 2 AND
d_moy <= 5
->
Hash (cost=46451.56..46451.56 rows=195 width=4)
-> Redistribute Motion 40:40 (slice6; segments: 40) (cost=1603.56..46451.56
rows=195 width=4)
Hash Key: store_sales.ss_customer_sk
-> Hash EXISTS Join (cost=1603.56..46295.72 rows=195 width=4)
Hash Cond: store_sales.ss_sold_date_sk = public.date_dim.d_date_sk
-> Parquet table Scan on store_sales (cost=0.00..36634.04
rows=72011 width=8)
-> Hash (cost=1500.15..1500.15 rows=207 width=4)
-> Broadcast Motion 40:40 (slice5; segments: 40)
(cost=0.00..1500.15 rows=207 width=4)
-> Parquet table Scan on date_dim
(cost=0.00..1415.36 rows=6 width=4)
Filter: d_year = 2000 AND d_moy >= 2 AND d_moy
<= 5
-> Hash
(cost=892.50..892.50 rows=9 width=4)
->
Parquet table Scan on customer_address ca (cost=0.00..892.50 rows=9 width=4)
Filter: ca_county::text = ANY ('{"Yellowstone County","Montgomery
County","Divide County","Cedar County","Manassas Park city"}'::text[])
-> Hash
(cost=1989.80..1989.80 rows=4802 width=56)
-> Parquet table Scan
on customer_demographics (cost=0.00..1989.80 rows=4802 width=56)
Settings: default_segment_num=160
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..4259.89 rows=20 width=100)
-> Gather Motion 5:1 (slice10; segments: 5) (cost=0.00..4259.88 rows=100
width=100)
Merge Key: customer_demographics.cd_gender,
customer_demographics.cd_marital_status,
customer_demographics.cd_education_status,
customer_demographics.cd_purchase_estimate,
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count,
customer_demographics.cd_dep_employed_count,
customer_demographics.cd_dep_college_count
-> Limit (cost=0.00..4259.85 rows=20 width=100)
-> GroupAggregate (cost=0.00..4259.85 rows=88 width=100)
Group By: customer_demographics.cd_gender,
customer_demographics.cd_marital_status,
customer_demographics.cd_education_status,
customer_demographics.cd_purchase_estimate,
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count,
customer_demographics.cd_dep_employed_count,
customer_demographics.cd_dep_college_count
-> Sort (cost=0.00..4259.84 rows=88 width=100)
Sort Key: customer_demographics.cd_gender,
customer_demographics.cd_marital_status,
customer_demographics.cd_education_status,
customer_demographics.cd_purchase_estimate,
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count,
customer_demographics.cd_dep_employed_count,
customer_demographics.cd_dep_college_count
-> Redistribute Motion 5:5 (slice9; segments: 5)
(cost=0.00..4259.52 rows=88 width=100)
Hash Key: customer_demographics.cd_gender,
customer_demographics.cd_marital_status,
customer_demographics.cd_education_status,
customer_demographics.cd_purchase_estimate,
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count,
customer_demographics.cd_dep_employed_count,
customer_demographics.cd_dep_college_count
-> Result (cost=0.00..4259.49 rows=88
width=100)
-> HashAggregate (cost=0.00..4259.49
rows=88 width=100)
Group By:
customer_demographics.cd_gender, customer_demographics.cd_marital_status,
customer_demographics.cd_education_status,
customer_demographics.cd_purchase_estimate,
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count,
customer_demographics.cd_dep_employed_count,
customer_demographics.cd_dep_college_count
-> Hash Join (cost=0.00..4250.43
rows=8406 width=52)
Hash Cond:
customer_demographics.cd_demo_sk = customer.c_current_cdemo_sk
-> Table Scan on
customer_demographics (cost=0.00..432.48 rows=38416 width=56)
-> Hash
(cost=3802.83..3802.83 rows=8406 width=4)
-> Redistribute
Motion 5:5 (slice8; segments: 5) (cost=0.00..3802.83 rows=8406 width=4)
Hash Key:
customer.c_current_cdemo_sk
-> Hash Join
(cost=0.00..3802.72 rows=8406 width=4)
Hash Cond:
customer.c_current_addr_sk = customer_address.ca_address_sk
->
Redistribute Motion 5:5 (slice7; segments: 5) (cost=0.00..3366.77 rows=15264
width=8)
Hash
Key: customer.c_current_addr_sk
->
Result (cost=0.00..3366.38 rows=15264 width=8)
Filter: COALESCE("ColRef_0259", 0::bigint) > 0::bigint OR
COALESCE("ColRef_0260", 0::bigint) > 0::bigint
-> Result (cost=0.00..3365.28 rows=16719 width=24)
-> Hash Left Join (cost=0.00..3364.88 rows=16719 width=24)
Hash Cond: customer.c_customer_sk =
catalog_sales.cs_ship_customer_sk
-> Hash Left Join (cost=0.00..2396.90 rows=11789 width=20)
Hash Cond: customer.c_customer_sk =
web_sales.ws_bill_customer_sk
-> Result (cost=0.00..1479.35 rows=8844 width=12)
Filter: COALESCE("ColRef_0258", 0::bigint) > 0::bigint
-> Result (cost=0.00..1478.62 rows=22108 width=20)
-> Hash Left Join (cost=0.00..1478.18
rows=22108 width=20)
Hash Cond: customer.c_customer_sk =
store_sales.ss_customer_sk
-> Table Scan on customer
(cost=0.00..433.23 rows=20000 width=12)
-> Hash (cost=1034.28..1034.28
rows=13568 width=12)
-> HashAggregate
(cost=0.00..1034.28 rows=13568 width=12)
Group By:
store_sales.ss_customer_sk
-> Redistribute Motion 5:5
(slice2; segments: 5) (cost=0.00..1032.54 rows=13568 width=12)
Hash Key:
store_sales.ss_customer_sk
-> Result
(cost=0.00..1032.03 rows=13568 width=12)
-> HashAggregate
(cost=0.00..1032.03 rows=13568 width=12)
Group By:
store_sales.ss_customer_sk
-> Hash
Join (cost=0.00..1023.71 rows=66353 width=4)
Hash
Cond: store_sales.ss_sold_date_sk = public.date_dim.d_date_sk
->
Table Scan on store_sales (cost=0.00..478.21 rows=576081 width=8)
->
Hash (cost=433.03..433.03 rows=211 width=4)
-> Broadcast Motion 5:5 (slice1; segments: 5) (cost=0.00..433.03 rows=211
width=4)
-> Table Scan on date_dim (cost=0.00..433.02 rows=43 width=4)
Filter: d_year = 2000 AND d_moy >= 2 AND d_moy <= 5
-> Hash (cost=912.18..912.18 rows=7913 width=12)
-> HashAggregate (cost=0.00..912.18 rows=7913
width=12)
Group By: web_sales.ws_bill_customer_sk
-> Redistribute Motion 5:5 (slice4; segments:
5) (cost=0.00..911.16 rows=7913 width=12)
Hash Key: web_sales.ws_bill_customer_sk
-> Result (cost=0.00..910.86 rows=7913
width=12)
-> HashAggregate
(cost=0.00..910.86 rows=7913 width=12)
Group By:
web_sales.ws_bill_customer_sk
-> Hash Join
(cost=0.00..908.77 rows=16581 width=4)
Hash Cond:
web_sales.ws_sold_date_sk = public.date_dim.d_date_sk
-> Table Scan on
web_sales (cost=0.00..447.62 rows=143877 width=8)
-> Hash
(cost=433.03..433.03 rows=211 width=4)
-> Broadcast
Motion 5:5 (slice3; segments: 5) (cost=0.00..433.03 rows=211 width=4)
-> Table
Scan on date_dim (cost=0.00..433.02 rows=43 width=4)
Filter: d_year = 2000 AND d_moy >= 2 AND d_moy <= 5
-> Hash (cost=959.83..959.83 rows=12230 width=12)
-> HashAggregate (cost=0.00..959.83 rows=12230 width=12)
Group By: catalog_sales.cs_ship_customer_sk
-> Redistribute Motion 5:5 (slice6; segments: 5)
(cost=0.00..958.26 rows=12230 width=12)
Hash Key: catalog_sales.cs_ship_customer_sk
-> Result (cost=0.00..957.80 rows=12230
width=12)
-> HashAggregate (cost=0.00..957.80
rows=12230 width=12)
Group By:
catalog_sales.cs_ship_customer_sk
-> Hash Join (cost=0.00..953.64
rows=33081 width=4)
Hash Cond:
catalog_sales.cs_sold_date_sk = public.date_dim.d_date_sk
-> Table Scan on
catalog_sales (cost=0.00..464.30 rows=288310 width=8)
-> Hash (cost=433.03..433.03
rows=211 width=4)
-> Broadcast Motion 5:5
(slice5; segments: 5) (cost=0.00..433.03 rows=211 width=4)
-> Table Scan on
date_dim (cost=0.00..433.02 rows=43 width=4)
Filter:
d_year = 2000 AND d_moy >= 2 AND d_moy <= 5
-> Hash
(cost=432.35..432.35 rows=4000 width=4)
->
Table Scan on customer_address (cost=0.00..432.35 rows=4000 width=4)
Filter: ca_county::text = ANY ('{"Yellowstone County","Montgomery
County","Divide County","Cedar County","Manassas Park city"}'::character
varying[]::text[])
query10.sql
Description: application/sql
