Hi, HAWQ Developers:

   As my previous email hint, I run TPC-DS test on our development.
Comparing with previous version 1.3.x, we can see the performance
improvement on most of queries.

   But the problem is performance reduction for *some* queries. For
example, the query64, the running time increase from 10754.688 ms
to 68884.731 ms . I am not sure if any changes were made that increase the
running time?

   In order to discuss the detail about this issue, I would like use the
query10. The running time increase from 1795.746 ms to 744919.251 ms. I
also attache the sql about this query, and the query plan for this query.

   Thanks
PLAN                                                                            
                                                                                
                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=749941403.24..749941406.76 rows=38 width=208)
   ->  Gather Motion 40:1  (slice10; segments: 40)  
(cost=749941403.24..749941406.76 rows=38 width=208)
         Merge Key: partial_aggregation.cd_gender, 
partial_aggregation.cd_marital_status, partial_aggregation.cd_education_status, 
partial_aggregation.cd_purchase_estimate, partial_aggregation.cd_credit_rating, 
partial_aggregation.cd_dep_count, partial_aggregation.cd_dep_employed_count, 
partial_aggregation.cd_dep_college_count
         ->  Limit  (cost=749941403.24..749941406.00 rows=1 width=208)
               ->  GroupAggregate  (cost=749941403.24..749941406.00 rows=1 
width=208)
                     Group By: customer_demographics.cd_gender, 
customer_demographics.cd_marital_status, 
customer_demographics.cd_education_status, 
customer_demographics.cd_purchase_estimate, 
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, 
customer_demographics.cd_dep_employed_count, 
customer_demographics.cd_dep_college_count
                     ->  Sort  (cost=749941403.24..749941403.34 rows=1 
width=168)
                           Sort Key: customer_demographics.cd_gender, 
customer_demographics.cd_marital_status, 
customer_demographics.cd_education_status, 
customer_demographics.cd_purchase_estimate, 
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, 
customer_demographics.cd_dep_employed_count, 
customer_demographics.cd_dep_college_count
                           ->  Redistribute Motion 40:40  (slice9; segments: 
40)  (cost=749941399.09..749941402.25 rows=1 width=168)
                                 Hash Key: customer_demographics.cd_gender, 
customer_demographics.cd_marital_status, 
customer_demographics.cd_education_status, 
customer_demographics.cd_purchase_estimate, 
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, 
customer_demographics.cd_dep_employed_count, 
customer_demographics.cd_dep_college_count
                                 ->  GroupAggregate  
(cost=749941399.09..749941401.49 rows=1 width=168)
                                       Group By: 
customer_demographics.cd_gender, customer_demographics.cd_marital_status, 
customer_demographics.cd_education_status, 
customer_demographics.cd_purchase_estimate, 
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, 
customer_demographics.cd_dep_employed_count, 
customer_demographics.cd_dep_college_count
                                       ->  Sort  
(cost=749941399.09..749941399.18 rows=1 width=52)
                                             Sort Key: 
customer_demographics.cd_gender, customer_demographics.cd_marital_status, 
customer_demographics.cd_education_status, 
customer_demographics.cd_purchase_estimate, 
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, 
customer_demographics.cd_dep_employed_count, 
customer_demographics.cd_dep_college_count
                                             ->  Hash Join  
(cost=51836.73..749941398.07 rows=1 width=52)
                                                   Hash Cond: 
c.c_current_cdemo_sk = customer_demographics.cd_demo_sk
                                                   ->  Redistribute Motion 
40:40  (slice8; segments: 40)  (cost=47445.93..749937006.69 rows=1 width=4)
                                                         Hash Key: 
c.c_current_cdemo_sk
                                                         ->  Hash Join  
(cost=47445.93..749937005.89 rows=1 width=4)
                                                               Hash Cond: 
c.c_current_addr_sk = ca.ca_address_sk
                                                               ->  Redistribute 
Motion 40:40  (slice7; segments: 40)  (cost=46548.96..749936094.44 rows=140 
width=8)
                                                                     Hash Key: 
c.c_current_addr_sk
                                                                     ->  Hash 
EXISTS Join  (cost=46548.96..749935982.60 rows=140 width=8)
                                                                           Hash 
Cond: c.c_customer_sk = store_sales.ss_customer_sk
                                                                           ->  
Parquet table Scan on customer c  (cost=0.00..749889176.24 rows=1875 width=12)
                                                                                
 Filter: ((subplan)) OR ((subplan))
                                                                                
 SubPlan 2
                                                                                
   ->  Hash Join  (cost=1417.94..25303.89 rows=1 width=315)
                                                                                
         Hash Cond: catalog_sales.cs_sold_date_sk = public.date_dim.d_date_sk
                                                                                
         ->  Result  (cost=23885.37..23885.60 rows=1 width=197)
                                                                                
               Filter: $0 = catalog_sales.cs_ship_customer_sk
                                                                                
               ->  Materialize  (cost=23885.37..23885.60 rows=1 width=197)
                                                                                
                     ->  Broadcast Motion 40:40  (slice3; segments: 40)  
(cost=0.00..23885.35 rows=1 width=197)
                                                                                
                           ->  Parquet table Scan on catalog_sales  
(cost=0.00..23885.35 rows=1 width=197)
                                                                                
         ->  Hash  (cost=1415.36..1415.36 rows=6 width=118)
                                                                                
               ->  Result  (cost=1415.56..1417.63 rows=6 width=118)
                                                                                
                     ->  Materialize  (cost=1415.56..1417.63 rows=6 width=118)
                                                                                
                           ->  Broadcast Motion 40:40  (slice4; segments: 40)  
(cost=0.00..1415.36 rows=6 width=118)
                                                                                
                                 ->  Parquet table Scan on date_dim  
(cost=0.00..1415.36 rows=6 width=118)
                                                                                
                                       Filter: d_year = 2000 AND d_moy >= 2 AND 
d_moy <= 5
                                                                                
 SubPlan 1
                                                                                
   ->  Hash Join  (cost=1417.94..13400.73 rows=1 width=315)
                                                                                
         Hash Cond: web_sales.ws_sold_date_sk = public.date_dim.d_date_sk
                                                                                
         ->  Result  (cost=11982.32..11982.50 rows=1 width=197)
                                                                                
               Filter: $0 = web_sales.ws_bill_customer_sk
                                                                                
               ->  Materialize  (cost=11982.32..11982.50 rows=1 width=197)
                                                                                
                     ->  Broadcast Motion 40:40  (slice1; segments: 40)  
(cost=0.00..11982.30 rows=1 width=197)
                                                                                
                           ->  Parquet table Scan on web_sales  
(cost=0.00..11982.30 rows=1 width=197)
                                                                                
         ->  Hash  (cost=1415.36..1415.36 rows=6 width=118)
                                                                                
               ->  Result  (cost=1415.56..1417.63 rows=6 width=118)
                                                                                
                     ->  Materialize  (cost=1415.56..1417.63 rows=6 width=118)
                                                                                
                           ->  Broadcast Motion 40:40  (slice2; segments: 40)  
(cost=0.00..1415.36 rows=6 width=118)
                                                                                
                                 ->  Parquet table Scan on date_dim  
(cost=0.00..1415.36 rows=6 width=118)
                                                                                
                                       Filter: d_year = 2000 AND d_moy >= 2 AND 
d_moy <= 5
                                                                           ->  
Hash  (cost=46451.56..46451.56 rows=195 width=4)
                                                                                
 ->  Redistribute Motion 40:40  (slice6; segments: 40)  (cost=1603.56..46451.56 
rows=195 width=4)
                                                                                
       Hash Key: store_sales.ss_customer_sk
                                                                                
       ->  Hash EXISTS Join  (cost=1603.56..46295.72 rows=195 width=4)
                                                                                
             Hash Cond: store_sales.ss_sold_date_sk = public.date_dim.d_date_sk
                                                                                
             ->  Parquet table Scan on store_sales  (cost=0.00..36634.04 
rows=72011 width=8)
                                                                                
             ->  Hash  (cost=1500.15..1500.15 rows=207 width=4)
                                                                                
                   ->  Broadcast Motion 40:40  (slice5; segments: 40)  
(cost=0.00..1500.15 rows=207 width=4)
                                                                                
                         ->  Parquet table Scan on date_dim  
(cost=0.00..1415.36 rows=6 width=4)
                                                                                
                               Filter: d_year = 2000 AND d_moy >= 2 AND d_moy 
<= 5
                                                               ->  Hash  
(cost=892.50..892.50 rows=9 width=4)
                                                                     ->  
Parquet table Scan on customer_address ca  (cost=0.00..892.50 rows=9 width=4)
                                                                           
Filter: ca_county::text = ANY ('{"Yellowstone County","Montgomery 
County","Divide County","Cedar County","Manassas Park city"}'::text[])
                                                   ->  Hash  
(cost=1989.80..1989.80 rows=4802 width=56)
                                                         ->  Parquet table Scan 
on customer_demographics  (cost=0.00..1989.80 rows=4802 width=56)
 Settings:  default_segment_num=160
PLAN                                                                            
                                                                                
                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4259.89 rows=20 width=100)
   ->  Gather Motion 5:1  (slice10; segments: 5)  (cost=0.00..4259.88 rows=100 
width=100)
         Merge Key: customer_demographics.cd_gender, 
customer_demographics.cd_marital_status, 
customer_demographics.cd_education_status, 
customer_demographics.cd_purchase_estimate, 
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, 
customer_demographics.cd_dep_employed_count, 
customer_demographics.cd_dep_college_count
         ->  Limit  (cost=0.00..4259.85 rows=20 width=100)
               ->  GroupAggregate  (cost=0.00..4259.85 rows=88 width=100)
                     Group By: customer_demographics.cd_gender, 
customer_demographics.cd_marital_status, 
customer_demographics.cd_education_status, 
customer_demographics.cd_purchase_estimate, 
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, 
customer_demographics.cd_dep_employed_count, 
customer_demographics.cd_dep_college_count
                     ->  Sort  (cost=0.00..4259.84 rows=88 width=100)
                           Sort Key: customer_demographics.cd_gender, 
customer_demographics.cd_marital_status, 
customer_demographics.cd_education_status, 
customer_demographics.cd_purchase_estimate, 
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, 
customer_demographics.cd_dep_employed_count, 
customer_demographics.cd_dep_college_count
                           ->  Redistribute Motion 5:5  (slice9; segments: 5)  
(cost=0.00..4259.52 rows=88 width=100)
                                 Hash Key: customer_demographics.cd_gender, 
customer_demographics.cd_marital_status, 
customer_demographics.cd_education_status, 
customer_demographics.cd_purchase_estimate, 
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, 
customer_demographics.cd_dep_employed_count, 
customer_demographics.cd_dep_college_count
                                 ->  Result  (cost=0.00..4259.49 rows=88 
width=100)
                                       ->  HashAggregate  (cost=0.00..4259.49 
rows=88 width=100)
                                             Group By: 
customer_demographics.cd_gender, customer_demographics.cd_marital_status, 
customer_demographics.cd_education_status, 
customer_demographics.cd_purchase_estimate, 
customer_demographics.cd_credit_rating, customer_demographics.cd_dep_count, 
customer_demographics.cd_dep_employed_count, 
customer_demographics.cd_dep_college_count
                                             ->  Hash Join  (cost=0.00..4250.43 
rows=8406 width=52)
                                                   Hash Cond: 
customer_demographics.cd_demo_sk = customer.c_current_cdemo_sk
                                                   ->  Table Scan on 
customer_demographics  (cost=0.00..432.48 rows=38416 width=56)
                                                   ->  Hash  
(cost=3802.83..3802.83 rows=8406 width=4)
                                                         ->  Redistribute 
Motion 5:5  (slice8; segments: 5)  (cost=0.00..3802.83 rows=8406 width=4)
                                                               Hash Key: 
customer.c_current_cdemo_sk
                                                               ->  Hash Join  
(cost=0.00..3802.72 rows=8406 width=4)
                                                                     Hash Cond: 
customer.c_current_addr_sk = customer_address.ca_address_sk
                                                                     ->  
Redistribute Motion 5:5  (slice7; segments: 5)  (cost=0.00..3366.77 rows=15264 
width=8)
                                                                           Hash 
Key: customer.c_current_addr_sk
                                                                           ->  
Result  (cost=0.00..3366.38 rows=15264 width=8)
                                                                                
 Filter: COALESCE("ColRef_0259", 0::bigint) > 0::bigint OR 
COALESCE("ColRef_0260", 0::bigint) > 0::bigint
                                                                                
 ->  Result  (cost=0.00..3365.28 rows=16719 width=24)
                                                                                
       ->  Hash Left Join  (cost=0.00..3364.88 rows=16719 width=24)
                                                                                
             Hash Cond: customer.c_customer_sk = 
catalog_sales.cs_ship_customer_sk
                                                                                
             ->  Hash Left Join  (cost=0.00..2396.90 rows=11789 width=20)
                                                                                
                   Hash Cond: customer.c_customer_sk = 
web_sales.ws_bill_customer_sk
                                                                                
                   ->  Result  (cost=0.00..1479.35 rows=8844 width=12)
                                                                                
                         Filter: COALESCE("ColRef_0258", 0::bigint) > 0::bigint
                                                                                
                         ->  Result  (cost=0.00..1478.62 rows=22108 width=20)
                                                                                
                               ->  Hash Left Join  (cost=0.00..1478.18 
rows=22108 width=20)
                                                                                
                                     Hash Cond: customer.c_customer_sk = 
store_sales.ss_customer_sk
                                                                                
                                     ->  Table Scan on customer  
(cost=0.00..433.23 rows=20000 width=12)
                                                                                
                                     ->  Hash  (cost=1034.28..1034.28 
rows=13568 width=12)
                                                                                
                                           ->  HashAggregate  
(cost=0.00..1034.28 rows=13568 width=12)
                                                                                
                                                 Group By: 
store_sales.ss_customer_sk
                                                                                
                                                 ->  Redistribute Motion 5:5  
(slice2; segments: 5)  (cost=0.00..1032.54 rows=13568 width=12)
                                                                                
                                                       Hash Key: 
store_sales.ss_customer_sk
                                                                                
                                                       ->  Result  
(cost=0.00..1032.03 rows=13568 width=12)
                                                                                
                                                             ->  HashAggregate  
(cost=0.00..1032.03 rows=13568 width=12)
                                                                                
                                                                   Group By: 
store_sales.ss_customer_sk
                                                                                
                                                                   ->  Hash 
Join  (cost=0.00..1023.71 rows=66353 width=4)
                                                                                
                                                                         Hash 
Cond: store_sales.ss_sold_date_sk = public.date_dim.d_date_sk
                                                                                
                                                                         ->  
Table Scan on store_sales  (cost=0.00..478.21 rows=576081 width=8)
                                                                                
                                                                         ->  
Hash  (cost=433.03..433.03 rows=211 width=4)
                                                                                
                                                                               
->  Broadcast Motion 5:5  (slice1; segments: 5)  (cost=0.00..433.03 rows=211 
width=4)
                                                                                
                                                                                
     ->  Table Scan on date_dim  (cost=0.00..433.02 rows=43 width=4)
                                                                                
                                                                                
           Filter: d_year = 2000 AND d_moy >= 2 AND d_moy <= 5
                                                                                
                   ->  Hash  (cost=912.18..912.18 rows=7913 width=12)
                                                                                
                         ->  HashAggregate  (cost=0.00..912.18 rows=7913 
width=12)
                                                                                
                               Group By: web_sales.ws_bill_customer_sk
                                                                                
                               ->  Redistribute Motion 5:5  (slice4; segments: 
5)  (cost=0.00..911.16 rows=7913 width=12)
                                                                                
                                     Hash Key: web_sales.ws_bill_customer_sk
                                                                                
                                     ->  Result  (cost=0.00..910.86 rows=7913 
width=12)
                                                                                
                                           ->  HashAggregate  
(cost=0.00..910.86 rows=7913 width=12)
                                                                                
                                                 Group By: 
web_sales.ws_bill_customer_sk
                                                                                
                                                 ->  Hash Join  
(cost=0.00..908.77 rows=16581 width=4)
                                                                                
                                                       Hash Cond: 
web_sales.ws_sold_date_sk = public.date_dim.d_date_sk
                                                                                
                                                       ->  Table Scan on 
web_sales  (cost=0.00..447.62 rows=143877 width=8)
                                                                                
                                                       ->  Hash  
(cost=433.03..433.03 rows=211 width=4)
                                                                                
                                                             ->  Broadcast 
Motion 5:5  (slice3; segments: 5)  (cost=0.00..433.03 rows=211 width=4)
                                                                                
                                                                   ->  Table 
Scan on date_dim  (cost=0.00..433.02 rows=43 width=4)
                                                                                
                                                                         
Filter: d_year = 2000 AND d_moy >= 2 AND d_moy <= 5
                                                                                
             ->  Hash  (cost=959.83..959.83 rows=12230 width=12)
                                                                                
                   ->  HashAggregate  (cost=0.00..959.83 rows=12230 width=12)
                                                                                
                         Group By: catalog_sales.cs_ship_customer_sk
                                                                                
                         ->  Redistribute Motion 5:5  (slice6; segments: 5)  
(cost=0.00..958.26 rows=12230 width=12)
                                                                                
                               Hash Key: catalog_sales.cs_ship_customer_sk
                                                                                
                               ->  Result  (cost=0.00..957.80 rows=12230 
width=12)
                                                                                
                                     ->  HashAggregate  (cost=0.00..957.80 
rows=12230 width=12)
                                                                                
                                           Group By: 
catalog_sales.cs_ship_customer_sk
                                                                                
                                           ->  Hash Join  (cost=0.00..953.64 
rows=33081 width=4)
                                                                                
                                                 Hash Cond: 
catalog_sales.cs_sold_date_sk = public.date_dim.d_date_sk
                                                                                
                                                 ->  Table Scan on 
catalog_sales  (cost=0.00..464.30 rows=288310 width=8)
                                                                                
                                                 ->  Hash  (cost=433.03..433.03 
rows=211 width=4)
                                                                                
                                                       ->  Broadcast Motion 5:5 
 (slice5; segments: 5)  (cost=0.00..433.03 rows=211 width=4)
                                                                                
                                                             ->  Table Scan on 
date_dim  (cost=0.00..433.02 rows=43 width=4)
                                                                                
                                                                   Filter: 
d_year = 2000 AND d_moy >= 2 AND d_moy <= 5
                                                                     ->  Hash  
(cost=432.35..432.35 rows=4000 width=4)
                                                                           ->  
Table Scan on customer_address  (cost=0.00..432.35 rows=4000 width=4)
                                                                                
 Filter: ca_county::text = ANY ('{"Yellowstone County","Montgomery 
County","Divide County","Cedar County","Manassas Park city"}'::character 
varying[]::text[])

Attachment: query10.sql
Description: application/sql

Reply via email to