[ https://issues.apache.org/jira/browse/IMPALA-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong reassigned IMPALA-4040: ------------------------------------- Assignee: (was: Alexander Behm) > Performance regression introduced by "IMPALA-3828 Join inversion" > ------------------------------------------------------------------ > > Key: IMPALA-4040 > URL: https://issues.apache.org/jira/browse/IMPALA-4040 > Project: IMPALA > Issue Type: Bug > Components: Frontend > Affects Versions: Impala 2.7.0 > Reporter: Mostafa Mokhtar > Priority: Major > Labels: regression > > IMPALA-3828 improved several TPC-DS queries but introduced a regression to > queries 10 and 17. > The regression in TPC-DS Q10 is because the runtime filters in the old plan > had a cascading effect mimicking a bushy execution plan. > Unfortunately filtering from runtime filters is not accounted for by the > planner and one of the scan nodes is moved to a location where runtime > filters no longer have this cascading effect. > For TPC-DS Q10 before IMPALA-3828 RF001 created from customer_address was > pushed on to customer creating RF002 which is very selective and eventually > gets pushed to web_sales, catalog_sales and store_sales. > Post IMPALA-3828 RF001 gets pushed on the customer table but in the new plan > the customer scan node is the left most node in the subplan and as a result > no runtime filters from customer gets pushed onto the *_sales fact tables > which dominate the cost of the plan. > TPC-DS Q10 > {code} > select > cd_gender, > cd_marital_status, > cd_education_status, > count(*) cnt1, > cd_purchase_estimate, > count(*) cnt2, > cd_credit_rating, > count(*) cnt3, > cd_dep_count, > count(*) cnt4, > cd_dep_employed_count, > count(*) cnt5, > cd_dep_college_count, > count(*) cnt6 > from > customer c,customer_address ca,customer_demographics, > (select ss_customer_sk > from store_sales,date_dim > where ss_sold_date_sk = d_date_sk and > d_year = 2002 and > d_moy between 2 and 2+3) ss, > (select ws_bill_customer_sk > from web_sales,date_dim > where ws_sold_date_sk = d_date_sk and > d_year = 2002 and > d_moy between 2 ANd 2+3) ws, > (select cs_ship_customer_sk > from catalog_sales,date_dim > where cs_sold_date_sk = d_date_sk and > d_year = 2002 and > d_moy between 2 and 2+3) cs > where > c.c_current_addr_sk = ca.ca_address_sk and > ca_county in ('McKenzie County','Adams County','Grant County','Saguache > County','Waseca County') and > cd_demo_sk = c.c_current_cdemo_sk > and c_customer_sk = ss_customer_sk > and c_customer_sk = ws_bill_customer_sk > and c_customer_sk = cs_ship_customer_sk > > > group by cd_gender, > cd_marital_status, > cd_education_status, > cd_purchase_estimate, > cd_credit_rating, > cd_dep_count, > cd_dep_employed_count, > cd_dep_college_count > order by cd_gender, > cd_marital_status, > cd_education_status, > cd_purchase_estimate, > cd_credit_rating, > cd_dep_count, > cd_dep_employed_count, > cd_dep_college_count > limit 100 > {code} > Plan before change > {code} > 31:MERGING-EXCHANGE [UNPARTITIONED] > | order by: cd_gender ASC, cd_marital_status ASC, cd_education_status ASC, > cd_purchase_estimate ASC, cd_credit_rating ASC, cd_dep_count ASC, > cd_dep_employed_count ASC, cd_dep_college_count ASC > | limit: 100 > | hosts=15 per-host-mem=unavailable > | tuple-ids=13 row-size=107B cardinality=100 > | > 18:TOP-N [LIMIT=100] > | order by: cd_gender ASC, cd_marital_status ASC, cd_education_status ASC, > cd_purchase_estimate ASC, cd_credit_rating ASC, cd_dep_count ASC, > cd_dep_employed_count ASC, cd_dep_college_count ASC > | hosts=15 per-host-mem=10.41KB > | tuple-ids=13 row-size=107B cardinality=100 > | > 30:AGGREGATE [FINALIZE] > | output: count:merge(*) > | group by: cd_gender, cd_marital_status, cd_education_status, > cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, > cd_dep_college_count > | hosts=15 per-host-mem=10.00MB > | tuple-ids=12 row-size=107B cardinality=2406 > | > 29:EXCHANGE > [HASH(cd_gender,cd_marital_status,cd_education_status,cd_purchase_estimate,cd_credit_rating,cd_dep_count,cd_dep_employed_count,cd_dep_college_count)] > | hosts=15 per-host-mem=0B > | tuple-ids=12 row-size=107B cardinality=2406 > | > 17:AGGREGATE [STREAMING] > | output: count(*) > | group by: cd_gender, cd_marital_status, cd_education_status, > cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, > cd_dep_college_count > | hosts=15 per-host-mem=10.00MB > | tuple-ids=12 row-size=107B cardinality=2406 > | > 16:HASH JOIN [INNER JOIN, PARTITIONED] > | hash predicates: c.c_current_cdemo_sk = cd_demo_sk > | runtime filters: RF000 <- cd_demo_sk > | hosts=15 per-host-mem=13.78MB > | tuple-ids=3,4,9,10,6,7,0,1,2 row-size=208B cardinality=2406 > | > |--28:EXCHANGE [HASH(cd_demo_sk)] > | | hosts=15 per-host-mem=0B > | | tuple-ids=2 row-size=103B cardinality=1920800 > | | > | 02:SCAN HDFS [tpcds_1000_parquet.customer_demographics, RANDOM] > | partitions=1/1 files=15 size=8.61MB > | table stats: 1920800 rows total > | column stats: all > | hosts=15 per-host-mem=144.00MB > | tuple-ids=2 row-size=103B cardinality=1920800 > | > 27:EXCHANGE [HASH(c.c_current_cdemo_sk)] > | hosts=15 per-host-mem=0B > | tuple-ids=3,4,9,10,6,7,0,1 row-size=106B cardinality=2430 > | > 15:HASH JOIN [INNER JOIN, BROADCAST] > | hash predicates: c.c_current_addr_sk = ca.ca_address_sk > | runtime filters: RF001 <- ca.ca_address_sk > | hosts=15 per-host-mem=592.29KB > | tuple-ids=3,4,9,10,6,7,0,1 row-size=106B cardinality=2430 > | > |--26:EXCHANGE [BROADCAST] > | | hosts=15 per-host-mem=0B > | | tuple-ids=1 row-size=34B cardinality=16438 > | | > | 01:SCAN HDFS [tpcds_1000_parquet.customer_address ca, RANDOM] > | partitions=1/1 files=15 size=113.31MB > | predicates: ca_county IN ('McKenzie County', 'Adams County', 'Grant > County', 'Saguache County', 'Waseca County') > | table stats: 6000000 rows total > | column stats: all > | hosts=15 per-host-mem=32.00MB > | tuple-ids=1 row-size=34B cardinality=16438 > | > 14:HASH JOIN [INNER JOIN, PARTITIONED] > | hash predicates: ss_customer_sk = c_customer_sk > | runtime filters: RF002 <- c_customer_sk > | hosts=15 per-host-mem=10.07MB > | tuple-ids=3,4,9,10,6,7,0 row-size=72B cardinality=744355 > | > |--25:EXCHANGE [HASH(c_customer_sk)] > | | hosts=15 per-host-mem=0B > | | tuple-ids=0 row-size=12B cardinality=12000000 > | | > | 00:SCAN HDFS [tpcds_1000_parquet.customer c, RANDOM] > | partitions=1/1 files=15 size=619.69MB > | runtime filters: RF000 -> c.c_current_cdemo_sk, RF001 -> > c.c_current_addr_sk > | table stats: 12000000 rows total > | column stats: all > | hosts=15 per-host-mem=168.00MB > | tuple-ids=0 row-size=12B cardinality=12000000 > | > 13:HASH JOIN [INNER JOIN, PARTITIONED] > | hash predicates: ss_customer_sk = ws_bill_customer_sk > | runtime filters: RF003 <- ws_bill_customer_sk > | hosts=15 per-host-mem=65.15MB > | tuple-ids=3,4,9,10,6,7 row-size=60B cardinality=773855 > | > |--24:EXCHANGE [HASH(ws_bill_customer_sk)] > | | hosts=15 per-host-mem=0B > | | tuple-ids=6,7 row-size=20B cardinality=46578972 > | | > | 08:HASH JOIN [INNER JOIN, BROADCAST] > | | hash predicates: ws_sold_date_sk = d_date_sk > | | runtime filters: RF007 <- d_date_sk > | | hosts=15 per-host-mem=1.52KB > | | tuple-ids=6,7 row-size=20B cardinality=46578972 > | | > | |--23:EXCHANGE [BROADCAST] > | | | hosts=1 per-host-mem=0B > | | | tuple-ids=7 row-size=12B cardinality=118 > | | | > | | 07:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM] > | | partitions=1/1 files=1 size=2.17MB > | | predicates: d_year = 2002, d_moy >= 2, d_moy <= 2 + 3 > | | table stats: 73049 rows total > | | column stats: all > | | hosts=1 per-host-mem=48.00MB > | | tuple-ids=7 row-size=12B cardinality=118 > | | > | 06:SCAN HDFS [tpcds_1000_parquet.web_sales, RANDOM] > | partitions=1824/1824 files=1824 size=47.08GB > | runtime filters: RF002 -> > tpcds_1000_parquet.web_sales.ws_bill_customer_sk, RF007 -> ws_sold_date_sk > | table stats: 720000376 rows total > | column stats: all > | hosts=15 per-host-mem=40.00MB > | tuple-ids=6 row-size=8B cardinality=720000376 > | > 12:HASH JOIN [INNER JOIN, PARTITIONED] > | hash predicates: ss_customer_sk = cs_ship_customer_sk > | runtime filters: RF004 <- cs_ship_customer_sk > | hosts=15 per-host-mem=129.38MB > | tuple-ids=3,4,9,10 row-size=40B cardinality=11967976 > | > |--22:EXCHANGE [HASH(cs_ship_customer_sk)] > | | hosts=15 per-host-mem=0B > | | tuple-ids=9,10 row-size=20B cardinality=92497382 > | | > | 11:HASH JOIN [INNER JOIN, BROADCAST] > | | hash predicates: cs_sold_date_sk = d_date_sk > | | runtime filters: RF006 <- d_date_sk > | | hosts=15 per-host-mem=1.52KB > | | tuple-ids=9,10 row-size=20B cardinality=92497382 > | | > | |--20:EXCHANGE [BROADCAST] > | | | hosts=1 per-host-mem=0B > | | | tuple-ids=10 row-size=12B cardinality=118 > | | | > | | 10:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM] > | | partitions=1/1 files=1 size=2.17MB > | | predicates: d_year = 2002, d_moy >= 2, d_moy <= 2 + 3 > | | table stats: 73049 rows total > | | column stats: all > | | hosts=1 per-host-mem=48.00MB > | | tuple-ids=10 row-size=12B cardinality=118 > | | > | 09:SCAN HDFS [tpcds_1000_parquet.catalog_sales, RANDOM] > | partitions=1837/1837 files=1837 size=118.74GB > | runtime filters: RF002 -> > tpcds_1000_parquet.catalog_sales.cs_ship_customer_sk, RF003 -> > tpcds_1000_parquet.catalog_sales.cs_ship_customer_sk, RF006 -> cs_sold_date_sk > | table stats: 1439980416 rows total > | column stats: all > | hosts=15 per-host-mem=72.00MB > | tuple-ids=9 row-size=8B cardinality=1439980416 > | > 21:EXCHANGE [HASH(ss_customer_sk)] > | hosts=15 per-host-mem=0B > | tuple-ids=3,4 row-size=20B cardinality=186315014 > | > 05:HASH JOIN [INNER JOIN, BROADCAST] > | hash predicates: ss_sold_date_sk = d_date_sk > | runtime filters: RF005 <- d_date_sk > | hosts=15 per-host-mem=1.52KB > | tuple-ids=3,4 row-size=20B cardinality=186315014 > | > |--19:EXCHANGE [BROADCAST] > | | hosts=1 per-host-mem=0B > | | tuple-ids=4 row-size=12B cardinality=118 > | | > | 04:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM] > | partitions=1/1 files=1 size=2.17MB > | predicates: d_year = 2002, d_moy >= 2, d_moy <= 2 + 3 > | table stats: 73049 rows total > | column stats: all > | hosts=1 per-host-mem=48.00MB > | tuple-ids=4 row-size=12B cardinality=118 > | > 03:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM] > partitions=1824/1824 files=1824 size=189.24GB > runtime filters: RF002 -> ss_customer_sk, RF003 -> ss_customer_sk, RF004 > -> ss_customer_sk, RF005 -> ss_sold_date_sk > table stats: 2879987999 rows total > column stats: all > hosts=15 per-host-mem=88.00MB > tuple-ids=3 row-size=8B cardinality=2879987999 > {code} > Plan after change > {code} > 30:MERGING-EXCHANGE [UNPARTITIONED] > | order by: cd_gender ASC, cd_marital_status ASC, cd_education_status ASC, > cd_purchase_estimate ASC, cd_credit_rating ASC, cd_dep_count ASC, > cd_dep_employed_count ASC, cd_dep_college_count ASC > | limit: 100 > | hosts=15 per-host-mem=unavailable > | tuple-ids=13 row-size=107B cardinality=100 > | > 18:TOP-N [LIMIT=100] > | order by: cd_gender ASC, cd_marital_status ASC, cd_education_status ASC, > cd_purchase_estimate ASC, cd_credit_rating ASC, cd_dep_count ASC, > cd_dep_employed_count ASC, cd_dep_college_count ASC > | hosts=15 per-host-mem=10.41KB > | tuple-ids=13 row-size=107B cardinality=100 > | > 29:AGGREGATE [FINALIZE] > | output: count:merge(*) > | group by: cd_gender, cd_marital_status, cd_education_status, > cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, > cd_dep_college_count > | hosts=15 per-host-mem=10.00MB > | tuple-ids=12 row-size=107B cardinality=2406 > | > 28:EXCHANGE > [HASH(cd_gender,cd_marital_status,cd_education_status,cd_purchase_estimate,cd_credit_rating,cd_dep_count,cd_dep_employed_count,cd_dep_college_count)] > | hosts=15 per-host-mem=0B > | tuple-ids=12 row-size=107B cardinality=2406 > | > 17:AGGREGATE [STREAMING] > | output: count(*) > | group by: cd_gender, cd_marital_status, cd_education_status, > cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, > cd_dep_college_count > | hosts=15 per-host-mem=10.00MB > | tuple-ids=12 row-size=107B cardinality=2406 > | > 16:HASH JOIN [INNER JOIN, BROADCAST] > | hash predicates: cd_demo_sk = c.c_current_cdemo_sk > | runtime filters: RF000 <- c.c_current_cdemo_sk > | hosts=15 per-host-mem=275.50KB > | tuple-ids=2,0,6,7,3,4,9,10,1 row-size=208B cardinality=2406 > | > |--27:EXCHANGE [BROADCAST] > | | hosts=15 per-host-mem=0B > | | tuple-ids=0,6,7,3,4,9,10,1 row-size=106B cardinality=2430 > | | > | 15:HASH JOIN [INNER JOIN, BROADCAST] > | | hash predicates: c.c_current_addr_sk = ca.ca_address_sk > | | runtime filters: RF001 <- ca.ca_address_sk > | | hosts=15 per-host-mem=592.29KB > | | tuple-ids=0,6,7,3,4,9,10,1 row-size=106B cardinality=2430 > | | > | |--26:EXCHANGE [BROADCAST] > | | | hosts=15 per-host-mem=0B > | | | tuple-ids=1 row-size=34B cardinality=16438 > | | | > | | 01:SCAN HDFS [tpcds_1000_parquet.customer_address ca, RANDOM] > | | partitions=1/1 files=15 size=113.31MB > | | predicates: ca_county IN ('McKenzie County', 'Adams County', 'Grant > County', 'Saguache County', 'Waseca County') > | | table stats: 6000000 rows total > | | column stats: all > | | hosts=15 per-host-mem=32.00MB > | | tuple-ids=1 row-size=34B cardinality=16438 > | | > | 14:HASH JOIN [INNER JOIN, PARTITIONED] > | | hash predicates: c_customer_sk = ss_customer_sk > | | runtime filters: RF002 <- ss_customer_sk > | | hosts=15 per-host-mem=3.25MB > | | tuple-ids=0,6,7,3,4,9,10 row-size=72B cardinality=744355 > | | > | |--13:HASH JOIN [INNER JOIN, PARTITIONED] > | | | hash predicates: ws_bill_customer_sk = ss_customer_sk > | | | runtime filters: RF003 <- ss_customer_sk > | | | hosts=15 per-host-mem=33.48MB > | | | tuple-ids=6,7,3,4,9,10 row-size=60B cardinality=773855 > | | | > | | |--12:HASH JOIN [INNER JOIN, PARTITIONED] > | | | | hash predicates: ss_customer_sk = cs_ship_customer_sk > | | | | runtime filters: RF005 <- cs_ship_customer_sk > | | | | hosts=15 per-host-mem=129.38MB > | | | | tuple-ids=3,4,9,10 row-size=40B cardinality=11967976 > | | | | > | | | |--23:EXCHANGE [HASH(cs_ship_customer_sk)] > | | | | | hosts=15 per-host-mem=0B > | | | | | tuple-ids=9,10 row-size=20B cardinality=92497382 > | | | | | > | | | | 11:HASH JOIN [INNER JOIN, BROADCAST] > | | | | | hash predicates: cs_sold_date_sk = d_date_sk > | | | | | runtime filters: RF007 <- d_date_sk > | | | | | hosts=15 per-host-mem=1.52KB > | | | | | tuple-ids=9,10 row-size=20B cardinality=92497382 > | | | | | > | | | | |--21:EXCHANGE [BROADCAST] > | | | | | | hosts=1 per-host-mem=0B > | | | | | | tuple-ids=10 row-size=12B cardinality=118 > | | | | | | > | | | | | 10:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM] > | | | | | partitions=1/1 files=1 size=2.17MB > | | | | | predicates: d_year = 2002, d_moy >= 2, d_moy <= 2 + 3 > | | | | | table stats: 73049 rows total > | | | | | column stats: all > | | | | | hosts=1 per-host-mem=48.00MB > | | | | | tuple-ids=10 row-size=12B cardinality=118 > | | | | | > | | | | 09:SCAN HDFS [tpcds_1000_parquet.catalog_sales, RANDOM] > | | | | partitions=1837/1837 files=1837 size=118.74GB > | | | | runtime filters: RF007 -> cs_sold_date_sk > | | | | table stats: 1439980416 rows total > | | | | column stats: all > | | | | hosts=15 per-host-mem=72.00MB > | | | | tuple-ids=9 row-size=8B cardinality=1439980416 > | | | | > | | | 22:EXCHANGE [HASH(ss_customer_sk)] > | | | | hosts=15 per-host-mem=0B > | | | | tuple-ids=3,4 row-size=20B cardinality=186315014 > | | | | > | | | 05:HASH JOIN [INNER JOIN, BROADCAST] > | | | | hash predicates: ss_sold_date_sk = d_date_sk > | | | | runtime filters: RF006 <- d_date_sk > | | | | hosts=15 per-host-mem=1.52KB > | | | | tuple-ids=3,4 row-size=20B cardinality=186315014 > | | | | > | | | |--20:EXCHANGE [BROADCAST] > | | | | | hosts=1 per-host-mem=0B > | | | | | tuple-ids=4 row-size=12B cardinality=118 > | | | | | > | | | | 04:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM] > | | | | partitions=1/1 files=1 size=2.17MB > | | | | predicates: d_year = 2002, d_moy >= 2, d_moy <= 2 + 3 > | | | | table stats: 73049 rows total > | | | | column stats: all > | | | | hosts=1 per-host-mem=48.00MB > | | | | tuple-ids=4 row-size=12B cardinality=118 > | | | | > | | | 03:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM] > | | | partitions=1824/1824 files=1824 size=189.24GB > | | | runtime filters: RF005 -> ss_customer_sk, RF006 -> ss_sold_date_sk > | | | table stats: 2879987999 rows total > | | | column stats: all > | | | hosts=15 per-host-mem=88.00MB > | | | tuple-ids=3 row-size=8B cardinality=2879987999 > | | | > | | 24:EXCHANGE [HASH(ws_bill_customer_sk)] > | | | hosts=15 per-host-mem=0B > | | | tuple-ids=6,7 row-size=20B cardinality=46578972 > | | | > | | 08:HASH JOIN [INNER JOIN, BROADCAST] > | | | hash predicates: ws_sold_date_sk = d_date_sk > | | | runtime filters: RF004 <- d_date_sk > | | | hosts=15 per-host-mem=1.52KB > | | | tuple-ids=6,7 row-size=20B cardinality=46578972 > | | | > | | |--19:EXCHANGE [BROADCAST] > | | | | hosts=1 per-host-mem=0B > | | | | tuple-ids=7 row-size=12B cardinality=118 > | | | | > | | | 07:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM] > | | | partitions=1/1 files=1 size=2.17MB > | | | predicates: d_year = 2002, d_moy >= 2, d_moy <= 2 + 3 > | | | table stats: 73049 rows total > | | | column stats: all > | | | hosts=1 per-host-mem=48.00MB > | | | tuple-ids=7 row-size=12B cardinality=118 > | | | > | | 06:SCAN HDFS [tpcds_1000_parquet.web_sales, RANDOM] > | | partitions=1824/1824 files=1824 size=47.08GB > | | runtime filters: RF003 -> ws_bill_customer_sk, RF004 -> > ws_sold_date_sk > | | table stats: 720000376 rows total > | | column stats: all > | | hosts=15 per-host-mem=40.00MB > | | tuple-ids=6 row-size=8B cardinality=720000376 > | | > | 25:EXCHANGE [HASH(c_customer_sk)] > | | hosts=15 per-host-mem=0B > | | tuple-ids=0 row-size=12B cardinality=12000000 > | | > | 00:SCAN HDFS [tpcds_1000_parquet.customer c, RANDOM] > | partitions=1/1 files=15 size=619.69MB > | runtime filters: RF001 -> c.c_current_addr_sk, RF002 -> c_customer_sk > | table stats: 12000000 rows total > | column stats: all > | hosts=15 per-host-mem=168.00MB > | tuple-ids=0 row-size=12B cardinality=12000000 > | > 02:SCAN HDFS [tpcds_1000_parquet.customer_demographics, RANDOM] > partitions=1/1 files=15 size=8.61MB > runtime filters: RF000 -> cd_demo_sk > table stats: 1920800 rows total > column stats: all > hosts=15 per-host-mem=144.00MB > tuple-ids=2 row-size=103B cardinality=1920800 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org