[ https://issues.apache.org/jira/browse/IMPALA-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-3573: ---------------------------------- Issue Type: Improvement (was: Bug) > Planner doesn't take into account runtime filter selectivity > ------------------------------------------------------------ > > Key: IMPALA-3573 > URL: https://issues.apache.org/jira/browse/IMPALA-3573 > Project: IMPALA > Issue Type: Improvement > Components: Frontend > Affects Versions: Impala 2.5.0 > Reporter: Mostafa Mokhtar > Priority: Major > Labels: bushy, planner > > Applying selective runtime filters can drastically change the cardinality of > scan nodes, the planner doesn't cost the runtime filters as filters as a > result it misses out on a more selective plan. > In this particular query there is three fact to dimension joins > * (ss x d1) -> 389.28M rows > * (sr x d2) -> 234.43M rows > * (cs x d3) -> 12.85B rows > The planner doesn't re-evaluate the cardinality estimation of ss, sr and cs > after the runtime filter are applied and puts ss as the left most node in the > plan where it should have been cs. > Ideally this should be a bushy plan.... > Query > {code} > select i_item_id > ,i_item_desc > ,s_store_id > ,s_store_name > ,sum(ss_quantity) as store_sales_quantity > ,sum(sr_return_quantity) as store_returns_quantity > ,sum(cs_quantity) as catalog_sales_quantity > from > store_sales > ,store_returns > ,catalog_sales > ,date_dim d1 > ,date_dim d2 > ,date_dim d3 > ,store > ,item > where > d1.d_moy = 4 > and d1.d_year = 1999 > and d1.d_date_sk = ss_sold_date_sk > and i_item_sk = ss_item_sk > and s_store_sk = ss_store_sk > and ss_customer_sk = sr_customer_sk > and ss_item_sk = sr_item_sk > and ss_ticket_number = sr_ticket_number > and sr_returned_date_sk = d2.d_date_sk > and d2.d_moy between 4 and 4 + 3 > and d2.d_year = 1999 > and sr_customer_sk = cs_bill_customer_sk > and sr_item_sk = cs_item_sk > and cs_sold_date_sk = d3.d_date_sk > and d3.d_year in (1999,1999+1,1999+2) > group by > i_item_id > ,i_item_desc > ,s_store_id > ,s_store_name > order by > i_item_id > ,i_item_desc > ,s_store_id > ,s_store_name > limit 100 > {code} > Plan > {code} > 28:MERGING-EXCHANGE [UNPARTITIONED] > | order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC > | limit: 100 > | hosts=20 per-host-mem=unavailable > | tuple-ids=9 row-size=224B cardinality=100 > | > 16:TOP-N [LIMIT=100] > | order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC > | hosts=20 per-host-mem=21.89KB > | tuple-ids=9 row-size=224B cardinality=100 > | > 27:AGGREGATE [FINALIZE] > | output: sum:merge(ss_quantity), sum:merge(sr_return_quantity), > sum:merge(cs_quantity) > | group by: i_item_id, i_item_desc, s_store_id, s_store_name > | hosts=20 per-host-mem=212.22MB > | tuple-ids=8 row-size=224B cardinality=335852270 > | > 26:EXCHANGE [HASH(i_item_id,i_item_desc,s_store_id,s_store_name)] > | hosts=20 per-host-mem=0B > | tuple-ids=8 row-size=224B cardinality=335852270 > | > 15:AGGREGATE [STREAMING] > | output: sum(ss_quantity), sum(sr_return_quantity), sum(cs_quantity) > | group by: i_item_id, i_item_desc, s_store_id, s_store_name > | hosts=20 per-host-mem=77.13GB > | tuple-ids=8 row-size=224B cardinality=335852270 > | > 14:HASH JOIN [INNER JOIN, BROADCAST] > | hash predicates: ss_item_sk = i_item_sk > | runtime filters: RF000 <- i_item_sk > | hosts=20 per-host-mem=5.24MB > | tuple-ids=0,3,1,4,2,5,6,7 row-size=368B cardinality=335852270 > | > |--25:EXCHANGE [BROADCAST] > | | hosts=1 per-host-mem=0B > | | tuple-ids=7 row-size=156B cardinality=32000 > | | > | 07:SCAN HDFS [tpcds_15000_decimal_parquet.item, RANDOM] > | partitions=1/1 files=1 size=3.14MB > | table stats: 32000 rows total > | column stats: all > | hosts=1 per-host-mem=48.00MB > | tuple-ids=7 row-size=156B cardinality=32000 > | > 13:HASH JOIN [INNER JOIN, BROADCAST] > | hash predicates: ss_store_sk = s_store_sk > | runtime filters: RF001 <- s_store_sk > | hosts=20 per-host-mem=4.00KB > | tuple-ids=0,3,1,4,2,5,6 row-size=212B cardinality=335852270 > | > |--24:EXCHANGE [BROADCAST] > | | hosts=1 per-host-mem=0B > | | tuple-ids=6 row-size=60B cardinality=62 > | | > | 06:SCAN HDFS [tpcds_15000_decimal_parquet.store, RANDOM] > | partitions=1/1 files=1 size=11.92KB > | table stats: 62 rows total > | column stats: all > | hosts=1 per-host-mem=48.00MB > | tuple-ids=6 row-size=60B cardinality=62 > | > 12:HASH JOIN [INNER JOIN, BROADCAST] > | hash predicates: cs_sold_date_sk = d3.d_date_sk > | runtime filters: RF002 <- d3.d_date_sk > | hosts=20 per-host-mem=25.23KB > | tuple-ids=0,3,1,4,2,5 row-size=152B cardinality=335852270 > | > |--23:EXCHANGE [BROADCAST] > | | hosts=1 per-host-mem=0B > | | tuple-ids=5 row-size=12B cardinality=1957 > | | > | 05:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d3, RANDOM] > | partitions=1/1 files=1 size=2.17MB > | predicates: d3.d_year IN (1999, 1999 + 1, 1999 + 2) > | table stats: 73049 rows total > | column stats: all > | hosts=1 per-host-mem=32.00MB > | tuple-ids=5 row-size=12B cardinality=1957 > | > 11:HASH JOIN [INNER JOIN, PARTITIONED] > | hash predicates: sr_customer_sk = cs_bill_customer_sk, sr_item_sk = > cs_item_sk > | runtime filters: RF003 <- cs_bill_customer_sk, RF004 <- cs_item_sk > | hosts=20 per-host-mem=30.98GB > | tuple-ids=0,3,1,4,2 row-size=140B cardinality=352496606 > | > |--22:EXCHANGE [HASH(cs_bill_customer_sk,cs_item_sk)] > | | hosts=20 per-host-mem=0B > | | tuple-ids=2 row-size=28B cardinality=21602442972 > | | > | 02:SCAN HDFS [tpcds_15000_decimal_parquet.catalog_sales, RANDOM] > | partitions=1837/1837 files=6122 size=1.33TB > | runtime filters: RF000 -> > tpcds_15000_decimal_parquet.catalog_sales.cs_item_sk, RF002 -> cs_sold_date_sk > | table stats: 21602442972 rows total > | column stats: all > | hosts=20 per-host-mem=264.00MB > | tuple-ids=2 row-size=28B cardinality=21602442972 > | > 21:EXCHANGE [HASH(sr_customer_sk,sr_item_sk)] > | hosts=20 per-host-mem=0B > | tuple-ids=0,3,1,4 row-size=112B cardinality=352496606 > | > 10:HASH JOIN [INNER JOIN, BROADCAST] > | hash predicates: sr_returned_date_sk = d2.d_date_sk > | runtime filters: RF005 <- d2.d_date_sk > | hosts=20 per-host-mem=3.54KB > | tuple-ids=0,3,1,4 row-size=112B cardinality=352496606 > | > |--20:EXCHANGE [BROADCAST] > | | hosts=1 per-host-mem=0B > | | tuple-ids=4 row-size=16B cardinality=206 > | | > | 04:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d2, RANDOM] > | partitions=1/1 files=1 size=2.17MB > | predicates: d2.d_year = 1999, d2.d_moy >= 4, d2.d_moy <= 4 + 3 > | table stats: 73049 rows total > | column stats: all > | hosts=1 per-host-mem=48.00MB > | tuple-ids=4 row-size=16B cardinality=206 > | > 09:HASH JOIN [INNER JOIN, PARTITIONED] > | hash predicates: ss_customer_sk = sr_customer_sk, ss_item_sk = sr_item_sk, > ss_ticket_number = sr_ticket_number > | runtime filters: RF006 <- sr_customer_sk, RF007 <- sr_item_sk, RF008 <- > sr_ticket_number > | hosts=20 per-host-mem=7.96GB > | tuple-ids=0,3,1 row-size=96B cardinality=3834191129 > | > |--19:EXCHANGE [HASH(sr_customer_sk,sr_item_sk,sr_ticket_number)] > | | hosts=20 per-host-mem=0B > | | tuple-ids=1 row-size=36B cardinality=4317083368 > | | > | 01:SCAN HDFS [tpcds_15000_decimal_parquet.store_returns, RANDOM] > | partitions=2004/2004 files=2022 size=201.98GB > | runtime filters: RF000 -> > tpcds_15000_decimal_parquet.store_returns.sr_item_sk, RF003 -> > sr_customer_sk, RF004 -> sr_item_sk, RF005 -> sr_returned_date_sk > | table stats: 4317083368 rows total > | column stats: all > | hosts=20 per-host-mem=352.00MB > | tuple-ids=1 row-size=36B cardinality=4317083368 > | > 18:EXCHANGE [HASH(ss_customer_sk,ss_item_sk,ss_ticket_number)] > | hosts=20 per-host-mem=0B > | tuple-ids=0,3 row-size=60B cardinality=3834191129 > | > 08:HASH JOIN [INNER JOIN, BROADCAST] > | hash predicates: ss_sold_date_sk = d1.d_date_sk > | runtime filters: RF009 <- d1.d_date_sk > | hosts=20 per-host-mem=3.11KB > | tuple-ids=0,3 row-size=60B cardinality=3834191129 > | > |--17:EXCHANGE [BROADCAST] > | | hosts=1 per-host-mem=0B > | | tuple-ids=3 row-size=16B cardinality=181 > | | > | 03:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d1, RANDOM] > | partitions=1/1 files=1 size=2.17MB > | predicates: d1.d_year = 1999, d1.d_moy = 4 > | table stats: 73049 rows total > | column stats: all > | hosts=1 per-host-mem=48.00MB > | tuple-ids=3 row-size=16B cardinality=181 > | > 00:SCAN HDFS [tpcds_15000_decimal_parquet.store_sales, RANDOM] > partitions=1824/1824 files=6922 size=1.52TB > runtime filters: RF000 -> ss_item_sk, RF001 -> ss_store_sk, RF003 -> > tpcds_15000_decimal_parquet.store_sales.ss_customer_sk, RF004 -> > tpcds_15000_decimal_parquet.store_sales.ss_item_sk, RF006 -> ss_customer_sk, > RF007 -> ss_item_sk, RF008 -> ss_ticket_number, RF009 -> ss_sold_date_sk > table stats: 43202446268 rows total > column stats: all > hosts=20 per-host-mem=440.00MB > tuple-ids=0 row-size=44B cardinality=43202446268 > {code} > Summary > {code} > Operator #Hosts Avg Time Max Time #Rows Est. #Rows > Peak Mem Est. Peak Mem Detail > ---------------------------------------------------------------------------------------------------------------------------------- > 28:MERGING-EXCHANGE 1 1.109ms 1.109ms 1.02K 100 > 0 -1.00 B UNPARTITIONED > 16:TOP-N 20 3.652ms 5.942ms 2.00K 100 > 132.00 KB 21.89 KB > 27:AGGREGATE 20 275.802ms 458.876ms 442.90K 335.85M > 11.35 MB 212.22 MB FINALIZE > 26:EXCHANGE 20 59.647ms 258.173ms 1.13M 335.85M > 0 0 HASH(i_item_id,i_item_desc,... > 15:AGGREGATE 20 343.414ms 422.845ms 1.13M 335.85M > 143.24 MB 77.13 GB STREAMING > 14:HASH JOIN 20 30.930ms 41.410ms 2.20M 335.85M > 12.99 MB 5.24 MB INNER JOIN, BROADCAST > |--25:EXCHANGE 20 1.052ms 1.677ms 32.00K 32.00K > 0 0 BROADCAST > | 07:SCAN HDFS 1 18.325ms 18.325ms 32.00K 32.00K > 9.43 MB 48.00 MB tpcds_15000_decimal_parquet... > 13:HASH JOIN 20 18.778ms 25.275ms 2.20M 335.85M > 3.49 MB 4.00 KB INNER JOIN, BROADCAST > |--24:EXCHANGE 20 25.213us 301.428us 62 62 > 0 0 BROADCAST > | 06:SCAN HDFS 1 5.887ms 5.887ms 62 62 > 159.00 KB 48.00 MB tpcds_15000_decimal_parquet... > 12:HASH JOIN 20 24.246ms 33.507ms 2.23M 335.85M > 2.81 MB 25.23 KB INNER JOIN, BROADCAST > |--23:EXCHANGE 20 34.326us 49.903us 1.10K 1.96K > 0 0 BROADCAST > | 05:SCAN HDFS 1 24.594ms 24.594ms 1.10K 1.96K > 2.29 MB 32.00 MB tpcds_15000_decimal_parquet... > 11:HASH JOIN 20 4m8s 5m31s 2.23M 352.50M > 38.14 GB 30.98 GB INNER JOIN, PARTITIONED > |--22:EXCHANGE 20 20s051ms 54s305ms 12.85B 21.60B > 0 0 HASH(cs_bill_customer_sk,cs... > | 02:SCAN HDFS 20 3s716ms 4s533ms 12.85B 21.60B > 257.30 MB 264.00 MB tpcds_15000_decimal_parquet... > 21:EXCHANGE 20 64.831ms 101.634ms 16.49M 352.50M > 0 0 HASH(sr_customer_sk,sr_item... > 10:HASH JOIN 20 300.445ms 389.672ms 16.49M 352.50M > 2.94 MB 3.54 KB INNER JOIN, BROADCAST > |--20:EXCHANGE 20 11.347us 17.878us 122 206 > 0 0 BROADCAST > | 04:SCAN HDFS 1 12.903ms 12.903ms 122 206 > 3.50 MB 48.00 MB tpcds_15000_decimal_parquet... > 09:HASH JOIN 20 8s854ms 11s946ms 16.49M 3.83B > 1.01 GB 7.96 GB INNER JOIN, PARTITIONED > |--19:EXCHANGE 20 444.716ms 1s130ms 234.43M 4.32B > 0 0 HASH(sr_customer_sk,sr_item... > | 01:SCAN HDFS 20 1s235ms 1s397ms 234.43M 4.32B > 351.41 MB 352.00 MB tpcds_15000_decimal_parquet... > 18:EXCHANGE 20 1s216ms 2s815ms 389.28M 3.83B > 0 0 HASH(ss_customer_sk,ss_item... > 08:HASH JOIN 20 949.230ms 2s216ms 389.28M 3.83B > 2.71 MB 3.11 KB INNER JOIN, BROADCAST > |--17:EXCHANGE 20 11.615us 16.053us 30 181 > 0 0 BROADCAST > | 03:SCAN HDFS 1 14.187ms 14.187ms 30 181 > 2.89 MB 48.00 MB tpcds_15000_decimal_parquet... > 00:SCAN HDFS 20 620.986ms 759.453ms 389.28M 43.20B > 344.09 MB 440.00 MB tpcds_15000_decimal_parquet... > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org