[ 
https://issues.apache.org/jira/browse/IMPALA-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Rahn updated IMPALA-3573:
------------------------------
    Description: 
Applying selective runtime filters can drastically change the cardinality of 
scan nodes, the planner doesn't cost the runtime filters as filters as a result 
it misses out on a more selective plan.

In TPC-DS query29 there are three fact to dimension joins
 * (ss x d1) -> 389.28M rows
 * (sr x d2) -> 234.43M rows
 * (cs x d3) -> 12.85B rows

The planner doesn't re-evaluate the cardinality estimation of ss, sr and cs 
after the runtime filter are applied and puts ss as the left most node in the 
plan where it should have been cs.

Ideally this should be a bushy plan....

Query
{code:java}
select i_item_id
    ,i_item_desc
    ,s_store_id
    ,s_store_name
    ,sum(ss_quantity)        as store_sales_quantity
    ,sum(sr_return_quantity) as store_returns_quantity
    ,sum(cs_quantity)        as catalog_sales_quantity
 from
    store_sales
   ,store_returns
   ,catalog_sales
   ,date_dim             d1
   ,date_dim             d2
   ,date_dim             d3
   ,store
   ,item
 where
     d1.d_moy               = 4 
 and d1.d_year              = 1999
 and d1.d_date_sk           = ss_sold_date_sk
 and i_item_sk              = ss_item_sk
 and s_store_sk             = ss_store_sk
 and ss_customer_sk         = sr_customer_sk
 and ss_item_sk             = sr_item_sk
 and ss_ticket_number       = sr_ticket_number
 and sr_returned_date_sk    = d2.d_date_sk
 and d2.d_moy               between 4 and  4 + 3 
 and d2.d_year              = 1999
 and sr_customer_sk         = cs_bill_customer_sk
 and sr_item_sk             = cs_item_sk
 and cs_sold_date_sk        = d3.d_date_sk     
 and d3.d_year              in (1999,1999+1,1999+2)
 group by
    i_item_id
   ,i_item_desc
   ,s_store_id
   ,s_store_name
 order by
    i_item_id 
   ,i_item_desc
   ,s_store_id
   ,s_store_name
limit 100
{code}
Plan
{code:java}
28:MERGING-EXCHANGE [UNPARTITIONED]
|  order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC
|  limit: 100
|  hosts=20 per-host-mem=unavailable
|  tuple-ids=9 row-size=224B cardinality=100
|
16:TOP-N [LIMIT=100]
|  order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC
|  hosts=20 per-host-mem=21.89KB
|  tuple-ids=9 row-size=224B cardinality=100
|
27:AGGREGATE [FINALIZE]
|  output: sum:merge(ss_quantity), sum:merge(sr_return_quantity), 
sum:merge(cs_quantity)
|  group by: i_item_id, i_item_desc, s_store_id, s_store_name
|  hosts=20 per-host-mem=212.22MB
|  tuple-ids=8 row-size=224B cardinality=335852270
|
26:EXCHANGE [HASH(i_item_id,i_item_desc,s_store_id,s_store_name)]
|  hosts=20 per-host-mem=0B
|  tuple-ids=8 row-size=224B cardinality=335852270
|
15:AGGREGATE [STREAMING]
|  output: sum(ss_quantity), sum(sr_return_quantity), sum(cs_quantity)
|  group by: i_item_id, i_item_desc, s_store_id, s_store_name
|  hosts=20 per-host-mem=77.13GB
|  tuple-ids=8 row-size=224B cardinality=335852270
|
14:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: ss_item_sk = i_item_sk
|  runtime filters: RF000 <- i_item_sk
|  hosts=20 per-host-mem=5.24MB
|  tuple-ids=0,3,1,4,2,5,6,7 row-size=368B cardinality=335852270
|
|--25:EXCHANGE [BROADCAST]
|  |  hosts=1 per-host-mem=0B
|  |  tuple-ids=7 row-size=156B cardinality=32000
|  |
|  07:SCAN HDFS [tpcds_15000_decimal_parquet.item, RANDOM]
|     partitions=1/1 files=1 size=3.14MB
|     table stats: 32000 rows total
|     column stats: all
|     hosts=1 per-host-mem=48.00MB
|     tuple-ids=7 row-size=156B cardinality=32000
|
13:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: ss_store_sk = s_store_sk
|  runtime filters: RF001 <- s_store_sk
|  hosts=20 per-host-mem=4.00KB
|  tuple-ids=0,3,1,4,2,5,6 row-size=212B cardinality=335852270
|
|--24:EXCHANGE [BROADCAST]
|  |  hosts=1 per-host-mem=0B
|  |  tuple-ids=6 row-size=60B cardinality=62
|  |
|  06:SCAN HDFS [tpcds_15000_decimal_parquet.store, RANDOM]
|     partitions=1/1 files=1 size=11.92KB
|     table stats: 62 rows total
|     column stats: all
|     hosts=1 per-host-mem=48.00MB
|     tuple-ids=6 row-size=60B cardinality=62
|
12:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: cs_sold_date_sk = d3.d_date_sk
|  runtime filters: RF002 <- d3.d_date_sk
|  hosts=20 per-host-mem=25.23KB
|  tuple-ids=0,3,1,4,2,5 row-size=152B cardinality=335852270
|
|--23:EXCHANGE [BROADCAST]
|  |  hosts=1 per-host-mem=0B
|  |  tuple-ids=5 row-size=12B cardinality=1957
|  |
|  05:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d3, RANDOM]
|     partitions=1/1 files=1 size=2.17MB
|     predicates: d3.d_year IN (1999, 1999 + 1, 1999 + 2)
|     table stats: 73049 rows total
|     column stats: all
|     hosts=1 per-host-mem=32.00MB
|     tuple-ids=5 row-size=12B cardinality=1957
|
11:HASH JOIN [INNER JOIN, PARTITIONED]
|  hash predicates: sr_customer_sk = cs_bill_customer_sk, sr_item_sk = 
cs_item_sk
|  runtime filters: RF003 <- cs_bill_customer_sk, RF004 <- cs_item_sk
|  hosts=20 per-host-mem=30.98GB
|  tuple-ids=0,3,1,4,2 row-size=140B cardinality=352496606
|
|--22:EXCHANGE [HASH(cs_bill_customer_sk,cs_item_sk)]
|  |  hosts=20 per-host-mem=0B
|  |  tuple-ids=2 row-size=28B cardinality=21602442972
|  |
|  02:SCAN HDFS [tpcds_15000_decimal_parquet.catalog_sales, RANDOM]
|     partitions=1837/1837 files=6122 size=1.33TB
|     runtime filters: RF000 -> 
tpcds_15000_decimal_parquet.catalog_sales.cs_item_sk, RF002 -> cs_sold_date_sk
|     table stats: 21602442972 rows total
|     column stats: all
|     hosts=20 per-host-mem=264.00MB
|     tuple-ids=2 row-size=28B cardinality=21602442972
|
21:EXCHANGE [HASH(sr_customer_sk,sr_item_sk)]
|  hosts=20 per-host-mem=0B
|  tuple-ids=0,3,1,4 row-size=112B cardinality=352496606
|
10:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: sr_returned_date_sk = d2.d_date_sk
|  runtime filters: RF005 <- d2.d_date_sk
|  hosts=20 per-host-mem=3.54KB
|  tuple-ids=0,3,1,4 row-size=112B cardinality=352496606
|
|--20:EXCHANGE [BROADCAST]
|  |  hosts=1 per-host-mem=0B
|  |  tuple-ids=4 row-size=16B cardinality=206
|  |
|  04:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d2, RANDOM]
|     partitions=1/1 files=1 size=2.17MB
|     predicates: d2.d_year = 1999, d2.d_moy >= 4, d2.d_moy <= 4 + 3
|     table stats: 73049 rows total
|     column stats: all
|     hosts=1 per-host-mem=48.00MB
|     tuple-ids=4 row-size=16B cardinality=206
|
09:HASH JOIN [INNER JOIN, PARTITIONED]
|  hash predicates: ss_customer_sk = sr_customer_sk, ss_item_sk = sr_item_sk, 
ss_ticket_number = sr_ticket_number
|  runtime filters: RF006 <- sr_customer_sk, RF007 <- sr_item_sk, RF008 <- 
sr_ticket_number
|  hosts=20 per-host-mem=7.96GB
|  tuple-ids=0,3,1 row-size=96B cardinality=3834191129
|
|--19:EXCHANGE [HASH(sr_customer_sk,sr_item_sk,sr_ticket_number)]
|  |  hosts=20 per-host-mem=0B
|  |  tuple-ids=1 row-size=36B cardinality=4317083368
|  |
|  01:SCAN HDFS [tpcds_15000_decimal_parquet.store_returns, RANDOM]
|     partitions=2004/2004 files=2022 size=201.98GB
|     runtime filters: RF000 -> 
tpcds_15000_decimal_parquet.store_returns.sr_item_sk, RF003 -> sr_customer_sk, 
RF004 -> sr_item_sk, RF005 -> sr_returned_date_sk
|     table stats: 4317083368 rows total
|     column stats: all
|     hosts=20 per-host-mem=352.00MB
|     tuple-ids=1 row-size=36B cardinality=4317083368
|
18:EXCHANGE [HASH(ss_customer_sk,ss_item_sk,ss_ticket_number)]
|  hosts=20 per-host-mem=0B
|  tuple-ids=0,3 row-size=60B cardinality=3834191129
|
08:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: ss_sold_date_sk = d1.d_date_sk
|  runtime filters: RF009 <- d1.d_date_sk
|  hosts=20 per-host-mem=3.11KB
|  tuple-ids=0,3 row-size=60B cardinality=3834191129
|
|--17:EXCHANGE [BROADCAST]
|  |  hosts=1 per-host-mem=0B
|  |  tuple-ids=3 row-size=16B cardinality=181
|  |
|  03:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d1, RANDOM]
|     partitions=1/1 files=1 size=2.17MB
|     predicates: d1.d_year = 1999, d1.d_moy = 4
|     table stats: 73049 rows total
|     column stats: all
|     hosts=1 per-host-mem=48.00MB
|     tuple-ids=3 row-size=16B cardinality=181
|
00:SCAN HDFS [tpcds_15000_decimal_parquet.store_sales, RANDOM]
   partitions=1824/1824 files=6922 size=1.52TB
   runtime filters: RF000 -> ss_item_sk, RF001 -> ss_store_sk, RF003 -> 
tpcds_15000_decimal_parquet.store_sales.ss_customer_sk, RF004 -> 
tpcds_15000_decimal_parquet.store_sales.ss_item_sk, RF006 -> ss_customer_sk, 
RF007 -> ss_item_sk, RF008 -> ss_ticket_number, RF009 -> ss_sold_date_sk
   table stats: 43202446268 rows total
   column stats: all
   hosts=20 per-host-mem=440.00MB
   tuple-ids=0 row-size=44B cardinality=43202446268
{code}
Summary
{code:java}
Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak 
Mem  Est. Peak Mem  Detail                         
----------------------------------------------------------------------------------------------------------------------------------
28:MERGING-EXCHANGE        1    1.109ms    1.109ms    1.02K         100         
 0        -1.00 B  UNPARTITIONED                  
16:TOP-N                  20    3.652ms    5.942ms    2.00K         100  132.00 
KB       21.89 KB                                 
27:AGGREGATE              20  275.802ms  458.876ms  442.90K     335.85M   11.35 
MB      212.22 MB  FINALIZE                       
26:EXCHANGE               20   59.647ms  258.173ms    1.13M     335.85M         
 0              0  HASH(i_item_id,i_item_desc,... 
15:AGGREGATE              20  343.414ms  422.845ms    1.13M     335.85M  143.24 
MB       77.13 GB  STREAMING                      
14:HASH JOIN              20   30.930ms   41.410ms    2.20M     335.85M   12.99 
MB        5.24 MB  INNER JOIN, BROADCAST          
|--25:EXCHANGE            20    1.052ms    1.677ms   32.00K      32.00K         
 0              0  BROADCAST                      
|  07:SCAN HDFS            1   18.325ms   18.325ms   32.00K      32.00K    9.43 
MB       48.00 MB  tpcds_15000_decimal_parquet... 
13:HASH JOIN              20   18.778ms   25.275ms    2.20M     335.85M    3.49 
MB        4.00 KB  INNER JOIN, BROADCAST          
|--24:EXCHANGE            20   25.213us  301.428us       62          62         
 0              0  BROADCAST                      
|  06:SCAN HDFS            1    5.887ms    5.887ms       62          62  159.00 
KB       48.00 MB  tpcds_15000_decimal_parquet... 
12:HASH JOIN              20   24.246ms   33.507ms    2.23M     335.85M    2.81 
MB       25.23 KB  INNER JOIN, BROADCAST          
|--23:EXCHANGE            20   34.326us   49.903us    1.10K       1.96K         
 0              0  BROADCAST                      
|  05:SCAN HDFS            1   24.594ms   24.594ms    1.10K       1.96K    2.29 
MB       32.00 MB  tpcds_15000_decimal_parquet... 
11:HASH JOIN              20       4m8s      5m31s    2.23M     352.50M   38.14 
GB       30.98 GB  INNER JOIN, PARTITIONED        
|--22:EXCHANGE            20   20s051ms   54s305ms   12.85B      21.60B         
 0              0  HASH(cs_bill_customer_sk,cs... 
|  02:SCAN HDFS           20    3s716ms    4s533ms   12.85B      21.60B  257.30 
MB      264.00 MB  tpcds_15000_decimal_parquet... 
21:EXCHANGE               20   64.831ms  101.634ms   16.49M     352.50M         
 0              0  HASH(sr_customer_sk,sr_item... 
10:HASH JOIN              20  300.445ms  389.672ms   16.49M     352.50M    2.94 
MB        3.54 KB  INNER JOIN, BROADCAST          
|--20:EXCHANGE            20   11.347us   17.878us      122         206         
 0              0  BROADCAST                      
|  04:SCAN HDFS            1   12.903ms   12.903ms      122         206    3.50 
MB       48.00 MB  tpcds_15000_decimal_parquet... 
09:HASH JOIN              20    8s854ms   11s946ms   16.49M       3.83B    1.01 
GB        7.96 GB  INNER JOIN, PARTITIONED        
|--19:EXCHANGE            20  444.716ms    1s130ms  234.43M       4.32B         
 0              0  HASH(sr_customer_sk,sr_item... 
|  01:SCAN HDFS           20    1s235ms    1s397ms  234.43M       4.32B  351.41 
MB      352.00 MB  tpcds_15000_decimal_parquet... 
18:EXCHANGE               20    1s216ms    2s815ms  389.28M       3.83B         
 0              0  HASH(ss_customer_sk,ss_item... 
08:HASH JOIN              20  949.230ms    2s216ms  389.28M       3.83B    2.71 
MB        3.11 KB  INNER JOIN, BROADCAST          
|--17:EXCHANGE            20   11.615us   16.053us       30         181         
 0              0  BROADCAST                      
|  03:SCAN HDFS            1   14.187ms   14.187ms       30         181    2.89 
MB       48.00 MB  tpcds_15000_decimal_parquet... 
00:SCAN HDFS              20  620.986ms  759.453ms  389.28M      43.20B  344.09 
MB      440.00 MB  tpcds_15000_decimal_parquet... 
{code}

  was:
Applying selective runtime filters can drastically change the cardinality of 
scan nodes, the planner doesn't cost the runtime filters as filters as a result 
it misses out on a more selective plan.

In TPC-DS query29 there is three fact to dimension joins
 * (ss x d1) -> 389.28M rows
 * (sr x d2) -> 234.43M rows
 * (cs x d3) -> 12.85B rows

The planner doesn't re-evaluate the cardinality estimation of ss, sr and cs 
after the runtime filter are applied and puts ss as the left most node in the 
plan where it should have been cs.

Ideally this should be a bushy plan....

Query
{code:java}
select i_item_id
    ,i_item_desc
    ,s_store_id
    ,s_store_name
    ,sum(ss_quantity)        as store_sales_quantity
    ,sum(sr_return_quantity) as store_returns_quantity
    ,sum(cs_quantity)        as catalog_sales_quantity
 from
    store_sales
   ,store_returns
   ,catalog_sales
   ,date_dim             d1
   ,date_dim             d2
   ,date_dim             d3
   ,store
   ,item
 where
     d1.d_moy               = 4 
 and d1.d_year              = 1999
 and d1.d_date_sk           = ss_sold_date_sk
 and i_item_sk              = ss_item_sk
 and s_store_sk             = ss_store_sk
 and ss_customer_sk         = sr_customer_sk
 and ss_item_sk             = sr_item_sk
 and ss_ticket_number       = sr_ticket_number
 and sr_returned_date_sk    = d2.d_date_sk
 and d2.d_moy               between 4 and  4 + 3 
 and d2.d_year              = 1999
 and sr_customer_sk         = cs_bill_customer_sk
 and sr_item_sk             = cs_item_sk
 and cs_sold_date_sk        = d3.d_date_sk     
 and d3.d_year              in (1999,1999+1,1999+2)
 group by
    i_item_id
   ,i_item_desc
   ,s_store_id
   ,s_store_name
 order by
    i_item_id 
   ,i_item_desc
   ,s_store_id
   ,s_store_name
limit 100
{code}
Plan
{code:java}
28:MERGING-EXCHANGE [UNPARTITIONED]
|  order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC
|  limit: 100
|  hosts=20 per-host-mem=unavailable
|  tuple-ids=9 row-size=224B cardinality=100
|
16:TOP-N [LIMIT=100]
|  order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC
|  hosts=20 per-host-mem=21.89KB
|  tuple-ids=9 row-size=224B cardinality=100
|
27:AGGREGATE [FINALIZE]
|  output: sum:merge(ss_quantity), sum:merge(sr_return_quantity), 
sum:merge(cs_quantity)
|  group by: i_item_id, i_item_desc, s_store_id, s_store_name
|  hosts=20 per-host-mem=212.22MB
|  tuple-ids=8 row-size=224B cardinality=335852270
|
26:EXCHANGE [HASH(i_item_id,i_item_desc,s_store_id,s_store_name)]
|  hosts=20 per-host-mem=0B
|  tuple-ids=8 row-size=224B cardinality=335852270
|
15:AGGREGATE [STREAMING]
|  output: sum(ss_quantity), sum(sr_return_quantity), sum(cs_quantity)
|  group by: i_item_id, i_item_desc, s_store_id, s_store_name
|  hosts=20 per-host-mem=77.13GB
|  tuple-ids=8 row-size=224B cardinality=335852270
|
14:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: ss_item_sk = i_item_sk
|  runtime filters: RF000 <- i_item_sk
|  hosts=20 per-host-mem=5.24MB
|  tuple-ids=0,3,1,4,2,5,6,7 row-size=368B cardinality=335852270
|
|--25:EXCHANGE [BROADCAST]
|  |  hosts=1 per-host-mem=0B
|  |  tuple-ids=7 row-size=156B cardinality=32000
|  |
|  07:SCAN HDFS [tpcds_15000_decimal_parquet.item, RANDOM]
|     partitions=1/1 files=1 size=3.14MB
|     table stats: 32000 rows total
|     column stats: all
|     hosts=1 per-host-mem=48.00MB
|     tuple-ids=7 row-size=156B cardinality=32000
|
13:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: ss_store_sk = s_store_sk
|  runtime filters: RF001 <- s_store_sk
|  hosts=20 per-host-mem=4.00KB
|  tuple-ids=0,3,1,4,2,5,6 row-size=212B cardinality=335852270
|
|--24:EXCHANGE [BROADCAST]
|  |  hosts=1 per-host-mem=0B
|  |  tuple-ids=6 row-size=60B cardinality=62
|  |
|  06:SCAN HDFS [tpcds_15000_decimal_parquet.store, RANDOM]
|     partitions=1/1 files=1 size=11.92KB
|     table stats: 62 rows total
|     column stats: all
|     hosts=1 per-host-mem=48.00MB
|     tuple-ids=6 row-size=60B cardinality=62
|
12:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: cs_sold_date_sk = d3.d_date_sk
|  runtime filters: RF002 <- d3.d_date_sk
|  hosts=20 per-host-mem=25.23KB
|  tuple-ids=0,3,1,4,2,5 row-size=152B cardinality=335852270
|
|--23:EXCHANGE [BROADCAST]
|  |  hosts=1 per-host-mem=0B
|  |  tuple-ids=5 row-size=12B cardinality=1957
|  |
|  05:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d3, RANDOM]
|     partitions=1/1 files=1 size=2.17MB
|     predicates: d3.d_year IN (1999, 1999 + 1, 1999 + 2)
|     table stats: 73049 rows total
|     column stats: all
|     hosts=1 per-host-mem=32.00MB
|     tuple-ids=5 row-size=12B cardinality=1957
|
11:HASH JOIN [INNER JOIN, PARTITIONED]
|  hash predicates: sr_customer_sk = cs_bill_customer_sk, sr_item_sk = 
cs_item_sk
|  runtime filters: RF003 <- cs_bill_customer_sk, RF004 <- cs_item_sk
|  hosts=20 per-host-mem=30.98GB
|  tuple-ids=0,3,1,4,2 row-size=140B cardinality=352496606
|
|--22:EXCHANGE [HASH(cs_bill_customer_sk,cs_item_sk)]
|  |  hosts=20 per-host-mem=0B
|  |  tuple-ids=2 row-size=28B cardinality=21602442972
|  |
|  02:SCAN HDFS [tpcds_15000_decimal_parquet.catalog_sales, RANDOM]
|     partitions=1837/1837 files=6122 size=1.33TB
|     runtime filters: RF000 -> 
tpcds_15000_decimal_parquet.catalog_sales.cs_item_sk, RF002 -> cs_sold_date_sk
|     table stats: 21602442972 rows total
|     column stats: all
|     hosts=20 per-host-mem=264.00MB
|     tuple-ids=2 row-size=28B cardinality=21602442972
|
21:EXCHANGE [HASH(sr_customer_sk,sr_item_sk)]
|  hosts=20 per-host-mem=0B
|  tuple-ids=0,3,1,4 row-size=112B cardinality=352496606
|
10:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: sr_returned_date_sk = d2.d_date_sk
|  runtime filters: RF005 <- d2.d_date_sk
|  hosts=20 per-host-mem=3.54KB
|  tuple-ids=0,3,1,4 row-size=112B cardinality=352496606
|
|--20:EXCHANGE [BROADCAST]
|  |  hosts=1 per-host-mem=0B
|  |  tuple-ids=4 row-size=16B cardinality=206
|  |
|  04:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d2, RANDOM]
|     partitions=1/1 files=1 size=2.17MB
|     predicates: d2.d_year = 1999, d2.d_moy >= 4, d2.d_moy <= 4 + 3
|     table stats: 73049 rows total
|     column stats: all
|     hosts=1 per-host-mem=48.00MB
|     tuple-ids=4 row-size=16B cardinality=206
|
09:HASH JOIN [INNER JOIN, PARTITIONED]
|  hash predicates: ss_customer_sk = sr_customer_sk, ss_item_sk = sr_item_sk, 
ss_ticket_number = sr_ticket_number
|  runtime filters: RF006 <- sr_customer_sk, RF007 <- sr_item_sk, RF008 <- 
sr_ticket_number
|  hosts=20 per-host-mem=7.96GB
|  tuple-ids=0,3,1 row-size=96B cardinality=3834191129
|
|--19:EXCHANGE [HASH(sr_customer_sk,sr_item_sk,sr_ticket_number)]
|  |  hosts=20 per-host-mem=0B
|  |  tuple-ids=1 row-size=36B cardinality=4317083368
|  |
|  01:SCAN HDFS [tpcds_15000_decimal_parquet.store_returns, RANDOM]
|     partitions=2004/2004 files=2022 size=201.98GB
|     runtime filters: RF000 -> 
tpcds_15000_decimal_parquet.store_returns.sr_item_sk, RF003 -> sr_customer_sk, 
RF004 -> sr_item_sk, RF005 -> sr_returned_date_sk
|     table stats: 4317083368 rows total
|     column stats: all
|     hosts=20 per-host-mem=352.00MB
|     tuple-ids=1 row-size=36B cardinality=4317083368
|
18:EXCHANGE [HASH(ss_customer_sk,ss_item_sk,ss_ticket_number)]
|  hosts=20 per-host-mem=0B
|  tuple-ids=0,3 row-size=60B cardinality=3834191129
|
08:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: ss_sold_date_sk = d1.d_date_sk
|  runtime filters: RF009 <- d1.d_date_sk
|  hosts=20 per-host-mem=3.11KB
|  tuple-ids=0,3 row-size=60B cardinality=3834191129
|
|--17:EXCHANGE [BROADCAST]
|  |  hosts=1 per-host-mem=0B
|  |  tuple-ids=3 row-size=16B cardinality=181
|  |
|  03:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d1, RANDOM]
|     partitions=1/1 files=1 size=2.17MB
|     predicates: d1.d_year = 1999, d1.d_moy = 4
|     table stats: 73049 rows total
|     column stats: all
|     hosts=1 per-host-mem=48.00MB
|     tuple-ids=3 row-size=16B cardinality=181
|
00:SCAN HDFS [tpcds_15000_decimal_parquet.store_sales, RANDOM]
   partitions=1824/1824 files=6922 size=1.52TB
   runtime filters: RF000 -> ss_item_sk, RF001 -> ss_store_sk, RF003 -> 
tpcds_15000_decimal_parquet.store_sales.ss_customer_sk, RF004 -> 
tpcds_15000_decimal_parquet.store_sales.ss_item_sk, RF006 -> ss_customer_sk, 
RF007 -> ss_item_sk, RF008 -> ss_ticket_number, RF009 -> ss_sold_date_sk
   table stats: 43202446268 rows total
   column stats: all
   hosts=20 per-host-mem=440.00MB
   tuple-ids=0 row-size=44B cardinality=43202446268
{code}
Summary
{code:java}
Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak 
Mem  Est. Peak Mem  Detail                         
----------------------------------------------------------------------------------------------------------------------------------
28:MERGING-EXCHANGE        1    1.109ms    1.109ms    1.02K         100         
 0        -1.00 B  UNPARTITIONED                  
16:TOP-N                  20    3.652ms    5.942ms    2.00K         100  132.00 
KB       21.89 KB                                 
27:AGGREGATE              20  275.802ms  458.876ms  442.90K     335.85M   11.35 
MB      212.22 MB  FINALIZE                       
26:EXCHANGE               20   59.647ms  258.173ms    1.13M     335.85M         
 0              0  HASH(i_item_id,i_item_desc,... 
15:AGGREGATE              20  343.414ms  422.845ms    1.13M     335.85M  143.24 
MB       77.13 GB  STREAMING                      
14:HASH JOIN              20   30.930ms   41.410ms    2.20M     335.85M   12.99 
MB        5.24 MB  INNER JOIN, BROADCAST          
|--25:EXCHANGE            20    1.052ms    1.677ms   32.00K      32.00K         
 0              0  BROADCAST                      
|  07:SCAN HDFS            1   18.325ms   18.325ms   32.00K      32.00K    9.43 
MB       48.00 MB  tpcds_15000_decimal_parquet... 
13:HASH JOIN              20   18.778ms   25.275ms    2.20M     335.85M    3.49 
MB        4.00 KB  INNER JOIN, BROADCAST          
|--24:EXCHANGE            20   25.213us  301.428us       62          62         
 0              0  BROADCAST                      
|  06:SCAN HDFS            1    5.887ms    5.887ms       62          62  159.00 
KB       48.00 MB  tpcds_15000_decimal_parquet... 
12:HASH JOIN              20   24.246ms   33.507ms    2.23M     335.85M    2.81 
MB       25.23 KB  INNER JOIN, BROADCAST          
|--23:EXCHANGE            20   34.326us   49.903us    1.10K       1.96K         
 0              0  BROADCAST                      
|  05:SCAN HDFS            1   24.594ms   24.594ms    1.10K       1.96K    2.29 
MB       32.00 MB  tpcds_15000_decimal_parquet... 
11:HASH JOIN              20       4m8s      5m31s    2.23M     352.50M   38.14 
GB       30.98 GB  INNER JOIN, PARTITIONED        
|--22:EXCHANGE            20   20s051ms   54s305ms   12.85B      21.60B         
 0              0  HASH(cs_bill_customer_sk,cs... 
|  02:SCAN HDFS           20    3s716ms    4s533ms   12.85B      21.60B  257.30 
MB      264.00 MB  tpcds_15000_decimal_parquet... 
21:EXCHANGE               20   64.831ms  101.634ms   16.49M     352.50M         
 0              0  HASH(sr_customer_sk,sr_item... 
10:HASH JOIN              20  300.445ms  389.672ms   16.49M     352.50M    2.94 
MB        3.54 KB  INNER JOIN, BROADCAST          
|--20:EXCHANGE            20   11.347us   17.878us      122         206         
 0              0  BROADCAST                      
|  04:SCAN HDFS            1   12.903ms   12.903ms      122         206    3.50 
MB       48.00 MB  tpcds_15000_decimal_parquet... 
09:HASH JOIN              20    8s854ms   11s946ms   16.49M       3.83B    1.01 
GB        7.96 GB  INNER JOIN, PARTITIONED        
|--19:EXCHANGE            20  444.716ms    1s130ms  234.43M       4.32B         
 0              0  HASH(sr_customer_sk,sr_item... 
|  01:SCAN HDFS           20    1s235ms    1s397ms  234.43M       4.32B  351.41 
MB      352.00 MB  tpcds_15000_decimal_parquet... 
18:EXCHANGE               20    1s216ms    2s815ms  389.28M       3.83B         
 0              0  HASH(ss_customer_sk,ss_item... 
08:HASH JOIN              20  949.230ms    2s216ms  389.28M       3.83B    2.71 
MB        3.11 KB  INNER JOIN, BROADCAST          
|--17:EXCHANGE            20   11.615us   16.053us       30         181         
 0              0  BROADCAST                      
|  03:SCAN HDFS            1   14.187ms   14.187ms       30         181    2.89 
MB       48.00 MB  tpcds_15000_decimal_parquet... 
00:SCAN HDFS              20  620.986ms  759.453ms  389.28M      43.20B  344.09 
MB      440.00 MB  tpcds_15000_decimal_parquet... 
{code}


> Planner doesn't take into account runtime filter selectivity
> ------------------------------------------------------------
>
>                 Key: IMPALA-3573
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3573
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.5.0
>            Reporter: Mostafa Mokhtar
>            Priority: Major
>              Labels: bushy, planner
>
> Applying selective runtime filters can drastically change the cardinality of 
> scan nodes, the planner doesn't cost the runtime filters as filters as a 
> result it misses out on a more selective plan.
> In TPC-DS query29 there are three fact to dimension joins
>  * (ss x d1) -> 389.28M rows
>  * (sr x d2) -> 234.43M rows
>  * (cs x d3) -> 12.85B rows
> The planner doesn't re-evaluate the cardinality estimation of ss, sr and cs 
> after the runtime filter are applied and puts ss as the left most node in the 
> plan where it should have been cs.
> Ideally this should be a bushy plan....
> Query
> {code:java}
> select i_item_id
>     ,i_item_desc
>     ,s_store_id
>     ,s_store_name
>     ,sum(ss_quantity)        as store_sales_quantity
>     ,sum(sr_return_quantity) as store_returns_quantity
>     ,sum(cs_quantity)        as catalog_sales_quantity
>  from
>     store_sales
>    ,store_returns
>    ,catalog_sales
>    ,date_dim             d1
>    ,date_dim             d2
>    ,date_dim             d3
>    ,store
>    ,item
>  where
>      d1.d_moy               = 4 
>  and d1.d_year              = 1999
>  and d1.d_date_sk           = ss_sold_date_sk
>  and i_item_sk              = ss_item_sk
>  and s_store_sk             = ss_store_sk
>  and ss_customer_sk         = sr_customer_sk
>  and ss_item_sk             = sr_item_sk
>  and ss_ticket_number       = sr_ticket_number
>  and sr_returned_date_sk    = d2.d_date_sk
>  and d2.d_moy               between 4 and  4 + 3 
>  and d2.d_year              = 1999
>  and sr_customer_sk         = cs_bill_customer_sk
>  and sr_item_sk             = cs_item_sk
>  and cs_sold_date_sk        = d3.d_date_sk     
>  and d3.d_year              in (1999,1999+1,1999+2)
>  group by
>     i_item_id
>    ,i_item_desc
>    ,s_store_id
>    ,s_store_name
>  order by
>     i_item_id 
>    ,i_item_desc
>    ,s_store_id
>    ,s_store_name
> limit 100
> {code}
> Plan
> {code:java}
> 28:MERGING-EXCHANGE [UNPARTITIONED]
> |  order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC
> |  limit: 100
> |  hosts=20 per-host-mem=unavailable
> |  tuple-ids=9 row-size=224B cardinality=100
> |
> 16:TOP-N [LIMIT=100]
> |  order by: i_item_id ASC, i_item_desc ASC, s_store_id ASC, s_store_name ASC
> |  hosts=20 per-host-mem=21.89KB
> |  tuple-ids=9 row-size=224B cardinality=100
> |
> 27:AGGREGATE [FINALIZE]
> |  output: sum:merge(ss_quantity), sum:merge(sr_return_quantity), 
> sum:merge(cs_quantity)
> |  group by: i_item_id, i_item_desc, s_store_id, s_store_name
> |  hosts=20 per-host-mem=212.22MB
> |  tuple-ids=8 row-size=224B cardinality=335852270
> |
> 26:EXCHANGE [HASH(i_item_id,i_item_desc,s_store_id,s_store_name)]
> |  hosts=20 per-host-mem=0B
> |  tuple-ids=8 row-size=224B cardinality=335852270
> |
> 15:AGGREGATE [STREAMING]
> |  output: sum(ss_quantity), sum(sr_return_quantity), sum(cs_quantity)
> |  group by: i_item_id, i_item_desc, s_store_id, s_store_name
> |  hosts=20 per-host-mem=77.13GB
> |  tuple-ids=8 row-size=224B cardinality=335852270
> |
> 14:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: ss_item_sk = i_item_sk
> |  runtime filters: RF000 <- i_item_sk
> |  hosts=20 per-host-mem=5.24MB
> |  tuple-ids=0,3,1,4,2,5,6,7 row-size=368B cardinality=335852270
> |
> |--25:EXCHANGE [BROADCAST]
> |  |  hosts=1 per-host-mem=0B
> |  |  tuple-ids=7 row-size=156B cardinality=32000
> |  |
> |  07:SCAN HDFS [tpcds_15000_decimal_parquet.item, RANDOM]
> |     partitions=1/1 files=1 size=3.14MB
> |     table stats: 32000 rows total
> |     column stats: all
> |     hosts=1 per-host-mem=48.00MB
> |     tuple-ids=7 row-size=156B cardinality=32000
> |
> 13:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: ss_store_sk = s_store_sk
> |  runtime filters: RF001 <- s_store_sk
> |  hosts=20 per-host-mem=4.00KB
> |  tuple-ids=0,3,1,4,2,5,6 row-size=212B cardinality=335852270
> |
> |--24:EXCHANGE [BROADCAST]
> |  |  hosts=1 per-host-mem=0B
> |  |  tuple-ids=6 row-size=60B cardinality=62
> |  |
> |  06:SCAN HDFS [tpcds_15000_decimal_parquet.store, RANDOM]
> |     partitions=1/1 files=1 size=11.92KB
> |     table stats: 62 rows total
> |     column stats: all
> |     hosts=1 per-host-mem=48.00MB
> |     tuple-ids=6 row-size=60B cardinality=62
> |
> 12:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: cs_sold_date_sk = d3.d_date_sk
> |  runtime filters: RF002 <- d3.d_date_sk
> |  hosts=20 per-host-mem=25.23KB
> |  tuple-ids=0,3,1,4,2,5 row-size=152B cardinality=335852270
> |
> |--23:EXCHANGE [BROADCAST]
> |  |  hosts=1 per-host-mem=0B
> |  |  tuple-ids=5 row-size=12B cardinality=1957
> |  |
> |  05:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d3, RANDOM]
> |     partitions=1/1 files=1 size=2.17MB
> |     predicates: d3.d_year IN (1999, 1999 + 1, 1999 + 2)
> |     table stats: 73049 rows total
> |     column stats: all
> |     hosts=1 per-host-mem=32.00MB
> |     tuple-ids=5 row-size=12B cardinality=1957
> |
> 11:HASH JOIN [INNER JOIN, PARTITIONED]
> |  hash predicates: sr_customer_sk = cs_bill_customer_sk, sr_item_sk = 
> cs_item_sk
> |  runtime filters: RF003 <- cs_bill_customer_sk, RF004 <- cs_item_sk
> |  hosts=20 per-host-mem=30.98GB
> |  tuple-ids=0,3,1,4,2 row-size=140B cardinality=352496606
> |
> |--22:EXCHANGE [HASH(cs_bill_customer_sk,cs_item_sk)]
> |  |  hosts=20 per-host-mem=0B
> |  |  tuple-ids=2 row-size=28B cardinality=21602442972
> |  |
> |  02:SCAN HDFS [tpcds_15000_decimal_parquet.catalog_sales, RANDOM]
> |     partitions=1837/1837 files=6122 size=1.33TB
> |     runtime filters: RF000 -> 
> tpcds_15000_decimal_parquet.catalog_sales.cs_item_sk, RF002 -> cs_sold_date_sk
> |     table stats: 21602442972 rows total
> |     column stats: all
> |     hosts=20 per-host-mem=264.00MB
> |     tuple-ids=2 row-size=28B cardinality=21602442972
> |
> 21:EXCHANGE [HASH(sr_customer_sk,sr_item_sk)]
> |  hosts=20 per-host-mem=0B
> |  tuple-ids=0,3,1,4 row-size=112B cardinality=352496606
> |
> 10:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: sr_returned_date_sk = d2.d_date_sk
> |  runtime filters: RF005 <- d2.d_date_sk
> |  hosts=20 per-host-mem=3.54KB
> |  tuple-ids=0,3,1,4 row-size=112B cardinality=352496606
> |
> |--20:EXCHANGE [BROADCAST]
> |  |  hosts=1 per-host-mem=0B
> |  |  tuple-ids=4 row-size=16B cardinality=206
> |  |
> |  04:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d2, RANDOM]
> |     partitions=1/1 files=1 size=2.17MB
> |     predicates: d2.d_year = 1999, d2.d_moy >= 4, d2.d_moy <= 4 + 3
> |     table stats: 73049 rows total
> |     column stats: all
> |     hosts=1 per-host-mem=48.00MB
> |     tuple-ids=4 row-size=16B cardinality=206
> |
> 09:HASH JOIN [INNER JOIN, PARTITIONED]
> |  hash predicates: ss_customer_sk = sr_customer_sk, ss_item_sk = sr_item_sk, 
> ss_ticket_number = sr_ticket_number
> |  runtime filters: RF006 <- sr_customer_sk, RF007 <- sr_item_sk, RF008 <- 
> sr_ticket_number
> |  hosts=20 per-host-mem=7.96GB
> |  tuple-ids=0,3,1 row-size=96B cardinality=3834191129
> |
> |--19:EXCHANGE [HASH(sr_customer_sk,sr_item_sk,sr_ticket_number)]
> |  |  hosts=20 per-host-mem=0B
> |  |  tuple-ids=1 row-size=36B cardinality=4317083368
> |  |
> |  01:SCAN HDFS [tpcds_15000_decimal_parquet.store_returns, RANDOM]
> |     partitions=2004/2004 files=2022 size=201.98GB
> |     runtime filters: RF000 -> 
> tpcds_15000_decimal_parquet.store_returns.sr_item_sk, RF003 -> 
> sr_customer_sk, RF004 -> sr_item_sk, RF005 -> sr_returned_date_sk
> |     table stats: 4317083368 rows total
> |     column stats: all
> |     hosts=20 per-host-mem=352.00MB
> |     tuple-ids=1 row-size=36B cardinality=4317083368
> |
> 18:EXCHANGE [HASH(ss_customer_sk,ss_item_sk,ss_ticket_number)]
> |  hosts=20 per-host-mem=0B
> |  tuple-ids=0,3 row-size=60B cardinality=3834191129
> |
> 08:HASH JOIN [INNER JOIN, BROADCAST]
> |  hash predicates: ss_sold_date_sk = d1.d_date_sk
> |  runtime filters: RF009 <- d1.d_date_sk
> |  hosts=20 per-host-mem=3.11KB
> |  tuple-ids=0,3 row-size=60B cardinality=3834191129
> |
> |--17:EXCHANGE [BROADCAST]
> |  |  hosts=1 per-host-mem=0B
> |  |  tuple-ids=3 row-size=16B cardinality=181
> |  |
> |  03:SCAN HDFS [tpcds_15000_decimal_parquet.date_dim d1, RANDOM]
> |     partitions=1/1 files=1 size=2.17MB
> |     predicates: d1.d_year = 1999, d1.d_moy = 4
> |     table stats: 73049 rows total
> |     column stats: all
> |     hosts=1 per-host-mem=48.00MB
> |     tuple-ids=3 row-size=16B cardinality=181
> |
> 00:SCAN HDFS [tpcds_15000_decimal_parquet.store_sales, RANDOM]
>    partitions=1824/1824 files=6922 size=1.52TB
>    runtime filters: RF000 -> ss_item_sk, RF001 -> ss_store_sk, RF003 -> 
> tpcds_15000_decimal_parquet.store_sales.ss_customer_sk, RF004 -> 
> tpcds_15000_decimal_parquet.store_sales.ss_item_sk, RF006 -> ss_customer_sk, 
> RF007 -> ss_item_sk, RF008 -> ss_ticket_number, RF009 -> ss_sold_date_sk
>    table stats: 43202446268 rows total
>    column stats: all
>    hosts=20 per-host-mem=440.00MB
>    tuple-ids=0 row-size=44B cardinality=43202446268
> {code}
> Summary
> {code:java}
> Operator              #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   
> Peak Mem  Est. Peak Mem  Detail                         
> ----------------------------------------------------------------------------------------------------------------------------------
> 28:MERGING-EXCHANGE        1    1.109ms    1.109ms    1.02K         100       
>    0        -1.00 B  UNPARTITIONED                  
> 16:TOP-N                  20    3.652ms    5.942ms    2.00K         100  
> 132.00 KB       21.89 KB                                 
> 27:AGGREGATE              20  275.802ms  458.876ms  442.90K     335.85M   
> 11.35 MB      212.22 MB  FINALIZE                       
> 26:EXCHANGE               20   59.647ms  258.173ms    1.13M     335.85M       
>    0              0  HASH(i_item_id,i_item_desc,... 
> 15:AGGREGATE              20  343.414ms  422.845ms    1.13M     335.85M  
> 143.24 MB       77.13 GB  STREAMING                      
> 14:HASH JOIN              20   30.930ms   41.410ms    2.20M     335.85M   
> 12.99 MB        5.24 MB  INNER JOIN, BROADCAST          
> |--25:EXCHANGE            20    1.052ms    1.677ms   32.00K      32.00K       
>    0              0  BROADCAST                      
> |  07:SCAN HDFS            1   18.325ms   18.325ms   32.00K      32.00K    
> 9.43 MB       48.00 MB  tpcds_15000_decimal_parquet... 
> 13:HASH JOIN              20   18.778ms   25.275ms    2.20M     335.85M    
> 3.49 MB        4.00 KB  INNER JOIN, BROADCAST          
> |--24:EXCHANGE            20   25.213us  301.428us       62          62       
>    0              0  BROADCAST                      
> |  06:SCAN HDFS            1    5.887ms    5.887ms       62          62  
> 159.00 KB       48.00 MB  tpcds_15000_decimal_parquet... 
> 12:HASH JOIN              20   24.246ms   33.507ms    2.23M     335.85M    
> 2.81 MB       25.23 KB  INNER JOIN, BROADCAST          
> |--23:EXCHANGE            20   34.326us   49.903us    1.10K       1.96K       
>    0              0  BROADCAST                      
> |  05:SCAN HDFS            1   24.594ms   24.594ms    1.10K       1.96K    
> 2.29 MB       32.00 MB  tpcds_15000_decimal_parquet... 
> 11:HASH JOIN              20       4m8s      5m31s    2.23M     352.50M   
> 38.14 GB       30.98 GB  INNER JOIN, PARTITIONED        
> |--22:EXCHANGE            20   20s051ms   54s305ms   12.85B      21.60B       
>    0              0  HASH(cs_bill_customer_sk,cs... 
> |  02:SCAN HDFS           20    3s716ms    4s533ms   12.85B      21.60B  
> 257.30 MB      264.00 MB  tpcds_15000_decimal_parquet... 
> 21:EXCHANGE               20   64.831ms  101.634ms   16.49M     352.50M       
>    0              0  HASH(sr_customer_sk,sr_item... 
> 10:HASH JOIN              20  300.445ms  389.672ms   16.49M     352.50M    
> 2.94 MB        3.54 KB  INNER JOIN, BROADCAST          
> |--20:EXCHANGE            20   11.347us   17.878us      122         206       
>    0              0  BROADCAST                      
> |  04:SCAN HDFS            1   12.903ms   12.903ms      122         206    
> 3.50 MB       48.00 MB  tpcds_15000_decimal_parquet... 
> 09:HASH JOIN              20    8s854ms   11s946ms   16.49M       3.83B    
> 1.01 GB        7.96 GB  INNER JOIN, PARTITIONED        
> |--19:EXCHANGE            20  444.716ms    1s130ms  234.43M       4.32B       
>    0              0  HASH(sr_customer_sk,sr_item... 
> |  01:SCAN HDFS           20    1s235ms    1s397ms  234.43M       4.32B  
> 351.41 MB      352.00 MB  tpcds_15000_decimal_parquet... 
> 18:EXCHANGE               20    1s216ms    2s815ms  389.28M       3.83B       
>    0              0  HASH(ss_customer_sk,ss_item... 
> 08:HASH JOIN              20  949.230ms    2s216ms  389.28M       3.83B    
> 2.71 MB        3.11 KB  INNER JOIN, BROADCAST          
> |--17:EXCHANGE            20   11.615us   16.053us       30         181       
>    0              0  BROADCAST                      
> |  03:SCAN HDFS            1   14.187ms   14.187ms       30         181    
> 2.89 MB       48.00 MB  tpcds_15000_decimal_parquet... 
> 00:SCAN HDFS              20  620.986ms  759.453ms  389.28M      43.20B  
> 344.09 MB      440.00 MB  tpcds_15000_decimal_parquet... 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to