[ 
https://issues.apache.org/jira/browse/IMPALA-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-3685:
-------------------------------------

    Assignee:     (was: Alexander Behm)

> Planner produces incorrect join cardinality estimation when inequality 
> predicate is used on dimension table
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-3685
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3685
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.5.0
>            Reporter: Mostafa Mokhtar
>            Priority: Minor
>              Labels: planner
>         Attachments: query14.sql.2.out copy
>
>
> It appears that when inequality predicate is applied on a fact to dimension 
> join the filter selectivity on the dimension table is not reflected on the 
> join cardinality estimation. 
> This issue was first found in TPC-DS Q14 attached. 
> Query
> {code}
> explain select
>     ss_quantity quantity
> from
>     store_sales,
>     date_dim
> where
>     ss_sold_date_sk = d_date_sk
>         and d_year  < 2000
> {code}
> Plan
> {code}
> +----------------------------------------------------------+
> | Explain String                                           |
> +----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=88.06MB VCores=2 |
> |                                                          |
> | 04:EXCHANGE [UNPARTITIONED]                              |
> | |  hosts=8 per-host-mem=unavailable                      |
> | |  tuple-ids=0,1 row-size=16B cardinality=2879987999     |
> | |                                                        |
> | 02:HASH JOIN [INNER JOIN, BROADCAST]                     |
> | |  hash predicates: ss_sold_date_sk = d_date_sk          |
> | |  runtime filters: RF000 <- d_date_sk                   |
> | |  hosts=8 per-host-mem=62.78KB                          |
> | |  tuple-ids=0,1 row-size=16B cardinality=2879987999     |
> | |                                                        |
> | |--03:EXCHANGE [BROADCAST]                               |
> | |  |  hosts=1 per-host-mem=0B                            |
> | |  |  tuple-ids=1 row-size=8B cardinality=7305           |
> | |  |                                                     |
> | |  01:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM]    |
> | |     partitions=1/1 files=1 size=2.17MB                 |
> | |     predicates: d_year < 2000                          |
> | |     table stats: 73049 rows total                      |
> | |     column stats: all                                  |
> | |     hosts=1 per-host-mem=32.00MB                       |
> | |     tuple-ids=1 row-size=8B cardinality=7305           |
> | |                                                        |
> | 00:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM]    |
> |    partitions=1824/1824 files=1824 size=189.24GB         |
> |    runtime filters: RF000 -> ss_sold_date_sk             |
> |    table stats: 2879987999 rows total                    |
> |    column stats: all                                     |
> |    hosts=8 per-host-mem=88.00MB                          |
> |    tuple-ids=0 row-size=8B cardinality=2879987999        |
> +----------------------------------------------------------+
> {code}
> When an equality predicate is used the selectivity of the filter on the 
> dimension table is reflected on the  join
> {code}
> select 
>     ss_quantity
> from
>     store_sales,
>     date_dim
> where
>     ss_sold_date_sk = d_date_sk
>         and d_year = 1999;
> {code}
> {code}
> +----------------------------------------------------------+
> | Explain String                                           |
> +----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=88.00MB VCores=2 |
> |                                                          |
> | 04:EXCHANGE [UNPARTITIONED]                              |
> | |  hosts=8 per-host-mem=unavailable                      |
> | |  tuple-ids=0,1 row-size=16B cardinality=588944915      |
> | |                                                        |
> | 02:HASH JOIN [INNER JOIN, BROADCAST]                     |
> | |  hash predicates: ss_sold_date_sk = d_date_sk          |
> | |  runtime filters: RF000 <- d_date_sk                   |
> | |  hosts=8 per-host-mem=3.21KB                           |
> | |  tuple-ids=0,1 row-size=16B cardinality=588944915      |
> | |                                                        |
> | |--03:EXCHANGE [BROADCAST]                               |
> | |  |  hosts=1 per-host-mem=0B                            |
> | |  |  tuple-ids=1 row-size=8B cardinality=373            |
> | |  |                                                     |
> | |  01:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM]    |
> | |     partitions=1/1 files=1 size=2.17MB                 |
> | |     predicates: d_year = 1999                          |
> | |     table stats: 73049 rows total                      |
> | |     column stats: all                                  |
> | |     hosts=1 per-host-mem=32.00MB                       |
> | |     tuple-ids=1 row-size=8B cardinality=373            |
> | |                                                        |
> | 00:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM]    |
> |    partitions=1824/1824 files=1824 size=189.24GB         |
> |    runtime filters: RF000 -> ss_sold_date_sk             |
> |    table stats: 2879987999 rows total                    |
> |    column stats: all                                     |
> |    hosts=8 per-host-mem=88.00MB                          |
> |    tuple-ids=0 row-size=8B cardinality=2879987999        |
> +----------------------------------------------------------+
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to