Aman Sinha created IMPALA-9911:
----------------------------------

             Summary: IS [NOT] NULL predicate selectivity estimate is wrong if #nulls is 0
                 Key: IMPALA-9911
                 URL: https://issues.apache.org/jira/browse/IMPALA-9911
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.4.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha


Consider the tpcds customer table. Its c_current_addr_sk column has #Nulls = 0,
as shown below.
{noformat}
 tpcds> show column stats customer;
+------------------------+--------+------------------+--------+----------+-------------------+
| Column                 | Type   | #Distinct Values | #Nulls | Max Size | Avg Size          |
+------------------------+--------+------------------+--------+----------+-------------------+
....
| c_current_cdemo_sk     | INT    | 91558            | 3438   | 4        | 4                 |
| c_current_hdemo_sk     | INT    | 7376             | 3431   | 4        | 4                 |
| c_current_addr_sk      | INT    | 42003            | 0      | 4        | 4                 |
....
{noformat}

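The cardinality estimates for the following predicates show that the default
selectivity of 10% is being applied, which is not correct. Since the column
stats report #Nulls = 0, IS NOT NULL should be estimated to keep all rows and
IS NULL to keep none: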
{noformat}
explain select c_current_addr_sk from customer where c_current_addr_sk is not null;
| 00:SCAN HDFS [tpcds.customer]                              |
|    HDFS partitions=1/1 files=1 size=12.60MB                |
|    predicates: c_current_addr_sk IS NOT NULL               |
|    row-size=4B cardinality=10.00K                          |
+------------------------------------------------------------+

explain select c_current_addr_sk from customer where c_current_addr_sk is null;
| 00:SCAN HDFS [tpcds.customer]                              |
|    HDFS partitions=1/1 files=1 size=12.60MB                |
|    predicates: c_current_addr_sk IS NULL                   |
|    row-size=4B cardinality=10.00K                          |
{noformat}
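
For illustration, the expected estimate can be sketched as below. This is a minimal sketch and not Impala's planner code: the function name, the convention that a negative #Nulls means "stats unavailable", and the row count used in the example are all assumptions. The only values taken from this report are the 10% default and the #Nulls figures.

```python
# Default selectivity the planner falls back to, per the 10% seen in this report.
DEFAULT_SELECTIVITY = 0.1


def is_null_selectivity(num_nulls, num_rows, is_not_null):
    """Estimate the selectivity of `col IS [NOT] NULL` from column stats.

    Assumption: num_nulls < 0 or num_rows <= 0 means the stats are missing,
    in which case the default selectivity is the only option. A #Nulls value
    of 0 is a valid statistic and must not be treated as missing.
    """
    if num_nulls < 0 or num_rows <= 0:
        return DEFAULT_SELECTIVITY
    # Clamp in case stale stats report more nulls than rows.
    null_fraction = min(num_nulls / num_rows, 1.0)
    return 1.0 - null_fraction if is_not_null else null_fraction


# Hypothetical row count for the example; the report does not state it.
NUM_ROWS = 100_000

not_null_sel = is_null_selectivity(0, NUM_ROWS, is_not_null=True)
null_sel = is_null_selectivity(0, NUM_ROWS, is_not_null=False)
```

With #Nulls = 0, `not_null_sel` is 1.0 and `null_sel` is 0.0, rather than 0.1 for both. The key design point is distinguishing "zero nulls" from "no stats": only the latter should fall back to the default.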




