bharath v created IMPALA-7560:
---------------------------------

             Summary: Better selectivity estimate for != (not equals) binary predicate
                 Key: IMPALA-7560
                 URL: https://issues.apache.org/jira/browse/IMPALA-7560
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.12.0, Impala 2.10.0, Impala 2.9.0, Impala 2.8.0, Impala 2.13.0
            Reporter: bharath v


Currently we use the default selectivity estimate for any binary predicate 
whose operator is anything other than EQ or NOT_DISTINCT.

{noformat}
// Determine selectivity
// TODO: Compute selectivity for nested predicates.
// TODO: Improve estimation using histograms.
Reference<SlotRef> slotRefRef = new Reference<SlotRef>();
if ((op_ == Operator.EQ || op_ == Operator.NOT_DISTINCT)
    && isSingleColumnPredicate(slotRefRef, null)) {
  long distinctValues = slotRefRef.getRef().getNumDistinctValues();
  if (distinctValues > 0) {
    selectivity_ = 1.0 / distinctValues;
    selectivity_ = Math.max(0, Math.min(1, selectivity_));
  }
}
{noformat}

This can produce estimates that are far too low. In the example below the scan 
returns 20 rows but is estimated at 3:

{noformat}
[localhost:21000] tpch> select * from nation where n_regionkey != 1;
[localhost:21000] tpch> summary;
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail      |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
| 00:SCAN HDFS | 1      | 3.32ms   | 3.32ms   | 20    | 3          | 143.00 KB | 16.00 MB      | tpch.nation |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+-------------+
[localhost:21000] tpch> 
{noformat}

Ideally we would invert the equality selectivity for != predicates. Here that 
gives 4/5 (= 1 - 1/5), which yields a much better estimate.
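
A minimal sketch of the proposed calculation, not the actual BinaryPredicate 
code. The class and method names are hypothetical. The 0.1 default selectivity 
and rounding to the nearest row are assumptions chosen to be consistent with 
the 3-row estimate above for the 25-row tpch.nation table. NULL handling is 
ignored for simplicity.

```java
/** Hypothetical illustration only; names do not come from the Impala code base. */
final class NotEqualsSelectivity {
    /** Assumed fallback selectivity when no better estimate is available. */
    static final double DEFAULT_SELECTIVITY = 0.1;

    private NotEqualsSelectivity() {}

    /** Selectivity of "col = const": 1/NDV, clamped to [0, 1]. */
    static double eqSelectivity(long distinctValues) {
        if (distinctValues <= 0) return DEFAULT_SELECTIVITY; // NDV unknown
        return Math.max(0.0, Math.min(1.0, 1.0 / distinctValues));
    }

    /** Selectivity of "col != const": complement of the equality estimate. */
    static double neSelectivity(long distinctValues) {
        if (distinctValues <= 0) return DEFAULT_SELECTIVITY; // NDV unknown
        return 1.0 - eqSelectivity(distinctValues);
    }

    /** Estimated output rows, rounded to the nearest row (an assumption). */
    static long estimateRows(long tableRows, double selectivity) {
        return Math.round(tableRows * selectivity);
    }

    public static void main(String[] args) {
        long rows = 25; // tpch.nation
        long ndv = 5;   // distinct n_regionkey values, per the 1/5 above
        System.out.println(estimateRows(rows, DEFAULT_SELECTIVITY)); // prints 3
        System.out.println(estimateRows(rows, neSelectivity(ndv)));  // prints 20
    }
}
```

With these assumptions the inverted estimate matches the 20 rows actually 
returned, whereas the default estimate is off by almost 7x.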



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
