[jira] [Created] (IMPALA-8058) HBase scan cardinality division-by-zero leads to bogus cardinality

Paul Rogers (JIRA) Tue, 08 Jan 2019 16:00:35 -0800

Paul Rogers created IMPALA-8058:
-----------------------------------

             Summary: HBase scan cardinality division-by-zero leads to bogus 
cardinality
                 Key: IMPALA-8058
                 URL: https://issues.apache.org/jira/browse/IMPALA-8058
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers



A particular HBase query has highly selective key filters and runs into code 
bugs that produce a bogus, huge cardinality value.

{{HBaseScanNode.computeStats()}} attempts to compute table cardinality by 
calling {{HBaseTable.getEstimatedRowStats()}}. This then calls into (in the 
latest versions) {{FeHBaseTable.getEstimatedRowStats()}}.

This code tries to estimate cardinality by:

* Scanning a set of regions.
* Getting the size of each region.
* Averaging over a sample of rows to estimate row width.

Once we know the size of the regions we need to scan, and the average row 
width, we can compute the scan cardinality.

The problem in this particular query is that the predicates are so selective 
that no regions match, so no rows are sampled. As a result, the average row 
width is zero. We divide (as a double) the region size by 0 and get INF. We 
cast that to a long and get {{Long.MAX_VALUE}}. We then use that as our 
(highly bogus) cardinality estimate.
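
The arithmetic can be reproduced in isolation. The snippet below is a minimal sketch, not the Impala code: the class {{DivideByZeroDemo}} and the method {{naiveRowCount}} are hypothetical names used only to show how Java handles a double division by zero followed by a narrowing cast to long.

```java
// Hypothetical stand-in for the estimate; this is not the Impala code.
final class DivideByZeroDemo {
    // Divides the region size by the average row width as a double,
    // then narrows the result to a long, as the report describes.
    static long naiveRowCount(long regionSizeBytes, double avgRowSizeBytes) {
        return (long) (regionSizeBytes / avgRowSizeBytes);
    }

    public static void main(String[] args) {
        // A positive value divided by 0.0 is positive infinity, not an exception.
        System.out.println(1024L / 0.0);                               // Infinity
        // Casting positive infinity to long saturates at Long.MAX_VALUE.
        System.out.println(naiveRowCount(1024L, 0.0) == Long.MAX_VALUE); // true
    }
}
```

No exception is thrown at any step, which is why the bogus value flows silently into the plan.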

The code must:

* Detect the division-by-zero (no sample rows) case.
* Use an alternative estimate (such as multiplying the total table row count 
from HMS by the filter selectivity).
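
One possible shape for such a guard is sketched below. This is an illustration under stated assumptions, not a proposed patch: the class {{GuardedCardinality}}, the method {{estimate}}, and all parameter names are hypothetical, and the use of -1 to mean "cardinality unknown" is assumed here for the case where HMS has no row count either.

```java
// Hypothetical sketch of the guard; names and signature are illustrative only.
final class GuardedCardinality {
    // Returns an estimated row count, or -1 (assumed "unknown" marker)
    // when neither the sampled row width nor the table row count is usable.
    static long estimate(long regionSizeBytes, double avgRowSizeBytes,
                         long tableRowCount, double selectivity) {
        // Normal path: rows were sampled, so the width is a positive number.
        // The comparison is also false for NaN, which falls through to the fallback.
        if (avgRowSizeBytes > 0.0) {
            return (long) Math.ceil(regionSizeBytes / avgRowSizeBytes);
        }
        // Fallback: no sample rows. Scale the table row count by the selectivity.
        if (tableRowCount >= 0) {
            return Math.round(tableRowCount * selectivity);
        }
        // Neither source of information is available.
        return -1;
    }
}
```

With these assumptions, {{estimate(1000, 0.0, 500, 0.1)}} takes the fallback path and scales the 500-row table count by the selectivity instead of returning {{Long.MAX_VALUE}}.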



