[ 
https://issues.apache.org/jira/browse/IMPALA-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-8015:
--------------------------------
    Description: 
The expression used to calculate the cardinality for a M:N (“generic”) join is 
incorrect. Scroll to the end for the proposed fix.

The standard calculation for joins is explained in Swami & Schiefer, [On the 
Estimation of Join Result 
Sizes|https://pdfs.semanticscholar.org/2735/672262940a23cfce8d4cc1e25c0191de7da7.pdf]
 (S&S). Note especially section 3, Background. Also see [How Good Are Query 
Optimizers, Really?|http://www.vldb.org/pvldb/vol9/p204-leis.pdf] by Leis et al.

h4. Current Implementation

The code uses the following:

{code:java}
    long result = -1;
    for (EqJoinConjunctScanSlots slots: eqJoinConjunctSlots) {
      double lhsAdjNdv = slots.lhsNdv();
      if (slots.lhsNumRows() > lhsCard) lhsAdjNdv *= lhsCard / 
slots.lhsNumRows();
      double rhsAdjNdv = slots.rhsNdv();
      if (slots.rhsNumRows() > rhsCard) rhsAdjNdv *= rhsCard / 
slots.rhsNumRows();
      long joinCard = Math.round((lhsCard / Math.max(1, Math.max(lhsAdjNdv, 
rhsAdjNdv))) *
          rhsCard);
      if (result == -1) {
        result = joinCard;
      } else {
        result = Math.min(result, joinCard);
      }
  }
{code}

The above is hard to follow, which is part of the problem. Basically, the math 
for each key is:

{noformat}
|L.ki'| = min( |L.ki| * |L'| / |L|, |L.ki| )

        = |L.ki| * min( |L'| / |L|, 1 )
{noformat}

This says that we reduce the left-hand key by the selectivity of the left-hand 
filter, but we always have at least one key. The logic for the right side is 
the same.

There is a subtle error: the compound key as a whole is reduced by the 
selectivity of the filter. But, we can't reduce each key by that factor or we 
end up reducing the compound key by the selectivity to the power of the key 
count. This is not a bug here because of the wrong way we compute the overall 
join cardinaly shown below.

The last line is correct as it reduces to:

{noformat}
|join| = |L'| * |R'| / max( |L.ki'|, |R.ki'| )
{noformat}

Where:

* {{L}} is the relation on the left side of the join, {{R}} the right.
* {{L'}} is the result of applying scan filters to table {{L}}. Similarly for 
{{R'}}.
* {{|T|}} is the cardinality (number of rows) in table {{T}}.
* {{L.k}} is the (possibly compound) key column for the left table. Similarly 
for {{R.k}}.
* {{L.ki}} is the ith column within a compound key.

The code is also wrong in how it combines the keys to get a final answer. The 
code produces correct answers for a single key, but incorrect answers for 
compound keys. The code says that the final join cardinality is:

{noformat}
|join| = min( |L'| * |R'| / max( |L.ki'|, |R.ki'| ) )

       = |L'| * |R'| / max i=1 ...( max( |L.ki'|, |R.ki'| ) )
{noformat}

That is, the join cardinality is given by the largest component of the compound 
key. The result of this bug that the code estimate will use NDVs that are too 
small, producing estimates which are too large, which throws off join selection 
and could degrade query performance if it leads to an inefficient plan.


h4. Proposed Solution

The correct math, worked out in IMPALA-8014, is:

{noformat}
                      ∏ |T.ki|
|T.k'| = |T'| * min( ----------, 1 )
                        |T|


                  |L’| * |R’|
|L’ ⋈ R’| = -----------------------
             max( |L.k'|, |R.k'| )
{noformat}

The above says that the key cardinality determined by the cardinality of the 
input relation after filtering. This is either used as is (if the compound key 
cardinality is greater than the table cardinality), or scaled by the ratio of 
compound key cardinality to table cardinality.

Notice the subtle differences between the proposed and current solutions.

The solution can be improved.

* IMPALA-8018 observes that the resulting code is the same for the "FK/PK" and 
"generic" cases and proposes to unify the two.
* IMPALA-8213 proposes a solution when expressions are correlated between the 
left and right hand inputs.
* IMPALA-8218 says we should use a "simple urn model" to calculate the key 
estimate, not the simple linear model used above and in the code.
* IMPALA-XXXX notes the complexity of computing the filtered table cardinality 
{{T'}} on the left side when the left side is join.


  was:
TL;DR: the expression used to calculate the cardinality for a M:N (“generic”) 
join is incorrect. Scroll to the end for the proposed fix. See IMPALA-8018 for 
math details.
h4. Current Implementation

The code uses the following:
{code:java}
      double lhsAdjNdv = slots.lhsNdv();
      if (slots.lhsNumRows() > lhsCard) lhsAdjNdv *= lhsCard / 
slots.lhsNumRows();
      double rhsAdjNdv = slots.rhsNdv();
      if (slots.rhsNumRows() > rhsCard) rhsAdjNdv *= rhsCard / 
slots.rhsNumRows();
      long joinCard = Math.round((lhsCard / Math.max(1, Math.max(lhsAdjNdv, 
rhsAdjNdv))) *
          rhsCard);
{code}
As noted in IMPALA-8014, the above attempts to adjust NDVs for the correlated 
filter case.

Where:
 * {{lhsCard}} is the output cardinality of the l (previous join) table or 
{{|L'|}}
 * {{rhsCard}} is the output cardinality of the r (scanned) table or {{|R'|}}
 * {slots.lhsNdv()}} is the NDV of the left key or {{|L.k1|}}
 * {slots.lnsNumRows()}} is the cardinality of the LHS table, or {{|L|}}
 * {slots.rhsNdv()}} is the NDV of the right key or {{|R.k2|}}
 * {slots.rnsNumRows()}} is the cardinality of the LHS table, or {{|R|}}

Translated to our notation:
{noformat}
adjNdv(L.k1) = |L.k1| * ( |L'| / |L| if |L| > |L'|, 1 otherwise )
adjNdv(R.k2) = |R.k2| * ( |R'| / |R| if |R| > |R'|, 1 otherwise )
|L’ ⋈ R’| = |L’| * |R’| / max(adjNdv(L.k1), adjNdv(R.k2))
{noformat}
We can make two simplifications:
 * Since {{|X’| <= |X|}}, and when {{|X'| = |X|}} then {{|X’| / |X| = 1}} we 
can drop the conditional term.
 * Since {{|k| <= |X|}} and {{|X'|}} is positive, we can see that neither 
{{adjNdv}} term can ever be less than one, so we can ignore the corresponding 
{{max()}} in the code.

This gives:
{noformat}
adjNdv(L.k1) = |L.k1| * |L'| / |L|
adjNdv(R.k2) = |R.k2| * |R'| / |R|

                    |L’| * |R’|
|L’ ⋈ R’| = -------------------------------
            max(adjNdv(L.k1), adjNdv(R.k2))
{noformat}
The meaning of the {{adjNdv}} values is that we reduce each NDV proportional to 
the the number of rows selected. If we take the adjusted NDVs as the correct 
values (but see below) the rest of the expression matches the derived 
expression.
h4. Code Bug

However, it *does not* make sense to adjust the NDVs for the M:N case — the one 
for which this code applies.

For example, suppose we start with 100 key values, and we select half the rows. 
How many keys are left?
 * If we start with 100 rows, then, yes, we do reduce the keys by half.
 * But, if we start with 10,000 rows, removing half the rows still leaves 5,000 
rows to hold our 100 keys.

The intuition is right: the number of matches should decrease if the scan 
filter becomes more selective. But, that is exactly with the two scan term 
({{|𝜎L|}} and {{|𝜎R|}} provide. It need not be done again for the keys.

The result of this bug that the code estimate will use NDVs that are too small, 
producing estimates which are too large, which throws of join selection and 
could degrade query performance if it leads to an inefficient plan.

That this attempted adjustment does not, in fact, work is not surprising. As 
noted in S&S (section 3.1): “We do not know of any algorithm that *correctly* 
takes into account both local predicates and join predicates.”
h4. Worked Example

Let's check with a detailed example:
 * {{|L| = 10000}}
 * {{|R| = 100}}
 * {{|L’| = 3000}}
 * {{|R’| = 50}}
 * {{ndv(L.k1) = |L.k1| = 40}}
 * {{ndv(R.k2) = |R.k2| = 20}}

Let's work this out intuitively first. The join is the normal cross join with 
adjustments:
{noformat}
|L’ ⋈ R’| = |L| * |R| * adjustments
{noformat}
Adjustments:
 * The left side scan returns only 30% of the total rows.
 * The left hand has twice the key values of the right, so we have to discard 
half of the left rows.
 * Of the L keys with a match in R, it will match, on average, 100/20 = 5 rows.
 * But half of R is filtered out, reducing the available matching rows.

And:
{noformat}
|L’ ⋈ R’| = (|L| / 2) * (|R| / 20 / 2)
          = 10000 * 30% / 2 * 100 * / 20 / 2
          = 1500 * 2.5
          = 3750
{noformat}
Using the Equation 5 from IMPALA-8018 (which is Equation 2 in the S&S paper):
{noformat}
                |L’| * |R’|
|L’ ⋈ R’| = -------------------
            max(|L.k1|, |R.k2|)

          = 3000 * 50 / max(40, 20)
          = 3000 * 50 / 40
          = 3750
{noformat}
Which works out.

Now let's check the code's expression:
{noformat}
adjNdv(L.k1) = |L.k1| * |L'| / |L|
             = 40 * 3000 / 10,000
             = 12
adjNdv(R.k2) = |R.k2| * |R'| / |R|
             = 20 * 50 / 100
             = 10

                    |L’| * |R’|
|L’ ⋈ R’| = -------------------------------
            max(adjNdv(L.k1), adjNdv(R.k2))

          = 3000 * 50 / max(12, 10)
          = 150,000 / 12
          = 12,500
{noformat}
Which does not quite work out, verifying the bug. Note that, as a result of the 
bug, we overestimate the cardinality by 3 times.
h4. Proposed Fix

See IMPALA-8018 for the proposed fix for this and IMPALA-8014.


> Incorrect cardinality calculation for the generic case
> ------------------------------------------------------
>
>                 Key: IMPALA-8015
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8015
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> The expression used to calculate the cardinality for a M:N (“generic”) join 
> is incorrect. Scroll to the end for the proposed fix.
> The standard calculation for joins is explained in Swami & Schiefer, [On the 
> Estimation of Join Result 
> Sizes|https://pdfs.semanticscholar.org/2735/672262940a23cfce8d4cc1e25c0191de7da7.pdf]
>  (S&S). Note especially section 3, Background. Also see [How Good Are Query 
> Optimizers, Really?|http://www.vldb.org/pvldb/vol9/p204-leis.pdf] by Leis et 
> al.
> h4. Current Implementation
> The code uses the following:
> {code:java}
>     long result = -1;
>     for (EqJoinConjunctScanSlots slots: eqJoinConjunctSlots) {
>       double lhsAdjNdv = slots.lhsNdv();
>       if (slots.lhsNumRows() > lhsCard) lhsAdjNdv *= lhsCard / 
> slots.lhsNumRows();
>       double rhsAdjNdv = slots.rhsNdv();
>       if (slots.rhsNumRows() > rhsCard) rhsAdjNdv *= rhsCard / 
> slots.rhsNumRows();
>       long joinCard = Math.round((lhsCard / Math.max(1, Math.max(lhsAdjNdv, 
> rhsAdjNdv))) *
>           rhsCard);
>       if (result == -1) {
>         result = joinCard;
>       } else {
>         result = Math.min(result, joinCard);
>       }
>   }
> {code}
> The above is hard to follow, which is part of the problem. Basically, the 
> math for each key is:
> {noformat}
> |L.ki'| = min( |L.ki| * |L'| / |L|, |L.ki| )
>         = |L.ki| * min( |L'| / |L|, 1 )
> {noformat}
> This says that we reduce the left-hand key by the selectivity of the 
> left-hand filter, but we always have at least one key. The logic for the 
> right side is the same.
> There is a subtle error: the compound key as a whole is reduced by the 
> selectivity of the filter. But, we can't reduce each key by that factor or we 
> end up reducing the compound key by the selectivity to the power of the key 
> count. This is not a bug here because of the wrong way we compute the overall 
> join cardinaly shown below.
> The last line is correct as it reduces to:
> {noformat}
> |join| = |L'| * |R'| / max( |L.ki'|, |R.ki'| )
> {noformat}
> Where:
> * {{L}} is the relation on the left side of the join, {{R}} the right.
> * {{L'}} is the result of applying scan filters to table {{L}}. Similarly for 
> {{R'}}.
> * {{|T|}} is the cardinality (number of rows) in table {{T}}.
> * {{L.k}} is the (possibly compound) key column for the left table. Similarly 
> for {{R.k}}.
> * {{L.ki}} is the ith column within a compound key.
> The code is also wrong in how it combines the keys to get a final answer. The 
> code produces correct answers for a single key, but incorrect answers for 
> compound keys. The code says that the final join cardinality is:
> {noformat}
> |join| = min( |L'| * |R'| / max( |L.ki'|, |R.ki'| ) )
>        = |L'| * |R'| / max i=1 ...( max( |L.ki'|, |R.ki'| ) )
> {noformat}
> That is, the join cardinality is given by the largest component of the 
> compound key. The result of this bug that the code estimate will use NDVs 
> that are too small, producing estimates which are too large, which throws off 
> join selection and could degrade query performance if it leads to an 
> inefficient plan.
> h4. Proposed Solution
> The correct math, worked out in IMPALA-8014, is:
> {noformat}
>                       ∏ |T.ki|
> |T.k'| = |T'| * min( ----------, 1 )
>                         |T|
>                   |L’| * |R’|
> |L’ ⋈ R’| = -----------------------
>              max( |L.k'|, |R.k'| )
> {noformat}
> The above says that the key cardinality determined by the cardinality of the 
> input relation after filtering. This is either used as is (if the compound 
> key cardinality is greater than the table cardinality), or scaled by the 
> ratio of compound key cardinality to table cardinality.
> Notice the subtle differences between the proposed and current solutions.
> The solution can be improved.
> * IMPALA-8018 observes that the resulting code is the same for the "FK/PK" 
> and "generic" cases and proposes to unify the two.
> * IMPALA-8213 proposes a solution when expressions are correlated between the 
> left and right hand inputs.
> * IMPALA-8218 says we should use a "simple urn model" to calculate the key 
> estimate, not the simple linear model used above and in the code.
> * IMPALA-XXXX notes the complexity of computing the filtered table 
> cardinality {{T'}} on the left side when the left side is join.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to