[ 
https://issues.apache.org/jira/browse/IMPALA-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated IMPALA-8014:
--------------------------------
    Description: 
The math is wrong when we estimate the cardinality of a join we've labeled as 
"FK/PK" (commonly known as many-to-one, or M:1).

This is a version of the notes found 
[here|https://github.com/paul-rogers/impala/wiki/Planner].

h4. Background

Join cardinality is estimated in {{JoinNode.computeStats()}}, and especially in 
{{getJoinCardinality()}}, which implements the infamous "FK/PK" heuristics. 
"The FK/PK detection logic is based on the assumption that most joins are 
FK/PK. ... In the absence of relevant stats, we assume FK/PK with a join 
selectivity of 1." As noted above, M:1 is the more common name for "FK/PK".

Earlier logic ensured that the larger table is on the probe (left) side. So, 
the code assumes that L is the detail table with the foreign key (FK), 
while the smaller build-side table is the master table with the primary key 
(PK). That is, L:R is M:1, joined on {{L.<FK> = R.<PK>}}.

Next, let's translate the code's naming to the more traditional (and 
descriptive, if somewhat offensive) RDBMS naming:

* *master table:* the 1 side of the M:1 relationship, has a unique primary key 
(PK) typically enforced via a unique index. Referred to as the R (build) table 
in Impala. We will use the symbol {{m}} for the master table, and {{m.pk}} for 
the (presumed) primary key column.
* *detail table:* the M side of the M:1 relationship, has a non-unique foreign 
key (FK) that points to one entry in the master table. Referred to as the L 
(probe) table in Impala. We will use the symbol {{d}} for the detail table, and 
{{d.fk}} for the (presumed) foreign key column.

And, we need some notation:

* {{|x|}} is the cardinality of relation (table) {{x}}. The relation can be a 
base table or the output of a prior join.
* {{|scan\(x)|}} is the cardinality out of an Impala scan node after applying 
filter predicates.
* {{ndv\(x)}} is the NDV of column {{x}}.

To paraphrase a comment in the code, cardinality estimation is:

{noformat}
|d >< m| = |scan(d)| * (|scan(m)| / |m|) * (ndv(m.pk) / ndv(d.fk))
{noformat}

Where:

* {{|scan\(d)|}} is the cardinality of the detail scan after applying any 
predicates to the table cardinality.
* {{|scan\(m)|}} is the same for the master scan.
* {{|m|}} is the cardinality of the master table itself.
* {{ndv(m.pk)}} is the NDV of the presumed primary key column in the master 
(right, build, smaller) table.
* {{ndv(d.fk)}} is the NDV of the presumed foreign key column in the detail 
(left, probe, larger) table.

h4. Deriving the Formula

Let's tear this apart.

In an RDBMS, a Primary Key is a unique column (or, more typically, a set of 
columns), generally backed by a unique index that enforces uniqueness. Since 
keys are unique:

{noformat}
ndv(m.pk) = |m|
{noformat}

Then, since there is an M:1 relationship between the detail (d) and master (m) 
tables:

{noformat}
ndv(d.fk) <= |m| << |d|
{noformat}

If we read all rows from both tables, and all primary keys appear as foreign 
keys, then the join cardinality is simply:

{noformat}
|d >< m| = |d|
{noformat}

If we filter out some detail rows, then cardinality is:

{noformat}
|d >< m| = |scan(d)|
{noformat}

If we filter out rows in the master table, and we assume uniform distribution 
of foreign keys, then the probability of any foreign key finding a primary key 
match is:

{noformat}
p(pk) = |scan(m)| / |m|
{noformat}

The result is the probability that a given pk presents to the join; it runs 
from 0 (no rows match the predicates) to 1 (all rows present).

The revised join cardinality is:

{noformat}
|d >< m| = |scan(d)| * p(pk)
         = |scan(d)| * |scan(m)| / |m|
{noformat}
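Numerically, with made-up figures (a 10K-row detail scan and a master scan 
that keeps 50 of 100 rows; the class and method names are hypothetical):

```java
public class PkProbability {
    // p(pk): the fraction of master rows that survive the master scan's
    // predicates, i.e. |scan(m)| / |m|.
    static double pPk(long scanM, long m) {
        return (double) scanM / m;
    }

    public static void main(String[] args) {
        long scanD = 10_000;            // filtered detail cardinality
        double p = pPk(50, 100);        // master scan keeps 50 of 100 rows
        System.out.println(scanD * p);  // 5000.0
    }
}
```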

There is some built-in selectivity: perhaps not all the primary keys are 
actually referenced. In fact, we can figure out how many actually are used. 
Let's assume that not all primary keys appear as foreign keys in the detail 
table. Then, the probability of a match is:

{noformat}
p(match) = ndv(d.fk) / ndv(m.pk)
{noformat}

If all primary keys appear in the detail table, the probability of a match is 
1. If none appear (the detail table is empty), then the probability is 0.

On the other hand, if more foreign keys exist than primary keys, then some of the 
foreign keys won't find a match on an inner join, so:

{noformat}
p(match) = ndv(m.pk) / ndv(d.fk)
{noformat}

If the distinct foreign keys exactly match the primary keys, the probability of 
a match is 1. If there are twice as many distinct foreign keys as primary keys, 
the probability of a match is 50%.

Combining the two cases:

{noformat}
p(match) = ndv(d.fk) / ndv(m.pk) if ndv(d.fk) < ndv(m.pk),
           ndv(m.pk) / ndv(d.fk) otherwise.
{noformat}
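The combined rule amounts to dividing the smaller NDV by the larger, so the 
result is always a valid probability. A tiny sketch (the {{pMatch}} helper is 
hypothetical, not planner code):

```java
public class MatchProbability {
    // p(match) = min(ndv) / max(ndv): the smaller NDV over the larger,
    // so the probability never exceeds 1, whichever side has more keys.
    static double pMatch(double ndvFk, double ndvPk) {
        return Math.min(ndvFk, ndvPk) / Math.max(ndvFk, ndvPk);
    }

    public static void main(String[] args) {
        System.out.println(pMatch(25, 100));   // fewer FKs than PKs: 0.25
        System.out.println(pMatch(200, 100));  // more FKs than PKs: 0.5
    }
}
```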

(As it turns out, the above logic is used in the non-FK/PK case in the Planner 
code. But, as we'll see, half of it is missing in the FK/PK code.)

Adding this to the join cardinality estimate, for the normal case of fewer FKs 
than PKs, we get:

{noformat}
|d >< m| = |scan(d)| * p(pk) * p(match)
         = |scan(d)| * |scan(m)| / |m| * ndv(d.fk) / ndv(m.pk)
{noformat}

On the other hand, if we assume more foreign keys than primary keys, we get:

{noformat}
|d >< m| = |scan(d)| * p(pk) * p(match)
         = |scan(d)| * |scan(m)| / |m| * ndv(m.pk) / ndv(d.fk)
{noformat}

h4. Bug in the Code

Compare the above derived formula to the one in the code:

{noformat}
cardinality = |scan(d)| * (|scan(m)| / |m|) * (NDV(m.pk) / NDV(d.fk))
{noformat}

We can check how the code implements the formula:

{code:java}
      ndvRatio = slots.rhsNdv() / slots.lhsNdv();
      double rhsSelectivity = rhsCard / slots.rhsNumRows();
      long joinCard = (long) Math.ceil(lhsCard * rhsSelectivity * ndvRatio);
{code}

So, the comment accurately describes the code.

But, it contains a severe error. Consider a normal 1:M case: the master-detail 
relationship such as customers and orders. Not all customers have an order, but 
all orders must have been placed by customers. As a result, the normal use is 
fewer foreign keys than primary keys. But, the formula above is for the 
{{ndv(d.fk) > ndv(m.pk)}} case. Why is this wrong?

Assume half of all customers never placed an order. The ratio in the code is 
{{NDV(m.pk) / NDV(d.fk)}}: say, 100 customers / 50 customers who placed an 
order, or 2. That is, the fewer customers who have placed an order, the 
*greater* the assumed join cardinality.

The common case in data warehousing is a fact (detail) table and a dimension 
(master) table. Again, if the dimension table has more keys than the detail 
table uses, the cardinality estimate will be too high.

When the fraction of referenced keys is low, the cardinality estimate may 
become so high that the planner rejects a valid join in favor of one that seems 
better, but is actually much worse at execution time.

The actual formula in the comments (with some annotations) is:

{noformat}
              |scan(L)|     |scan(R)|
              ----^-----    ----^-----
cardinality = |child(0)| * (|child(1)| / |R|) * (NDV(R.d) / NDV(L.c))

L = detail, L.c = d.fk
R = master, R.d = m.pk
{noformat}

To ensure we are reading all this correctly, let's work out a detailed example. 
Let's assume that our detail table has 10K rows, our master table 100 rows, and 
that only 25 of the master records are referenced. Let's assume the scans 
remove no rows. Using the formula derived above:

{noformat}
|join| = |scan(d)| * (|scan(m)| / |m|) * (ndv(d.fk) / ndv(m.pk))
       = |d| * (|m| / |m|) * (ndv(d.fk) / ndv(m.pk))
       = 10,000 * 1 * (25 / 100)
       = 10,000 / 4
       = 2,500
{noformat}

Using the formula from the code:

{noformat}
L = d
R = m
cardinality = |child(0)| * (|child(1)| / |R|) * (NDV(R.d) / NDV(L.c))
            = |scan(L)| * (|scan(R)| / |R|) * (NDV(R.d) / NDV(L.c))
            = |scan(d)| * (|scan(m)| / |m|) * (NDV(m.pk) / NDV(d.fk))
            = |d| * (|m| / |m|) * (NDV(m.pk) / NDV(d.fk)) 
            = 10,000 * 1 * (100 / 25)
            = 10,000 * 4
            = 40,000
{noformat}
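The two computations can be checked side by side in a throwaway sketch (the 
method names {{derived}} and {{asCoded}} are hypothetical labels for the two 
formulas above):

```java
public class CardinalityExample {
    // |d >< m| per the derivation: ratio is ndv(d.fk) / ndv(m.pk).
    static double derived(double scanD, double scanM, double m,
                          double ndvFk, double ndvPk) {
        return scanD * (scanM / m) * (ndvFk / ndvPk);
    }

    // |d >< m| as the code computes it: ratio is ndv(m.pk) / ndv(d.fk).
    static double asCoded(double scanD, double scanM, double m,
                          double ndvFk, double ndvPk) {
        return scanD * (scanM / m) * (ndvPk / ndvFk);
    }

    public static void main(String[] args) {
        // 10K detail rows, 100 master rows, 25 of them referenced, no filtering.
        System.out.println(derived(10_000, 100, 100, 25, 100)); // 2500.0
        System.out.println(asCoded(10_000, 100, 100, 25, 100)); // 40000.0
    }
}
```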

As a result, the fewer primary keys that are referenced, the more the code 
expects the input rows to be replicated.

Let's check this empirically by printing out {{ndvRatio}} when running some 
queries:

{noformat}
ndv ratio = 2.0977582147609786
ndv ratio = 3.3183831043488135
ndv ratio = 2.0977582147609786
ndv ratio = 2.0977582147609786
{noformat}

So, this seems like a real bug.

h4. Proposed Fix

The easiest fix is just to invert the ratio. But, running {{PlannerTest}} shows 
that we need to handle both cases:

{noformat}
ndv ratio = 1.5245451773554224
ndv ratio = 0.10545630784098954
ndv ratio = 0.959424038561171
ndv ratio = 335.3333333333333
{noformat}

If we inverted these as proposed, we'd see:

{noformat}
0.67
10
1.04
0.003
{noformat}

This means we have cases where {{ndv(d.fk) > ndv(m.pk)}} *and* cases where 
{{ndv(d.fk) < ndv(m.pk)}}. Thus, we need to compute the probability using the two-part 
calculation shown above.

The proposed fix is to add that calculation to the code so that the ndv ratio 
in the snippet of {{PlannerTest}} above would be:

{noformat}
0.67
0.10
0.96
0.003
{noformat}

The probability never exceeds 1.
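Applied to the snippet quoted earlier, the fix might look like the following 
sketch (the {{joinCard}} helper and its parameters are stand-ins for the 
planner's fields; this is not the actual patch):

```java
public class FkPkFix {
    // Corrected ratio: min(NDV) / max(NDV) is a probability <= 1,
    // unlike the one-sided rhsNdv / lhsNdv the code uses today.
    static long joinCard(long lhsCard, long rhsCard, long rhsNumRows,
                         double lhsNdv, double rhsNdv) {
        double ndvRatio = Math.min(lhsNdv, rhsNdv) / Math.max(lhsNdv, rhsNdv);
        double rhsSelectivity = (double) rhsCard / rhsNumRows;
        return (long) Math.ceil(lhsCard * rhsSelectivity * ndvRatio);
    }

    public static void main(String[] args) {
        // 10K detail rows, unfiltered 100-row master, 25 of 100 PKs referenced.
        System.out.println(joinCard(10_000, 100, 100, 25, 100)); // 2500
    }
}
```

With these inputs the estimate comes out to 2,500, matching the derivation 
above rather than the 40,000 the current ratio produces.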

As noted in the text above, the code already attempts to use this calculation 
in the non-FK/PK case (though that code also has bugs).

> Incorrect M:1 (FK/PK) cardinality estimation
> --------------------------------------------
>
>                 Key: IMPALA-8014
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8014
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
