[
https://issues.apache.org/jira/browse/IMPALA-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong reassigned IMPALA-8014:
-------------------------------------
Assignee: (was: Tim Armstrong)
> Incorrect M:1 (FK/PK) cardinality estimation
> --------------------------------------------
>
> Key: IMPALA-8014
> URL: https://issues.apache.org/jira/browse/IMPALA-8014
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Paul Rogers
> Priority: Major
>
> The math is wrong when we estimate the cardinality of a join we've labeled as
> "FK/PK" (commonly known as many-to-one or M:1.)
> h4. Join Estimation
> The standard calculation for joins is explained in Swami & Schiefer, [On the
> Estimation of Join Result
> Sizes|https://pdfs.semanticscholar.org/2735/672262940a23cfce8d4cc1e25c0191de7da7.pdf]
> (S&S). Note especially section 3, Background. Also see [How Good Are Query
> Optimizers, Really?|http://www.vldb.org/pvldb/vol9/p204-leis.pdf] by Leis et
> al.
> {noformat}
> |L'| * |R'|
> |L' ⋈ R'| = ------------------
> max(|L.k'|, |R.k'|)
> {noformat}
> Here:
> * {{|x|}} is the row cardinality if x is a relation, the NDV (domain
> cardinality) if x is a column.
> * {{L}}, {{R}} are left and right input relations.
> * {{L.k}} and {{R.k}} are the (possibly compound) join keys.
> * {{x’}} is the result of applying a filter to relation or column {{x}}.
> Intuitively, the cardinality of the join is the Cartesian product reduced by
> the largest column domain. Since the relations fed into the join are
> filtered, we are concerned with the new, filtered relations created by the
> scan nodes.
> Impala, like most planners, makes three basic assumptions (see the first
> paper above):
> * Uniformity: keys are evenly distributed.
> * Independence: the filtering on the two tables is independent of keys.
> * Containment: that all detail (FK) rows have a corresponding master (FK) row.
> (These are traditional assumptions, but they turn out to be unrealistic in
> many cases; see the second paper above.)
> h4. M:1 Case
> The planner (unnecessarily) divides join planning into two cases.
> (IMPALA-8018 describes how the two steps are unnecessary. But, to minimize
> code change, we live with this process here.)
> * M:1 (AKA "FK/PK", "detail/master", "fact/dimension"). A "foreign key" (FK)
> in the detail table matches at most one "primary key" (PK) in the master
> table. ("At most one" because of filtering which may have removed master
> keys.) Impala assumes that the detail ("FK") table is on the left (probe)
> side, and the master ("PK") table is on the right ("build") side.
> * M:N (AKA "generic", "many-to-many"). Every key on the left matches
> potentially many keys on the right.
> If we focus on just the M:1 case we can rename the relations for convenience
> and observe a simplification:
> {noformat}
> L = D
> R = M
> L.k = M.pk
> R.k = D.fk
> {noformat}
> By definition of M:1:
> {noformat}
> |M.pk'| = |M'|
> {noformat}
> Which gives a revised join expression:
> {noformat}
> |D'| * |M'|
> |D' ⋈ M'| = ------------------
> max(|D.fk'|, |M'|)
> {noformat}
> The expression above is handy: it shows that we only need three values to
> compute the join cardinality:
> * {{|D'|}} the output cardinality from the left plan node, which is already
> available.
> * {{|M'|}} the output cardinality from the right scan node, which is already
> available. (In Impala, the right node will always be a table scan.)
> * {{|D.fk'|}} the number of foreign key values in the filtered {{D'}}
> relation. (In relational theory terms, the cardinality of the domain of the
> foreign key after applying pushed-down selection operations.)
> See IMPALA-XXXX for a necessary adjustment to {{|M'|}} to handle predicates
> common to both sides.
> We only need to estimate {{|D.fk'|}}. We will work up to this step by step
> because the intermediate steps help explain the bug in the current code.
> h4. Filtering of Master Table Only
> Let's start with the simplest case, filtering on only the master table:
> {noformat}
> |D.fk'| = |D.fk| <= |M.pk|
> max(|D.fk'|, |M'|) = max(|M.pk|, |M'|) = |M.pk| = |M|
> {noformat}
> So:
> {noformat}
> |D ⋈ M'| = |D| * |M'| / |M|
> {noformat}
> Intuitively: the probability of any detail row finding a match is simply the
> selectivity of the master filter, or {{|M'| / |M|}}.
> h4. Filtering on the Foreign Key Column
> The next case is also fairly easy. Suppose we know that the left (detail)
> scan applied a filter. Impala presently uses an incorrect, but simplified,
> model for this case. (See IMPALA-XXXX for the correct model.)
> The cardinality of the foreign key is simply the result of applying that
> filter:
> {noformat}
> |D.fk'| = |D.fk| * sel(f)
> {noformat}
> Let us assume we do not filter the master table in this case, so:
> {noformat}
> |D.fk'| < |D.fk| <= |M.pk|
> max(|D.fk'|, |M|) = |M|
> {noformat}
> And:
> {noformat}
> |D' ⋈ M| = |D'| * |M| / |M| = |D'|
> {noformat}
> We assume "containment" from the S&S paper above: all the foreign keys have a
> matching primary key if {{|F.fk| <= |M.pk|}}. So, if we filter only on the
> detail table, all foreign keys find a match and the cardinality of the join
> is just the cardinality of the left input relation.
> h4. Combined Left and Right Filtering
> Now, let's combine the master and detail filtering cases. In general, we will
> have filtering on both sides. The {{max}} in the expression automatically
> combines the cases:
> {noformat}
> |D.fk'| = |D.fk| * |D'| / |D|
> |D'| * |M'|
> |D' ⋈ M'| = ------------------
> max(|D.fk'|, |M'|)
> {noformat}
> Intuitively, the number of rows is the Cartesian product divided by the
> larger of the number of keys in either table after the scan. Said another
> way, each detail row finds a master, unless filtering has removed so many
> masters that some detail rows find no match, in which case the probability of
> a match is {{|M'| / |D.fk'|}}.
> h4. Compound Primary Keys
> The final complexity is to consider a compound key. That is:
> {noformat}
> (D.fk1, Dfk2) --> (M.pk1, M.pk2)
> {noformat}
> The foreign key pair points to a matching primary key pair. Here we consider
> only pairs, but the logic is the same for a compound key with any number of
> columns. Obviously, by the definition of a join in SQL, the number of keys on
> each side must be the same.
> If we know (from HMS metadata) that the above pairs are, in fact, the keys,
> then we can make a simplifying assumption:
> {noformat}
> |(M.pk1, M.pk2)| = |M|
> |(D.fk1, Dfk2)| <= |(M.pk1, M.pk2)| = |M|
> {noformat}
> If we don't know a-priori that a pair (k1, k1) is a foreign or primary key,
> then we can estimate its cardinality (assuming independence of values) as:
> {noformat}
> |(k1, k2)| = |k1| * |k2|
> {noformat}
> In the M:1 case, the primary key cardinality must be the same as table
> cardinality. If the two columns are completely independent, then:
> {noformat}
> |(M.pk1, M.pk2)| = |M.pk1| * |M.pk2| = |M|
> {noformat}
> More typically, there is some correlation between the columns so:
> {noformat}
> |(M.pk1, M.pk2)| = |M| <= |M.pk1| * |M.pk2|
> {noformat}
> Indeed, Impala uses the above relation to decide we have the FK/PK case. Said
> another way, if Impala follows the FK/PK logic, then we can simply assume
> that {{|M.pk| = |M|}} even if the key is compound.
> h4. Compound Foreign Keys
> Foreign keys require a bit more thought. We could have a detail table with
> only one row. If we have many detail rows, the assumption of containment says
> that each detail record points to some master record, so:
> {noformat}
> |D.fk| <= |M.pk|
> {noformat}
> This lets us estimate the cardinality of a compound foreign key as:
> {noformat}
> |(D.fk1, D.fk2)| = min( |D.fk1| * |D.fk2|, |M| )
> {noformat}
> That is, if the detail table is small, the left term is a reasonable estimate
> (ignoring the urn model issue). But, as the table gets larger, and begins to
> include most primary keys, we know that the number of foreign keys can't be
> larger than the number of primary keys, so the right term is the better
> estimate.
> h4. Filtering on Compound Keys
> Suppose that the join inputs have filtering applied to the left (detail)
> table. We discussed how to handle this fo a single column. For a compound
> key, we can observe that the cardinality of the key is the product of the
> cardinality of the columns, but (using the containment assumption), no larger
> than the cardinality of the primary key (which is the cardinality of the
> master table.)
> So:
> {noformat}
> |D.fk| = min( ∏ |D.fki|, |M| )
>
> |D.fk'| = |D.fk| * |D'| / |D|
> = min( ∏ |D.fki|, |M| ) * |D'| / |D|
> {noformat}
> So the final expression for join cardinality is:
> {noformat}
> |D.fk'| = min( ∏ |D.fki|, |M| ) * |D'| / |D|
> |D'| * |M'|
> |D' ⋈ M'| = -------------------
> max( |D.fk'|, |M'|)
> {noformat}
> The first expression says that the compound foreign key cardinality is either
> the product of the columns that make up the key, or the cardinality of the
> primary key, whichever is less. We then adjust that amount (incorrectly) by
> the percentage of the detail table that is scanned.
> The second expression is just the Cartesian product divided by the largest
> key cardinality: either foreign key or primary key (which is, by definition,
> equal to the cardinality of the master table.)
> h4. Complexity: Compound Joins
> The above expression works well if the left input to a join is a base table
> row which we know the original table cardinality that corresponds to the
> original column NDV. The above can be used as-is in a M:N ("generic") join
> for a right-side table. But, when when used on the left side, we must recall
> that Impala builds left-deep join plans, so the left side may be a join. In
> this case, there is no original left-side base table.
> See IMPALA-XXXX for discussion of the complexities in this case. A quick and
> dirty solution is to use the scan output cardinality in place of the left
> input cardinality. That is:
> * Determine the table that contains the join key.
> * Search the left subtree for the scan for that table.
> To do this, start with the left input:
> * If the node is a scan node, and the scan is for the target table, return
> the output cardinality of that scan.
> * If the node is a join, then apply the search to *both* sides of the join.
> This approach is cumbersome, and may run into complexities if the node is
> something other than a scan or join. It also may underestimate if a filter is
> applied at the join level.
> A cleaner, tough more involved, solution is to track adjusted NDV for each
> column through each operator as described in IMPALA-8220.
> h4. Code Bug
> The commit "IMPALA-5547: Rework FK/PK join detection", ID
> [{{9f678a74269250bf5c7ae2c5e8afd93c5b3734de}}|https://github.com/apache/impala/commit/9f678a74269250bf5c7ae2c5e8afd93c5b3734de#diff-b10ccc2cdf68b236be400cde1e858a7c]
> on Jun 6, 2017 reworked the FK/PK logic. It has one flaw: after determining
> that we have an FK/PK (M:1) case, it then attempts to adjust the compound FK
> and PK columns. This has two problems:
> * It is unnecessary, as we saw above.
> * The math is wrong and produces bogus estimates.
> Here is the code in {{JoinNode.java}}:
> {code:java}
> long result = -1;
> for (EqJoinConjunctScanSlots slots: eqJoinConjunctSlots) {
> // Adjust the join selectivity based on the NDV ratio to avoid
> underestimating
> // the cardinality if the PK side has a higher NDV than the FK side.
> double ndvRatio = 1.0;
> if (slots.lhsNdv() > 0) ndvRatio = slots.rhsNdv() / slots.lhsNdv();
> double rhsSelectivity = Double.MIN_VALUE;
> if (slots.rhsNumRows() > 0) rhsSelectivity = rhsCard /
> slots.rhsNumRows();
> long joinCard = (long) Math.ceil(lhsCard * rhsSelectivity * ndvRatio);
> if (result == -1) {
> result = joinCard;
> } else {
> result = Math.min(result, joinCard);
> }
> }
> // FK/PK join cardinality must be <= the lhs cardinality.
> result = Math.min(result, lhsCard);
> {code}
> The above is hard to follow, which may account for why the bug was not
> caught. Ignoring some corner cases, the logic is essentially:
> {noformat}
> lhsCard = |D'|
> ndvRatioi = |D.fki| / |M.pki|
> rhsSelectivityi = |M'| / |M|
> joinCardi = lhsCard * ndvRatioi * rhsSelectivityi
> = |D'| * (|D.fki| / |M.pki|) * |M'| / |M|
> |join| = min(joinCardi)
> = (|D'| * |M'| / |M|) * min(|D.fki| / |M.pki|)
> {noformat}
> Though hard to see, the above is not equivalent to the logic worked out in
> the previous section. Using the correct expression from earlier sections:
> {noformat}
> |D'| * |M'|
> |D' ⋈ M'| = ----------------------------------------------
> max(min( ∏ |D.fki|, |M| ) * |D'| / |D|), |M'|)
> {noformat}
> We can factor out the common {{|D'| * |M'|}} terms and compare:
> {noformat}
> min(|D.fki| / |M.pki|) 1
> ---------------------- != ----------------------------------------------
> |M| max(min( ∏ |D.fki|, |M| ) * |D'| / |D|), |M'|)
> 1 1
> ---------------------------- !=
> ----------------------------------------------
> |M| * max(|M.pki| / |D.fki|) max(min( ∏ |D.fki|, |M| ) * |D'| / |D|),
> |M'|)
> |M| * max(|M.pki| / |D.fki|) != max(min( ∏ |D.fki|, |M| ) * |D'| / |D|),
> |M'|)
> {noformat}
> There is no valid operation that can convert one side to the other, so they
> are unequal.
> It is likely that the code's version attempts to work around issues elsewhere
> in the calculations (such as ignoring some predicates, using exponential
> back-off for filters, not having a good estimate for {{|D|}}, etc.)
> h4. Longer-Term Fix
> The above simple fix is the target of this ticket. Longer term, the code
> should evolve to use a single path for both the M:1 and M:N cases since as
> described in IMPALA-8018. (Both cases start with HMS data. Currently we use
> two paths to arrive at the same result. IMPALA-8018 suggests we need only one
> path.) We should also adopt the simple urn model as described in IMPALA-8218.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]