[
https://issues.apache.org/jira/browse/IMPALA-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Rogers reassigned IMPALA-8213:
-----------------------------------
Assignee: (was: Paul Rogers)
> Planner does not adjust join cardinality for correlated filters
> ---------------------------------------------------------------
>
> Key: IMPALA-8213
> URL: https://issues.apache.org/jira/browse/IMPALA-8213
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Paul Rogers
> Priority: Major
>
> The planner does not consider column join-equivalence when working out join
> cardinality on scans with filtering, leading to unrealistically low estimates.
> Here is a snippet from a real query:
> {noformat}
> WHERE ...
> table_a.part_id = 123
> OR table_a.part_id = 234
> {noformat}
> The query has four other tables in addition to table_a. The WHERE clause
> equates the {{part_id}} column in all of them (in addition to other FK/PK
> pairs.)
> The planner correct works out join equivalence: it notices, via the join
> expressions, that the {{part_id}} all take the same values. The planner then
> pushes the above predicate into all the scans:
> {noformat}
> | | 04:SCAN HDFS [table_i, RANDOM]
> | | partition predicates: table_i.part_id = CAST(123 AS BIGINT) OR
> table_i.part_id = CAST(234 AS BIGINT)
> 06:SCAN HDFS [table_u, RANDOM]
>
> partition predicates: table_u.part_id = CAST(123 AS BIGINT) OR
> table_u.part_id = CAST(234 AS BIGINT)
> {noformat}
> So far, so good. The problem is that the join cardinality calculations does
> not realize that all tables are filtered (in part) on the same columns. That
> is, the join calculations assume that each table is filtered independently,
> when in fact the filtering is strongly correlated.
> h4. Problem Detail
> To see this, imagine three tables, A, B, and C. They are all have a part_id
> column. A has 10K rows, B and C have 1K rows. Let A be a fact table, and B
> and C be dimension tables. We create a simplified form of the DB query:
> {code:sql}
> SELECT count(*) FROM A, B, C
> WHERE A.part_id = 123
> AND A.part_id = B.part_id
> AND A.b_id = B.id
> AND A.part_id = C.part_id
> AND A.c_id = C.id
> {code}
> Assume that B(part_id, id) and C(part_id, id) are primary keys. Assume
> {{|B.part_id| = 10}} and {{|C.part_id| = 10}}. (That is, there are 10
> partitions, each with 100 unique {{id}} values, so that {{|part_id| * |id| =
> |B|}}, etc.
> If we use the (corrected) FK/PK join calculations we get:
> {noformat}
> |A’| = |A| * sel(part_id = 123) = |A| * (1 / NDV(part_id)) = |A| / 10 = 10000
> / 10 = 1000
> |B’| = as above, but for B
> |C’| = as above, but for C
> |A’ >< B’| = |A’| * |B’| / |B| = 1000 * 100 / 1000 = 100
> |(A’ >< B’) >< C’| = |A’ >< B’| * |C’| / |C| = 100 * 1000 / 100 = 10
> {noformat}
> Is this the correct answer? In general, if the filtering of tables A an B
> were independent, then it would be correct. But, here, the filtering is
> correlated. We want to account for that filtering once, not twice. We then
> compound the error on the next join by again assuming filtering is
> independent. By the second join, we are off by a factor of 100.
> This issue is discussed in Swami & Schiefer, [On the Estimation of Join
> Result
> Sizes|https://pdfs.semanticscholar.org/2735/672262940a23cfce8d4cc1e25c0191de7da7.pdf]
> (S&S), which makes this useful observation to simplify our analysis: “For
> the estimation of the result size, it does not matter in which order the
> predicates are applied.” (That is, we get the same answer if we filter then
> join, or join then filter — this is why query optimizers are complex: they
> must consider all such options.)
> Using this idea here, let’s join first, then filter second:
> {noformat}
> |A >< B| = |A| * |B| / |B.pk| = 10,000 * 1000 / 1000 = 10,000
> |(A >< B) >< C| = |A >< B| * |C| / |C.pk| = 10,000 * 1000 / 1000 = 10,000
> sel(part_id = 123) = 0.1
> |((A >< B) >< C)’| = |(A >< B) >< C| * sel(part_id = 123) = 10,000 * 0.1 =
> 1000
> {noformat}
> Here {{B.pk}} is shorthand for the compound primary key {{(part_id, id}}.
> Being a primary key, we know {{|B.pk| == |B|}}.
> We join the fact table A with two dimension table B and C, with no filtering.
> Intuitively, we end up with all the fact rows “enriched” with columns from B
> and C. Then we apply filtering on the common {{part_id}} column to get 1/10
> of the rows or 1000.
> h4. Backing Out Duplicated Filters
> The calculations above use only multiplication and division and thus are
> commutative. (The fancy way of repeating the observation that order does not
> matter.) So, we can use a hybrid solution: apply filtering at each scan, then
> back out redundant filtering at the join level. Suppose we have a predicate
> {{f}} (for filter) applied to both tables. The selectivity of predicate {{f}}
> is {{sel(f)}}. Then:
> {noformat}
> |A'| = |A| * sel(f)
> |B'| = |B| * sel(f)
> {noformat}
> Now we can see the problem with the incorrect rule that Impala uses today:
> {noformat}
> |A' >< B'| = |A'| * |B'| / |B|
> = |A| * sel(f) * |B| * sel(f) / |B|
> = |A| * sel(f) * sel(f)
> {noformat}
> Notice that we've applied the filter twice rather than once as per our
> earlier example. To correct this, we have to remove one of the applications
> of the filter. That is, we need a correction, dividing out one of the
> {{sel(f)}} terms:
> {noformat}
> |A' >< B'| = |A'| * |B'| / |B| / sel(f)
> = |A| * sel(f) * |B| * sel(f) / |B| / sel(f)
> = ( |A| |B| / |B| ) * sel(f)
> = |A| * sel(f)
> {noformat}
> Intuitively, both the master and detail tables are filtered by {{f}}. The
> number of detail rows is {{|A| * sel(f)}}. Since the filter is applied to
> both tables, using the principal of containment, each filtered detail row
> from A still finds a matching master row in (the filtered subset of) A.
> Note that, in practice, each side will have additional non-correlated
> filters, so the real-world case is not as simple as the above case.
> h4. Proposed Solution
> The solution is straightforward: when the same filter is applied on both
> sides of a join, remove the filter from the final join number. To do this, we
> must track which predicates are common and which occur only one one side or
> the other. We can see this by extending the above overly simple query with an
> additional predicate on the B table:
> {noformat}
> WHERE ...
> B.state = 'CA'
> {noformat}
> Now, the A and B tables are filtered by part_id as before, but B is
> additional filtered by state. We have to account for this.
> In the join, consider all filters applied on the inputs:
> * {{part_id = 123} - on both A' and B'
> * {{state = 'CA'}} - on only B'.
> Back out from the RHS cardinality the selectivity of any duplicated predicate:
> {noformat}
> rhs_adj = ∏ sel(shared predicate i)
> |RHS''| = |RHS'| / rhs_adj
> |LHS' >< RHS'| = |LHS'| * |RHS''| / |RHS|
> = |LHS'| * |RHS'| / |RHS| / rhs_adj
> {noformat}
> Fortunately, most of the pieces needed exist in the code; we just need to
> wire them up to allow the above computation:
> * Create set PR as the set of all predicates that were previously "assigned"
> to the RHS.
> * Similarly, create set PL for the LHS.
> * The LHS side can be a join, so for each join, track all predicates assigned
> to either side of the join, or to the join itself.
> * Compute the set CP, the set of common predicates, as {{CP = PR ⋂ PL}}: the
> intersection of the two sets.
> With that, we can compute {{rhs_adj}} from above.
> h4. Existing Code
> As shown in IMPALA-8014, the code appears to attempt to do the above
> adjustment. But, the code performs the adjustment in all cases, not just in
> the case of redundant predicates. The result turns out to not work in either
> case, unfortunately.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]