[ 
https://issues.apache.org/jira/browse/IMPALA-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned IMPALA-8213:
-----------------------------------

    Assignee:     (was: Paul Rogers)

> Planner does not adjust join cardinality for correlated filters
> ---------------------------------------------------------------
>
>                 Key: IMPALA-8213
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8213
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> The planner does not consider column join-equivalence when working out join 
> cardinality on scans with filtering, leading to unrealistically low estimates.
> Here is a snippet from a real query:
> {noformat}
> WHERE ...
>    table_a.part_id = 123 
> OR table_a.part_id = 234
> {noformat}
> The query has four other tables in addition to table_a. The WHERE clause 
> equates the {{part_id}} column in all of them (in addition to other FK/PK 
> pairs.)
> The planner correct works out join equivalence: it notices, via the join 
> expressions, that the {{part_id}} all take the same values. The planner then 
> pushes the above predicate into all the scans:
> {noformat}
> |  |  04:SCAN HDFS [table_i, RANDOM]                                     
> |  |     partition predicates: table_i.part_id = CAST(123 AS BIGINT) OR 
> table_i.part_id = CAST(234 AS BIGINT)
> 06:SCAN HDFS [table_u, RANDOM]                                                
>   
>    partition predicates: table_u.part_id = CAST(123 AS BIGINT) OR 
> table_u.part_id = CAST(234 AS BIGINT)
> {noformat}
> So far, so good. The problem is that the join cardinality calculations does 
> not realize that all tables are filtered (in part) on the same columns. That 
> is, the join calculations assume that each table is filtered independently, 
> when in fact the filtering is strongly correlated.
> h4. Problem Detail
> To see this, imagine three tables, A, B, and C. They are all have a part_id 
> column. A has 10K rows, B and C have 1K rows. Let A be a fact table, and B 
> and C be dimension tables. We create a simplified form of the DB query:
> {code:sql}
> SELECT count(*) FROM A, B, C
> WHERE A.part_id = 123
>   AND A.part_id = B.part_id
>   AND A.b_id = B.id
>   AND A.part_id = C.part_id
>   AND A.c_id = C.id
> {code}
> Assume that B(part_id, id) and C(part_id, id) are primary keys. Assume 
> {{|B.part_id| = 10}} and {{|C.part_id| = 10}}. (That is, there are 10 
> partitions, each with 100 unique {{id}} values, so that {{|part_id| * |id| = 
> |B|}}, etc.
> If we use the (corrected) FK/PK join calculations we get:
> {noformat}
> |A’| = |A| * sel(part_id = 123) = |A| * (1 / NDV(part_id)) = |A| / 10 = 10000 
> / 10 = 1000
> |B’| = as above, but for B
> |C’| = as above, but for C
> |A’ >< B’| = |A’| * |B’| / |B| = 1000 * 100 / 1000 = 100
> |(A’ >< B’) >< C’| = |A’ >< B’| * |C’| / |C| = 100 * 1000 / 100 = 10
> {noformat}
> Is this the correct answer? In general, if the filtering of tables A an B 
> were independent, then it would be correct. But, here, the filtering is 
> correlated. We want to account for that filtering once, not twice. We then 
> compound the error on the next join by again assuming filtering is 
> independent. By the second join, we are off by a factor of 100.
> This issue is discussed in Swami & Schiefer, [On the Estimation of Join 
> Result 
> Sizes|https://pdfs.semanticscholar.org/2735/672262940a23cfce8d4cc1e25c0191de7da7.pdf]
>  (S&S), which makes this useful observation to simplify our analysis: “For 
> the estimation of the result size, it does not matter in which order the 
> predicates are applied.” (That is, we get the same answer if we filter then 
> join, or join then filter — this is why query optimizers are complex: they 
> must consider all such options.)
> Using this idea here, let’s join first, then filter second:
> {noformat}
> |A >< B| = |A| * |B| / |B.pk| = 10,000 * 1000 / 1000 = 10,000
> |(A >< B) >< C| = |A >< B| * |C| / |C.pk| = 10,000 * 1000 / 1000 = 10,000
> sel(part_id = 123) = 0.1
> |((A >< B) >< C)’| = |(A >< B) >< C| * sel(part_id = 123) = 10,000 * 0.1 = 
> 1000
> {noformat}
> Here {{B.pk}} is shorthand for the compound primary key {{(part_id, id}}. 
> Being a primary key, we know {{|B.pk| == |B|}}.
> We join the fact table A with two dimension table B and C, with no filtering. 
> Intuitively, we end up with all the fact rows “enriched” with columns from B 
> and C. Then we apply filtering on the common {{part_id}} column to get 1/10 
> of the rows or 1000.
> h4. Backing Out Duplicated Filters
> The calculations above use only multiplication and division and thus are 
> commutative. (The fancy way of repeating the observation that order does not 
> matter.) So, we can use a hybrid solution: apply filtering at each scan, then 
> back out redundant filtering at the join level. Suppose we have a predicate 
> {{f}} (for filter) applied to both tables. The selectivity of predicate {{f}} 
> is {{sel(f)}}. Then:
> {noformat}
> |A'| = |A| * sel(f)
> |B'| = |B| * sel(f)
> {noformat}
> Now we can see the problem with the incorrect rule that Impala uses today:
> {noformat}
> |A' >< B'| = |A'| * |B'| / |B|
>            = |A| * sel(f) * |B| * sel(f) / |B|
>            = |A| * sel(f) * sel(f)
> {noformat}
> Notice that we've applied the filter twice rather than once as per our 
> earlier example. To correct this, we have to remove one of the applications 
> of the filter. That is, we need a correction, dividing out one of the 
> {{sel(f)}} terms:
> {noformat}
> |A' >< B'| = |A'| * |B'| / |B| / sel(f)
>            = |A| * sel(f) * |B| * sel(f) / |B| / sel(f)
>            = ( |A| |B| / |B| )  * sel(f)
>            = |A| * sel(f)
> {noformat}
> Intuitively, both the master and detail tables are filtered by {{f}}. The 
> number of detail rows is {{|A| * sel(f)}}. Since the filter is applied to 
> both tables, using the principal of containment, each filtered detail row 
> from A still finds a matching master row in (the filtered subset of) A.
> Note that, in practice, each side will have additional non-correlated 
> filters, so the real-world case is not as simple as the above case.
> h4. Proposed Solution
> The solution is straightforward: when the same filter is applied on both 
> sides of a join, remove the filter from the final join number. To do this, we 
> must track which predicates are common and which occur only one one side or 
> the other. We can see this by extending the above overly simple query with an 
> additional predicate on the B table:
> {noformat}
> WHERE ...
>   B.state = 'CA'
> {noformat}
> Now, the A and B tables are filtered by part_id as before, but B is 
> additional filtered by state. We have to account for this.
> In the join, consider all filters applied on the inputs:
> * {{part_id = 123} - on both A' and B'
> * {{state = 'CA'}} - on only B'.
> Back out from the RHS cardinality the selectivity of any duplicated predicate:
> {noformat}
> rhs_adj = ∏ sel(shared predicate i)
> |RHS''| = |RHS'| / rhs_adj
> |LHS' >< RHS'| = |LHS'| * |RHS''| / |RHS|
>                = |LHS'| * |RHS'| / |RHS| / rhs_adj
> {noformat}
> Fortunately, most of the pieces needed exist in the code; we just need to 
> wire them up to allow the above computation:
> * Create set PR as the set of all predicates that were previously "assigned" 
> to the RHS.
> * Similarly, create set PL for the LHS.
> * The LHS side can be a join, so for each join, track all predicates assigned 
> to either side of the join, or to the join itself.
> * Compute the set CP, the set of common predicates, as {{CP = PR ⋂ PL}}: the 
> intersection of the two sets.
> With that, we can compute {{rhs_adj}} from above.
> h4. Existing Code
> As shown in IMPALA-8014, the code appears to attempt to do the above 
> adjustment. But, the code performs the adjustment in all cases, not just in 
> the case of redundant predicates. The result turns out to not work in either 
> case, unfortunately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to