[jira] [Resolved] (IMPALA-8034) PlannerTest cardinality tests are not realistic

Paul Rogers (JIRA) Wed, 06 Feb 2019 17:11:25 -0800


     [ https://issues.apache.org/jira/browse/IMPALA-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Paul Rogers resolved IMPALA-8034.
---------------------------------
    Resolution: Fixed

> PlannerTest cardinality tests are not realistic
> -----------------------------------------------
>
>                 Key: IMPALA-8034
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8034
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> Impala generally assumes that joins in queries are M:1, on a foreign
> key/primary key (FK/PK) pair. A PK uniquely identifies a row, so
> {{|pk1| = |Table|}}, where {{|pk1|}} is the number of distinct values (NDV)
> of the key column. Join estimation also assumes that columns are
> independent, so for a compound key, {{|pk1| * |pk2| * … * |pkn| = |Table|}}.
> But PlannerTest frequently joins on columns that are neither independent nor
> unique. For example, a query might join on both the (unique) {{id}} column
> and the non-unique {{int_col}} column, which throws off the calculations:
> {noformat}
> select *
> from functional.alltypesagg a
> full outer join functional.alltypessmall b using (id, int_col)
> right join functional.alltypesaggnonulls c on (a.id = c.id and b.string_col = c.string_col)
> {noformat}
> If we then try to make the estimated cardinalities match the actual
> cardinalities obtained from running the query, we end up fighting our own
> assumptions. This shows up in the code: rather than using the classic
> assumption that key columns are independent, the code applies special
> adjustments for redundant columns, perhaps so that tests such as the one
> above produce good estimates.
> It would be better to modify (or add) tests that follow our assumptions so
> we can verify that the intended logic works. It is fine to then add a few
> “oddball” queries to see how well the estimates hold up when the data (or
> user) does not follow the independence assumption.
> Alternatively, add new tests that use realistic joins, and retain the
> existing tests, adding a note explaining why the resulting cardinality
> estimates appear wrong (because they join on unrealistic, redundant
> columns, which real users seldom do).
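The two assumptions in the description (M:1 FK/PK joins, and independent key columns whose NDVs multiply) can be illustrated with the textbook equi-join estimate |L| * |R| / max(NDV(L.key), NDV(R.key)). The following is a minimal Python sketch under that assumption, not Impala's planner code (which is Java and, per the description, adds special adjustments for redundant columns). The function names `key_ndv_independent` and `join_cardinality`, and all row counts and NDVs, are hypothetical and chosen only to make the arithmetic easy to follow; they are not statistics from the `functional` test tables.

```python
from math import prod


def key_ndv_independent(column_ndvs):
    """Hypothetical helper: NDV of a multi-column join key under the
    independence assumption, i.e. the product of the per-column NDVs."""
    return prod(column_ndvs)


def join_cardinality(left_rows, right_rows, left_key_ndv, right_key_ndv):
    """Hypothetical helper: textbook equi-join estimate
    |L| * |R| / max(NDV(L.key), NDV(R.key)), rounded down to whole rows."""
    return left_rows * right_rows // max(left_key_ndv, right_key_ndv)


# Realistic M:1 FK/PK join on a compound key (k1, k2) with NDVs 10 and 10.
# The dimension table has 100 rows, so |k1| * |k2| = |Table| holds.
dim_key_ndv = key_ndv_independent([10, 10])    # 100
fact_key_ndv = key_ndv_independent([10, 10])   # 100
fk_pk_estimate = join_cardinality(10_000, 100, fact_key_ndv, dim_key_ndv)
print(fk_pk_estimate)  # 10000: one match per fact row, as expected for M:1

# Redundant join on a unique id (NDV 1,000) plus int_col (NDV 10), where
# int_col is fully determined by id. Multiplying the NDVs makes the key look
# 10x more selective than it really is: 10,000 "distinct" keys in 1,000 rows.
redundant_ndv = key_ndv_independent([1_000, 10])  # 10000
redundant_estimate = join_cardinality(1_000, 1_000, redundant_ndv, redundant_ndv)
print(redundant_estimate)  # 100, although a 1:1 match on id yields 1,000 rows
```

In this sketch, capping the combined NDV at the table's row count (1,000) would restore the 1,000-row estimate for the redundant join; it does not attempt to reproduce the specific adjustments the description says Impala's code makes.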



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
