Paul Rogers created IMPALA-8034:
-----------------------------------

             Summary: PlannerTest tests are not realistic
                 Key: IMPALA-8034
                 URL: https://issues.apache.org/jira/browse/IMPALA-8034
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers


Impala generally assumes that joins are M:1, joined on the FK/PK. A PK 
uniquely identifies a row, so {{|pk1| = |Table|}}. Join estimation also 
builds in the assumption that key columns are independent, so if we have 
multiple keys, {{|pk1| * |pk2| * … * |pkn| = |Table|}}.
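
As a minimal sketch of the assumption above (this is the textbook estimate, 
not Impala's actual planner code; the function names and all numbers are 
hypothetical), the NDV of a compound key is the product of the per-column 
NDVs, and an M:1 join on FK/PK should preserve the row count of the "many" 
side:

```python
from math import prod


def compound_ndv(column_ndvs, row_count):
    """NDV of a multi-column key, treating the columns as independent.

    The product of per-column NDVs is capped at the table's row count,
    because a table cannot hold more distinct keys than rows.
    """
    return min(prod(column_ndvs), row_count)


def join_cardinality(left_rows, right_rows, left_key_ndv, right_key_ndv):
    """Classic equi-join estimate: |L| * |R| / max(ndv(L.key), ndv(R.key))."""
    return left_rows * right_rows / max(left_key_ndv, right_key_ndv)


# Hypothetical dimension table whose PK is (region, store): 20 * 50 = 1000 rows.
dim_rows = 1_000
pk_ndv = compound_ndv([20, 50], dim_rows)

# Hypothetical fact table carrying the same two columns as an FK.
fact_rows = 100_000
fk_ndv = compound_ndv([20, 50], fact_rows)

# M:1 join on FK/PK: the estimate equals the fact table's row count.
estimate = join_cardinality(fact_rows, dim_rows, fk_ndv, pk_ndv)
print(pk_ndv, estimate)
```

With independent key columns, the compound-key NDV equals the dimension 
table's row count, which is exactly the {{|pk1| * … * |pkn| = |Table|}} 
condition.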

However, PlannerTest frequently uses non-independent, non-unique columns. It 
might, for example, join on both the (unique) {{id}} column and the non-unique 
{{int_col}} column, which throws off the calculations:

{noformat}
select *
from functional.alltypesagg a
full outer join functional.alltypessmall b using (id, int_col)
right join functional.alltypesaggnonulls c
  on (a.id = c.id and b.string_col = c.string_col)
{noformat}

If we then try to get the estimated cardinalities to match the actual 
cardinalities obtained from running the query, we end up fighting our 
assumptions. This shows up in the code: rather than use the classic assumption 
that the key columns are independent, the code uses special adjustments for 
redundant columns, perhaps so that tests such as the above produce good 
estimates.
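
To illustrate why redundant join columns fight the independence assumption, 
here is a rough sketch using made-up statistics (these are not the real stats 
of the functional tables, and the function is a hypothetical stand-in, not 
Impala's estimator). It multiplies per-column NDVs as if the columns were 
independent, with no special adjustment:

```python
from math import prod


def naive_join_estimate(left_rows, right_rows, left_ndvs, right_ndvs):
    """|L| * |R| / max(ndv(L.keys), ndv(R.keys)).

    Per-column NDVs are multiplied as if the join columns were independent,
    with no cap and no adjustment for redundant columns.
    """
    left_key_ndv = prod(left_ndvs)
    right_key_ndv = prod(right_ndvs)
    return left_rows * right_rows / max(left_key_ndv, right_key_ndv)


# Hypothetical stats: id is unique in both tables; int_col adds no information
# beyond id, so it is redundant as a join column.
big_rows, small_rows = 10_000, 100
big_id_ndv, big_int_ndv = 10_000, 1_000
small_id_ndv, small_int_ndv = 100, 100

# Join on the unique key only.
on_id = naive_join_estimate(big_rows, small_rows, [big_id_ndv], [small_id_ndv])

# Join on the unique key plus the redundant column.
on_id_and_int = naive_join_estimate(
    big_rows, small_rows,
    [big_id_ndv, big_int_ndv],
    [small_id_ndv, small_int_ndv],
)

print(on_id)          # 100.0
print(on_id_and_int)  # 0.1
```

If we assume every row of the small table matches exactly one row of the big 
table, the actual result has 100 rows with or without {{int_col}} in the join 
condition. The naive estimate drops by three orders of magnitude once the 
redundant column is added, which is the kind of gap the special adjustments 
try to paper over.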

It would be better to modify (or add) tests so that they follow our 
assumptions, letting us verify that the intended logic works. It is fine to 
then add a few “oddball” queries to see how well the estimates hold up when 
the data (or user) does not follow the independence assumption.

Alternatively, add new tests that use realistic joins, and retain the existing 
tests, adding a note explaining why the resulting cardinality estimates 
appear wrong (because the joins use unrealistic, redundant columns, which 
real users seldom do).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
