Paul Rogers created IMPALA-8219:
-----------------------------------

             Summary: Pick among join plans based on table size, not just 
cardinality
                 Key: IMPALA-8219
                 URL: https://issues.apache.org/jira/browse/IMPALA-8219
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers


The code in {{SingleNodePlanner}} currently picks among join candidates by 
considering only cardinality of the joins:

{code:java}
           if (newRoot == null
              || (candidate.getClass().equals(newRoot.getClass())
                  && candidate.getCardinality() < newRoot.getCardinality())
              || ...) {
            newRoot = candidate;
            minEntry = entry;
          }
{code}

If we have a start-schema, then a single fact table will join to potentially 
many fact tables. By definition, there is a M:1 relationship between the fact 
and dimension tables, so all joins will produce the same join cardinality. In 
this case, we should consider hash table size, not just join cardinality. That 
way, the smallest rows will be joined first, with the largest rows latest. This 
avoids sending the larger rows over the network and prefers to send the 
smallest across the most hops.

Somewhere in the code is a comment that claims we do try to minimize hash table 
size, but it perhaps refers to entries, not entry size.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to