Paul Rogers created IMPALA-8219:
-----------------------------------
Summary: Pick among join plans based on table size, not just
cardinality
Key: IMPALA-8219
URL: https://issues.apache.org/jira/browse/IMPALA-8219
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 3.1.0
Reporter: Paul Rogers
Assignee: Paul Rogers
The code in {{SingleNodePlanner}} currently picks among join candidates by
considering only cardinality of the joins:
{code:java}
if (newRoot == null
|| (candidate.getClass().equals(newRoot.getClass())
&& candidate.getCardinality() < newRoot.getCardinality())
|| ...) {
newRoot = candidate;
minEntry = entry;
}
{code}
If we have a start-schema, then a single fact table will join to potentially
many fact tables. By definition, there is a M:1 relationship between the fact
and dimension tables, so all joins will produce the same join cardinality. In
this case, we should consider hash table size, not just join cardinality. That
way, the smallest rows will be joined first, with the largest rows latest. This
avoids sending the larger rows over the network and prefers to send the
smallest across the most hops.
Somewhere in the code is a comment that claims we do try to minimize hash table
size, but it perhaps refers to entries, not entry size.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)