Aman Sinha created DRILL-2402:
---------------------------------
Summary: Current method of combining hash values can produce skew
Key: DRILL-2402
URL: https://issues.apache.org/jira/browse/DRILL-2402
Project: Apache Drill
Issue Type: Improvement
Components: Functions - Drill
Affects Versions: 0.8.0
Reporter: Aman Sinha
Assignee: Jacques Nadeau
The current method of combining hash values of multiple columns can produce
skew in some cases even though each individual hash function does not produce
skew. The combining function is XOR:
{code}
hash(a, b) = XOR (hash(a), hash(b))
{code}
The above result will be 0 for all rows where a = b, because then hash(a) = hash(b).
All such rows land in the same hash bucket, which creates severe skew and hurts
the performance of queries that do a HashAggregate-based group-by on {a, b} or
a HashJoin on both columns.
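
A minimal sketch of the problem, for illustration only. The class HashCombineDemo, the stand-in per-column hash (MurmurHash3's 32-bit finalizer) and the seeded combiner are all assumptions made for this example. They are not Drill's actual hash functions and not necessarily the fix that should be adopted. The sketch counts how many distinct combined values each combiner yields over rows where a = b.

```java
import java.util.HashSet;
import java.util.Set;

final class HashCombineDemo {
    // Stand-in per-column hash: MurmurHash3's 32-bit finalizer applied to
    // (value XOR seed). Hypothetical, not Drill's hash function.
    static int hash(int value, int seed) {
        int h = value ^ seed;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    static int hash(int value) {
        return hash(value, 0);
    }

    // The combiner described in this issue: XOR of the per-column hashes.
    static int combineXor(int a, int b) {
        return hash(a) ^ hash(b);
    }

    // One possible alternative (an assumption): feed hash(a) in as the seed
    // when hashing b, so the result depends on column order.
    static int combineSeeded(int a, int b) {
        return hash(b, hash(a));
    }

    // Number of distinct XOR-combined hashes over rows (i, i), 0 <= i < n.
    static int distinctXor(int n) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(combineXor(i, i));
        }
        return seen.size();
    }

    // Number of distinct seed-chained hashes over the same rows.
    static int distinctSeeded(int n) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(combineSeeded(i, i));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 1000;
        // XOR collapses every a = b row to the single value 0.
        System.out.println("XOR distinct:    " + distinctXor(n));
        System.out.println("seeded distinct: " + distinctSeeded(n));
    }
}
```

With the XOR combiner, every row with a = b hashes to 0 no matter which per-column hash is used, so the count is always 1. An order-dependent combiner such as the seeded chaining above avoids that particular collapse, although its quality still depends on the underlying hash.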