[ https://issues.apache.org/jira/browse/DRILL-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Venki Korukanti resolved DRILL-2402.
------------------------------------
Resolution: Fixed
Fix Version/s: 0.8.0 (was: 0.9.0)
Target Version/s: 0.8.0
Fixed in
[bb1d761|https://github.com/apache/drill/commit/bb1d7615e7eb6c0c17c0c8a1cde0ca070393e257].
> Current method of combining hash values can produce skew
> --------------------------------------------------------
>
> Key: DRILL-2402
> URL: https://issues.apache.org/jira/browse/DRILL-2402
> Project: Apache Drill
> Issue Type: Improvement
> Components: Functions - Drill
> Affects Versions: 0.8.0
> Reporter: Aman Sinha
> Assignee: Jacques Nadeau
> Fix For: 0.8.0
>
> Attachments: DRILL-2402-1.patch
>
>
> The current method of combining hash values of multiple columns can produce
> skew in some cases even though each individual hash function does not produce
> skew. The combining function is XOR:
> {code}
> hash(a, b) = XOR (hash(a), hash(b))
> {code}
> Since a = b implies hash(a) = hash(b), the above result will be 0 for all rows
> where a = b. This creates severe skew and degrades the performance of queries
> that do a HashAggregate-based group-by on {a, b} or a HashJoin on both columns.
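
The skew described above can be reproduced with a small sketch. The helper names (h, combine_xor, combine_chained), the SHA-256 based stand-in hash, and the bucket count are all assumptions made for illustration; this is not Drill's hash function, and the seed-chained combiner is only one common alternative to XOR, not necessarily what the attached patch implements.

```python
import hashlib
from collections import Counter


def h(value, seed=0):
    """Hypothetical stand-in 32-bit hash: first 4 bytes of SHA-256(seed || value)."""
    data = seed.to_bytes(4, "big") + str(value).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def combine_xor(a, b):
    # The combiner from the issue description: XOR of the two column hashes.
    return h(a) ^ h(b)


def combine_chained(a, b):
    # Assumed alternative: use hash(a) as the seed when hashing b,
    # so equal column values no longer cancel each other out.
    return h(b, seed=h(a))


NUM_BUCKETS = 16  # illustrative partition count
rows = [(i, i) for i in range(1000)]  # every row has a = b

xor_buckets = Counter(combine_xor(a, b) % NUM_BUCKETS for a, b in rows)
chained_buckets = Counter(combine_chained(a, b) % NUM_BUCKETS for a, b in rows)

# With XOR, every row hashes to 0, so all 1000 rows land in bucket 0.
print("XOR buckets used:    ", len(xor_buckets))
print("Chained buckets used:", len(chained_buckets))
```

With XOR, all rows with a = b collapse into a single bucket regardless of how well the per-column hash distributes values. The chained variant makes the combined hash depend on column order, which is acceptable as long as both sides of a join or exchange hash the columns in the same order.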
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)