[ 
https://issues.apache.org/jira/browse/DRILL-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti resolved DRILL-2402.
------------------------------------
          Resolution: Fixed
       Fix Version/s:     (was: 0.9.0)
                      0.8.0
    Target Version/s: 0.8.0

Fixed in 
[bb1d761|https://github.com/apache/drill/commit/bb1d7615e7eb6c0c17c0c8a1cde0ca070393e257].

> Current method of combining hash values can produce skew
> --------------------------------------------------------
>
>                 Key: DRILL-2402
>                 URL: https://issues.apache.org/jira/browse/DRILL-2402
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Functions - Drill
>    Affects Versions: 0.8.0
>            Reporter: Aman Sinha
>            Assignee: Jacques Nadeau
>             Fix For: 0.8.0
>
>         Attachments: DRILL-2402-1.patch
>
>
> The current method of combining hash values of multiple columns can produce 
> skew in some cases even though each individual hash function does not produce 
> skew.  The combining function is XOR: 
> {code}
>    hash(a, b) = XOR (hash(a), hash(b))
> {code}
> The above result will be 0 for all rows where a = b, since then hash(a) = hash(b).
> This will clearly create severe skew and affect the performance of queries
> that do a HashAggregate-based group-by on {a, b} or a HashJoin on both columns.
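
The skew described above can be reproduced with a small sketch. Everything in
it is an assumption made for illustration: hash(value, seed) is a hypothetical
stand-in mixer, not Drill's hash function, and chainedCombine (feeding the
first column's hash in as the seed for the second) is only one common
order-sensitive alternative to XOR. It is not claimed to be what commit
bb1d761 implements.

```java
import java.util.HashSet;
import java.util.Set;

public class XorSkewDemo {
    // Hypothetical seeded per-column hash (a simple multiply-and-fold mixer).
    static int hash(long value, int seed) {
        long h = (value ^ seed) * 0x9E3779B97F4A7C15L;
        h ^= h >>> 32;
        return (int) h;
    }

    // Combiner from the issue description: XOR of the per-column hashes.
    static int xorCombine(long a, long b) {
        return hash(a, 0) ^ hash(b, 0);
    }

    // Illustrative alternative: hash(a) becomes the seed for hashing b.
    static int chainedCombine(long a, long b) {
        return hash(b, hash(a, 0));
    }

    // Number of distinct buckets hit by `rows` rows in which a == b.
    static int distinctBuckets(boolean useXor, int rows, int numBuckets) {
        Set<Integer> buckets = new HashSet<>();
        for (long v = 0; v < rows; v++) {
            int h = useXor ? xorCombine(v, v) : chainedCombine(v, v);
            buckets.add(Math.floorMod(h, numBuckets));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println("XOR buckets used:     " + distinctBuckets(true, 1000, 16));
        System.out.println("Chained buckets used: " + distinctBuckets(false, 1000, 16));
    }
}
```

With XOR, every row where a = b hashes to 0 and therefore lands in a single
bucket, regardless of how well the per-column hash distributes values. An
order-sensitive combiner avoids this particular collapse because the two
columns no longer cancel each other out.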



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
