[jira] [Commented] (DRILL-4119) Skew in hash distribution for varchar (and possibly other) types of data

Aman Sinha (JIRA) Sat, 21 Nov 2015 12:16:03 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-4119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020665#comment-15020665
 ]


Aman Sinha commented on DRILL-4119:
-----------------------------------

Yes, it would be useful to have a test suite for the hash functions.  The 
number of combinations is large:  num_data_types x nullability x 
num_hash_function_types (32bit, 64bit, AsDouble variations).  On top of that, 
the quality of the distribution depends on the data itself, so we need 
real-world data to test it.  Let me see if I can at least put together a 
minimal test suite covering a sample of the above combinations.  I may end up 
creating a separate JIRA for it.
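
A minimal sketch of what such a suite might look like, written in Python for 
brevity.  Everything here is an assumption for illustration: hash32 and hash64 
are stand-ins built on hashlib, not Drill's hash functions, the type list is 
cut down to two types, and the data is synthetic rather than real-world.  The 
sketch walks the data_type x nullability x hash_function combinations, hashes 
distinct values into a fixed number of buckets, and reports the ratio of the 
fullest bucket to the mean bucket size (1.0 would be perfectly even).

```python
import hashlib
import itertools
import random


# Hypothetical stand-ins for illustration only; NOT Drill's hash functions.
def hash32(data: bytes) -> int:
    return int.from_bytes(hashlib.blake2b(data, digest_size=4).digest(), "big")


def hash64(data: bytes) -> int:
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


HASH_FUNCTIONS = {"hash32": hash32, "hash64": hash64}
DATA_TYPES = ("varchar", "int")   # reduced list, assumed for the sketch
NULLABILITY = (False, True)


def sample_values(data_type, nullable, count, rng):
    """Return distinct synthetic values, plus one NULL for nullable columns."""
    if data_type == "varchar":
        # 32-character hex strings, the same shape as the ids in this issue.
        raw = (format(rng.getrandbits(128), "032x") for _ in range(count))
    else:
        raw = (rng.getrandbits(31) for _ in range(count))
    values = list(dict.fromkeys(raw))  # drop duplicates, keep order
    if nullable:
        values.append(None)
    return values


def encode(value) -> bytes:
    """Turn a value into the bytes fed to the hash function."""
    if value is None:
        return b""
    if isinstance(value, str):
        return value.encode("utf-8")
    return value.to_bytes(8, "little", signed=True)


def skew_ratio(values, hash_fn, num_buckets):
    """Size of the fullest bucket divided by the mean bucket size."""
    counts = [0] * num_buckets
    for value in values:
        counts[hash_fn(encode(value)) % num_buckets] += 1
    return max(counts) / (len(values) / num_buckets)


def run_suite(count=20000, num_buckets=16, seed=42):
    """Measure skew for every data_type x nullability x hash_function combo."""
    rng = random.Random(seed)
    results = {}
    for data_type, nullable, (name, fn) in itertools.product(
        DATA_TYPES, NULLABILITY, HASH_FUNCTIONS.items()
    ):
        values = sample_values(data_type, nullable, count, rng)
        results[(data_type, nullable, name)] = skew_ratio(values, fn, num_buckets)
    return results


if __name__ == "__main__":
    for combo, ratio in run_suite().items():
        print(combo, round(ratio, 3))
```

A real suite would plug in the actual hash functions and real-world columns, 
and would flag any combination whose ratio exceeds a chosen threshold.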

> Skew in hash distribution for varchar (and possibly other) types of data
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4119
>                 URL: https://issues.apache.org/jira/browse/DRILL-4119
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.3.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> We are seeing substantial skew for an Id column that contains varchar data of 
> length 32.   It is easily reproducible by a group-by query: 
> {noformat}
> Explain plan for SELECT SomeId From table GROUP BY SomeId;
> ...
> 01-02          HashAgg(group=[{0}])
> 01-03            Project(SomeId=[$0])
> 01-04              HashToRandomExchange(dist0=[[$0]])
> 02-01                UnorderedMuxExchange
> 03-01                  Project(SomeId=[$0], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 03-02                    HashAgg(group=[{0}])
> 03-03                      Project(SomeId=[$0])
> {noformat}
> The string ids look like the following: 
> {noformat}
> e4b4388e8865819126cb0e4dcaa7261d
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
