[jira] [Commented] (DRILL-4188) Change the default value of planner.enable_hash_single_key to false

Aman Sinha (JIRA) Fri, 11 Dec 2015 15:11:40 -0800

    [ https://issues.apache.org/jira/browse/DRILL-4188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053758#comment-15053758 ]


Aman Sinha commented on DRILL-4188:
-----------------------------------

Only if we can get the NDV (number of distinct values) stats for the columns, 
right? That would take some time, whereas this issue (severe skew) seems to be 
occurring in some user deployments now.

> Change the default value of planner.enable_hash_single_key to false
> -------------------------------------------------------------------
>
>                 Key: DRILL-4188
>                 URL: https://issues.apache.org/jira/browse/DRILL-4188
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> The planner.enable_hash_single_key flag affects how the HashJoin and MergeJoin 
> plans hash-distribute both sides of the join when it is a multi-column join 
> (e.g. T1.a1 = T2.a2 AND T1.b1 = T2.b2). The default value of this parameter 
> is true, which means that Drill will generate multiple plans, each 
> hash-distributing on only one of the join columns. The final plan is chosen 
> based on costing.
> However, due to the lack of column statistics, this approach is problematic: 
> if all plans cost the same, we could end up picking the first column for hash 
> distribution, and if that column has a low number of distinct values, there 
> could be substantial skew in the distribution.
> Doing the hash distribution on all columns should be the default, so I 
> propose to change planner.enable_hash_single_key to false. The scenario 
> where we might still want single-column hash distribution is when the join is 
> done after some other operation (e.g. a window function or grouped 
> aggregation) whose child already hash-distributes on one column that is part 
> of the join. For those cases, we may want to selectively enable this flag.
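A minimal Python sketch of the skew problem described in the issue. It does not use Drill's code: the fragment count, the blake2b-based bucket function, and the synthetic rows are all hypothetical stand-ins. It hashes the same rows once on a single low-NDV column (as a single-key plan might if it picks that column) and once on both join columns, then compares the per-fragment row counts.

```python
import hashlib
import random
from collections import Counter

NUM_FRAGMENTS = 8  # hypothetical number of parallel receiving fragments


def bucket(key: tuple) -> int:
    """Map a join key (tuple of column values) to a fragment.

    blake2b is a stand-in for whatever hash function the engine really uses.
    """
    digest = hashlib.blake2b(repr(key).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % NUM_FRAGMENTS


def distribute(rows, key_columns):
    """Count how many rows each fragment receives when hashing on key_columns."""
    counts = Counter(bucket(tuple(row[c] for c in key_columns)) for row in rows)
    return [counts.get(i, 0) for i in range(NUM_FRAGMENTS)]


def skew(counts):
    """Busiest fragment's load divided by the average load (1.0 = perfectly even)."""
    return max(counts) / (sum(counts) / len(counts))


rng = random.Random(42)
# Hypothetical join input: column "a" has only 3 distinct values (low NDV),
# column "b" has up to 10,000 distinct values.
rows = [{"a": rng.randrange(3), "b": rng.randrange(10_000)} for _ in range(100_000)]

single = distribute(rows, ["a"])      # single-key plan that happened to pick "a"
multi = distribute(rows, ["a", "b"])  # hash distribution on all join columns

print("single-key loads:", single, "skew = %.2f" % skew(single))
print("all-keys loads:  ", multi, "skew = %.2f" % skew(multi))
```

Because column `a` has only three distinct values, hashing on it alone can feed at most three of the eight fragments, so the busiest fragment receives at least 8/3 times the average load regardless of the hash function; hashing on both columns spreads the rows close to evenly.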



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
