[ https://issues.apache.org/jira/browse/HIVE-16125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357920#comment-16357920 ]

slim bouguerra edited comment on HIVE-16125 at 2/14/18 2:28 AM:
----------------------------------------------------------------

To fix this, I added a new table property that the user can set as an extra 
hashing salt to split the reduce sink further.

For instance, in the create statement the user can add the property
{code:java}
"druid.segment.targetShardsPerGranularity"="6"{code}
to add a random key between 0 and 5, so each segment granularity bucket will 
have up to 6 reducers.
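
For illustration, here is roughly what that might look like in a CTAS statement (the table, source, and column names are hypothetical; the storage handler and granularity properties follow the usual Hive/Druid integration pattern):
{code:sql}
CREATE TABLE druid_events
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "DAY",
  "druid.query.granularity" = "HOUR",
  "druid.segment.targetShardsPerGranularity" = "6"
)
AS
SELECT
  CAST(ts AS TIMESTAMP) AS `__time`,  -- Druid requires the time column to be named __time
  page,
  user_id,
  clicks
FROM source_events;
{code}
With DAY segment granularity and a target of 6, each day of data can then be written by up to 6 reducers instead of 1.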

FYI, I am still not sure whether insert statements will see the same benefit. 
When using this feature, the user has to choose the target number of shards 
per segment granularity wisely: if the number is too high the segments will be 
too small, and if it is too low the segments will be huge. A further 
improvement could be to use statistics, or to add an extra shuffle/reduce 
stage that counts the rows and partitions them according to some target 
partition size; see the sizing sketch below.
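
Until that exists, a rough way to pick the target by hand is to look at the row count per granularity bucket. A minimal sketch, assuming the hypothetical source_events table above, DAY granularity, and the usual Druid rule of thumb of roughly 5 million rows per segment (that row target is an assumption, not something this patch enforces):
{code:sql}
-- One row per DAY bucket; suggested_shards approximates how many
-- segments of ~5M rows each bucket's data would need.
SELECT
  to_date(CAST(ts AS TIMESTAMP)) AS dt,
  CEIL(COUNT(*) / 5000000.0)     AS suggested_shards
FROM source_events
GROUP BY to_date(CAST(ts AS TIMESTAMP));
{code}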


> Split work between reducers.
> ----------------------------
>
>                 Key: HIVE-16125
>                 URL: https://issues.apache.org/jira/browse/HIVE-16125
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>            Reporter: slim bouguerra
>            Assignee: slim bouguerra
>            Priority: Major
>         Attachments: HIVE-16125.4.patch, HIVE-16125.5.patch, 
> HIVE-16125.6.patch, HIVE-16125.patch
>
>
> Split work between reducers.
> Currently we have one reducer per segment granularity, even if the interval 
> will be partitioned over multiple partitions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
