[jira] [Updated] (STORM-1014) Use Hive Streaming API bucket info to bucket correctly

Rick Kellogg (JIRA) Mon, 28 Sep 2015 19:39:04 -0700

     [ 
https://issues.apache.org/jira/browse/STORM-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rick Kellogg updated STORM-1014:
--------------------------------
    Component/s: storm-hive

> Use Hive Streaming API bucket info to bucket correctly
> ------------------------------------------------------
>
>                 Key: STORM-1014
>                 URL: https://issues.apache.org/jira/browse/STORM-1014
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-hive
>            Reporter: Raj Bains
>            Assignee: Sriharsha Chintalapani
>            Priority: Critical
>
> The Storm bolt gets a random bucket and writes data to it. Hive expects
> rows (tuples, in Storm terms) to be distributed across buckets according to
> Hive's hash distribution. Because Storm writes to a random bucket, Hive
> optimizations that rely on bucketing return incorrect results.
> The solution is for the Storm Hive bolt to use Hive's bucket distribution
> information and put the rows/tuples in the correct buckets. This relies on
> HIVE-11672.
> This might require a shuffle within Storm.
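
To illustrate the difference between random and hash-based bucket assignment, here is a minimal Python sketch. It is not the storm-hive implementation and does not call the Hive Streaming API. The names stand_in_hash, bucket_for and group_by_bucket are hypothetical, and the hash is a simple 31-based string hash used as a stand-in, not Hive's actual hash function. The only point it makes is that the bucket is derived deterministically from the bucketing column, so equal keys always land in the same bucket.

```python
from collections import defaultdict


def stand_in_hash(key: str) -> int:
    """Illustrative 32-bit, 31-based string hash (an assumption, NOT Hive's hash)."""
    h = 0
    for ch in key:
        # Keep the running value within 32 bits, as a fixed-width integer would.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_for(key: str, num_buckets: int) -> int:
    """Map a bucketing-column value to a bucket id in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    # Clear the sign bit before the modulo so the bucket id is never negative.
    return (stand_in_hash(key) & 0x7FFFFFFF) % num_buckets


def group_by_bucket(rows, key_field: str, num_buckets: int) -> dict:
    """Group rows (dicts) by the bucket their key hashes to.

    This stands in for the shuffle step: rows with the same bucket id end up
    together, so a single writer can own each bucket.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[key_field], num_buckets)].append(row)
    return dict(buckets)
```

With random assignment, two rows sharing the same key can end up in different buckets, which is what breaks readers that assume hash distribution. In the actual fix, the bucket id would presumably come from the bucket information exposed by Hive (see HIVE-11672) rather than from a hash reimplemented on the Storm side.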



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
