Raj Bains created STORM-1014:
--------------------------------

             Summary: Use Hive Streaming API bucket info to bucket correctly
                 Key: STORM-1014
                 URL: https://issues.apache.org/jira/browse/STORM-1014
             Project: Apache Storm
          Issue Type: Improvement
            Reporter: Raj Bains
            Assignee: Sriharsha Chintalapani
            Priority: Critical


The Storm Hive bolt picks a random bucket and writes data to it. Hive expects 
that rows (tuples, in Storm terms) are distributed across buckets using Hive's 
hash distribution. Because Storm writes to a random bucket, Hive optimizations 
that rely on bucketing return incorrect results.

The solution is for the Storm Hive bolt to use Hive's bucket distribution 
information and put the rows/tuples in the correct buckets. This relies on 
HIVE-11672. 

This might require a shuffle within Storm. 
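
To illustrate the idea of hash-based bucket assignment, here is a minimal sketch in plain Java. It is not the Hive Streaming API and not the bolt's code: the class name BucketSketch, the helper names, and the use of the integer key itself as the hash are all assumptions made for illustration. The real hash function and bucket count would have to come from Hive.

```java
import java.util.ArrayList;
import java.util.List;

public class BucketSketch {
    // Map a hash code to a bucket in [0, numBuckets).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    static int bucketFor(int hashCode, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    // Group keys by bucket. Assumption: the key's own int value stands in for
    // whatever hash Hive actually computes over the bucketing columns.
    static List<List<Integer>> distribute(int[] keys, int numBuckets) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int key : keys) {
            buckets.get(bucketFor(Integer.hashCode(key), numBuckets)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] keys = {7, 8, -1, 12};
        List<List<Integer>> buckets = distribute(keys, 4);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("bucket " + i + ": " + buckets.get(i));
        }
    }
}
```

The point of the sketch is that the bucket is a deterministic function of the row's key, so the same key always lands in the same bucket. A random choice breaks that property, which is what the bucketing-based optimizations depend on.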



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
