Raj Bains created STORM-1014:
--------------------------------
Summary: Use Hive Streaming API bucket info to bucket correctly
Key: STORM-1014
URL: https://issues.apache.org/jira/browse/STORM-1014
Project: Apache Storm
Issue Type: Improvement
Reporter: Raj Bains
Assignee: Sriharsha Chintalapani
Priority: Critical
The Storm bolt gets a random bucket and writes data to it. Hive expects that
rows (tuples in Storm) are distributed across buckets using Hive's hash
distribution. Because Storm writes to a random bucket, Hive optimizations
that rely on bucketing return incorrect results.
The solution is for the Storm Hive bolt to use Hive's bucket distribution
information to put the rows/tuples in the correct buckets. This relies on
HIVE-11672.
This might require a shuffle within Storm.
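
For illustration only, here is a minimal sketch of the difference between
picking a random bucket and deriving the bucket from the tuple's bucketing
columns. The class BucketSketch and the method bucketFor are hypothetical
names, and the hash is a stand-in (java.util.List.hashCode), not Hive's hash.
A real fix would take the bucket assignment from the Hive Streaming API once
HIVE-11672 is available.

```java
import java.util.Arrays;
import java.util.List;

public class BucketSketch {

    /**
     * Hypothetical helper: maps the values of a tuple's bucketing columns to
     * a bucket id in [0, numBuckets).
     *
     * ASSUMPTION: List.hashCode() is only a stand-in. Hive computes its own
     * hash over the bucketing columns, so these ids will not match Hive's.
     */
    static int bucketFor(List<?> bucketColumnValues, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        int hash = bucketColumnValues.hashCode();
        // Clear the sign bit so the modulo result is never negative.
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        // The same key always maps to the same bucket, unlike a random pick.
        int first = bucketFor(Arrays.asList("user-42"), numBuckets);
        int second = bucketFor(Arrays.asList("user-42"), numBuckets);
        System.out.println("same bucket for same key: " + (first == second));
    }
}
```

If each bucket ends up with a dedicated writer task, the shuffle mentioned
above could plausibly be done by grouping the stream on the bucketing columns
so that tuples with the same bucket key reach the same task. That is an
assumption about the design, not something this issue specifies.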