It's as simple as taking the hash code of the key and taking it mod the number of reducers. To get started, try any of the .q files in the clientpositive directory.

On the code side, HiveKey.java has the implementation.
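
To make the "hash code mod number of reducers" rule concrete, here is a minimal Java sketch. It is not copied from HiveKey.java: the class and method names (BucketSketch, bucketFor) are made up for illustration, masking the sign bit is my assumption about how a negative hash code would be handled, and Java's own Integer.hashCode stands in for whatever hash Hive computes for a given column type.

```java
final class BucketSketch {
    // Hypothetical helper, not part of Hive's API: map a key's hash code to a bucket index.
    static int bucketFor(int hashCode, int numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        // Assumption: clear the sign bit so a negative hash code still yields
        // an index in the range [0, numBuckets).
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 32; // matches the 32-way partitioning mentioned in the question
        for (int key : new int[] {0, 31, 32, 100, -1}) {
            // Integer.hashCode is a stand-in for the hash of the clustering key.
            System.out.println(key + " -> bucket " + bucketFor(Integer.hashCode(key), numBuckets));
        }
    }
}
```

If your existing 32-way partitioning uses a different hash function, the bucket assignments will not line up with this one, so it is worth comparing a few keys against the real output before relying on it.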




On Apr 11, 2010, at 2:48 PM, Aaron McCurry <amccu...@gmail.com> wrote:

I have a search solution that is downstream of some Netezza data marts that I'm replacing with a Hive solution. We already partition the data for the search solution 32 ways, and I would like to take advantage of the data clustering in Hive (buckets) so that I don't have to do any post-processing. Is there documentation that describes how the data is hashed or how it's organized across the buckets? Or could someone point me to a class that implements it? Thanks!

Aaron
