It's as simple as taking the hash code of the key and taking it modulo the number of reducers. To get started, try any of the .q files in the clientpositive directory.
On the code side, HiveKey.java has the implementation.
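The hash-and-mod scheme described above can be sketched roughly as follows. This is a hedged illustration, not Hive's actual code path: the class name `BucketSketch` and method `bucketFor` are made up for this example, and the `& Integer.MAX_VALUE` step (to keep the result non-negative for keys with negative hash codes) mirrors the common Hadoop partitioner idiom; check HiveKey.java for the real implementation.

```java
// Hypothetical sketch of bucket assignment: hash the key, clear the
// sign bit, then take the remainder modulo the number of buckets.
public class BucketSketch {

    // Maps a key to a bucket index in [0, numBuckets).
    static int bucketFor(Object key, int numBuckets) {
        // & Integer.MAX_VALUE forces a non-negative value so the
        // modulo result is always a valid bucket index.
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // With 32 buckets (as in the setup described below), the same
        // key always lands in the same bucket.
        System.out.println(bucketFor("some-key", 32));
    }
}
```

Because the mapping depends only on the key's hash code, any downstream consumer that uses the same hash and the same bucket count can locate a key's bucket without post-processing.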
Sent from my iPhone
On Apr 11, 2010, at 2:48 PM, Aaron McCurry <amccu...@gmail.com> wrote:
I have a search solution that is downstream of some Netezza data
marts that I'm replacing with a Hive solution. We already partition
the data for the search solution 32 ways and I would like to take
advantage of the data clustering in Hive (buckets), so that I don't
have to do any post processing. Is there documentation that
describes how the data is hashed or how it's organized across the
buckets? Or could someone point me to a class that implements it?
Thanks!
Aaron