It's as simple as taking the hash code of the key and taking it modulo the number of reducers. To get started, try any of the .q files in the clientpositive directory.
On the code side, HiveKey.java has the implementation.
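The hash-and-mod scheme described above can be sketched roughly as follows. This is a hedged illustration, not Hive's actual code path: the class name `BucketSketch` and method `bucketFor` are made up for this example, and the `& Integer.MAX_VALUE` step (to keep the result non-negative for keys with negative hash codes) mirrors the common Hadoop partitioner idiom; check HiveKey.java for the real implementation.

```java
// Hypothetical sketch of bucket assignment: hash the key, clear the
// sign bit, then take the remainder modulo the number of buckets.
public class BucketSketch {

    // Maps a key to a bucket index in [0, numBuckets).
    static int bucketFor(Object key, int numBuckets) {
        // & Integer.MAX_VALUE forces a non-negative value so the
        // modulo result is always a valid bucket index.
        return (key.hashCode() & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // With 32 buckets (as in the setup described below), the same
        // key always lands in the same bucket.
        System.out.println(bucketFor("some-key", 32));
    }
}
```

Because the mapping depends only on the key's hash code, any downstream consumer that uses the same hash and the same bucket count can locate a key's bucket without post-processing.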
Sent from my iPhone
On Apr 11, 2010, at 2:48 PM, Aaron McCurry <amccu...@gmail.com> wrote:
I have a search solution that is downstream of some Netezza data
marts that I'm replacing with a Hive solution. We already partition
the data for the search solution 32 ways and I would like to take
advantage of the data clustering in Hive (buckets), so that I don't
have to do any post processing. Is there documentation that
describes how the data is hashed or how it's organized across the
buckets? Or could someone point me to a class that implements it?
Thanks!
Aaron