I have a search solution that is downstream of some Netezza data marts that I'm replacing with a Hive solution. We already partition the data for the search solution 32 ways, and I would like to take advantage of Hive's data clustering (buckets) so that I don't have to do any post-processing. Is there documentation that describes how the data is hashed and how it's organized across the buckets? Or could someone point me to a class that implements it? Thanks!
Aaron
