> Is there any other way to approach this problem? If I can ensure that a
> particular user's (sorted) data is guaranteed to be processed on a single
> Hadoop node, then probably I can write a custom script to do the ranking for
> me.


Answering my own question: I think the Hive language manual covers this at
http://wiki.apache.org/hadoop/Hive/LanguageManual/SortBy --

"Hive uses the columns in *Distribute By* to distribute the rows among
reducers. All rows with the same *Distribute By* columns will go to the same
reducer. Instead of specifying *Cluster By*, the user can specify *Distribute
By* and *Sort By*, so the partition columns and sort columns can be
different. The usual case is that the partition columns are a prefix of sort
columns, but that is not required."
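
So the custom-script idea could look roughly like the sketch below. This is only an illustration of how I read the manual, not something I have run on a cluster. The table name (user_scores), the columns (user_id, score) and the script name (rank.py) are all made up. It assumes Hive feeds the script tab-separated rows on stdin through TRANSFORM, and that Sort By lists user_id first so each user's rows arrive contiguously, even when one reducer handles several users.

```python
# Hypothetical Hive side (names are placeholders, not from a real schema):
#
#   FROM (
#     SELECT user_id, score
#     FROM user_scores
#     DISTRIBUTE BY user_id
#     SORT BY user_id, score DESC
#   ) s
#   SELECT TRANSFORM(s.user_id, s.score)
#   USING 'python rank.py'
#   AS (user_id, score, rank);

import sys
from itertools import groupby


def rank_rows(lines):
    """Append a 1-based rank to each row, restarting at every new user.

    Expects tab-separated lines whose first field is the user key, with all
    rows for one user adjacent and already in the desired order.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    # groupby only merges adjacent equal keys, which is why the input
    # must already be sorted by user.
    for _user, group in groupby(rows, key=lambda row: row[0]):
        for rank, row in enumerate(group, start=1):
            yield row + [str(rank)]


def main(stream=sys.stdin, out=sys.stdout):
    """Read rows from stream and write the ranked rows to out."""
    for row in rank_rows(stream):
        out.write("\t".join(row) + "\n")

# When saved as rank.py for use with TRANSFORM, end the file with: main()
```

The script never sorts anything itself; it relies entirely on Distribute By to put a user on one reducer and on Sort By to order the rows within that reducer.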

Saurabh.

-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
