> Is there any other way to approach this problem? If I can ensure that a
> particular user's (sorted) data is guaranteed to be processed on a single
> Hadoop node, then probably I can write a custom script to do the ranking for
> me.
I guess the answer to my query is given at
http://wiki.apache.org/hadoop/Hive/LanguageManual/SortBy:

"Hive uses the columns in *Distribute By* to distribute the rows among
reducers. All rows with the same *Distribute By* columns will go to the same
reducer. Instead of specifying *Cluster By*, the user can specify *Distribute
By* and *Sort By*, so the partition columns and sort columns can be different.
The usual case is that the partition columns are a prefix of sort columns, but
that is not required."

Saurabh.
-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
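For illustration, here is a minimal sketch of the kind of custom ranking script the question mentions, written as a Python script for Hive's TRANSFORM clause. It is not taken from the manual page. The table and column names (`user_scores`, `user_id`, `item`, `score`, `user_rank`) and the file name `rank_reducer.py` are hypothetical. One detail matters here: DISTRIBUTE BY user_id alone only guarantees that all of a user's rows reach the same reducer. Putting user_id first in SORT BY is what makes those rows arrive contiguously and in score order, which is the "partition columns are a prefix of sort columns" case the manual describes. The script assumes that ordering and emits a simple row number per user, so tied scores get distinct ranks.

```python
import sys
from typing import Iterable, Iterator, Optional

# Hypothetical usage from Hive (table/column/file names are assumptions):
#   ADD FILE rank_reducer.py;
#   FROM (
#     SELECT user_id, item, score
#     FROM user_scores
#     DISTRIBUTE BY user_id
#     SORT BY user_id, score DESC
#   ) t
#   SELECT TRANSFORM (t.user_id, t.item, t.score)
#     USING 'python rank_reducer.py'
#     AS (user_id, item, score, user_rank);


def rank_rows(lines: Iterable[str]) -> Iterator[str]:
    """Append a per-user row number to each tab-separated input row.

    Assumes the first column is the user id and that rows arrive grouped
    by user and already sorted within each user (which SORT BY user_id, ...
    provides inside a single reducer).
    """
    current_user: Optional[str] = None
    rank = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        user = fields[0]
        if user != current_user:
            # First row for a new user: restart the numbering.
            current_user = user
            rank = 0
        rank += 1
        yield "\t".join(fields + [str(rank)])


if __name__ == "__main__":
    # Hive streams rows to the script on stdin and reads tab-separated
    # output rows from stdout.
    for out in rank_rows(sys.stdin):
        print(out)
```

If ties should share a rank, the script would also need to remember the previous score and only advance the rank when the score changes.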
