yes.

________________________________
From: Saurabh Nanda [mailto:[email protected]]
Sent: Sunday, July 19, 2009 11:38 PM
To: [email protected]
Subject: Re: dense_rank() equivalent in Hive?


Is there any other way to approach this problem? If I can ensure that a 
particular user's (sorted) data is processed on a single Hadoop node, then 
I can probably write a custom script to do the ranking for me.

I guess the answer to my query is given at 
http://wiki.apache.org/hadoop/Hive/LanguageManual/SortBy --

"Hive uses the columns in Distribute By to distribute the rows among reducers. 
All rows with the same Distribute By columns will go to the same reducer. 
Instead of specifying Cluster By, the user can specify Distribute By and Sort 
By, so the partition columns and sort columns can be different. The usual case 
is that the partition columns are a prefix of sort columns, but that is not 
required."
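
For what it's worth, here is a rough sketch of what such a custom script might look like, assuming the rows reach it through Hive's TRANSFORM mechanism as tab-separated lines on stdin, already distributed by user and sorted by (user, value). The two-column layout and the names dense_rank and parse are only illustrative assumptions, not anything taken from the Hive docs. Since one reducer can receive several users, the rank is reset whenever the user changes.

```python
import sys


def dense_rank(rows):
    """Yield (user, value, rank) for rows already sorted by (user, value).

    Assumes every row of a given user arrives contiguously, which is what
    Distribute By on the user column plus Sort By (user, value) should give.
    """
    prev_user = prev_value = None
    rank = 0
    for user, value in rows:
        if user != prev_user:
            rank = 1        # new user: restart the ranking
        elif value != prev_value:
            rank += 1       # new distinct value: next rank, no gaps
        prev_user, prev_value = user, value
        yield user, value, rank


def parse(lines):
    """Split tab-separated input lines into (user, value), skipping blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            user, value = line.split("\t", 1)
            yield user, value


if __name__ == "__main__":
    # Hypothetical column layout: user <TAB> value in, plus rank out.
    for user, value, rank in dense_rank(parse(sys.stdin)):
        print(f"{user}\t{value}\t{rank}")
```

Values are compared as strings here, so ties are detected by exact text match; the ordering itself is left entirely to Sort By on the Hive side.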

Saurabh.

--

http://nandz.blogspot.com
http://foodieforlife.blogspot.com
