If you write a similar UDF which takes the partitioning column (isbn in your 
example) as input and restarts the sequence at 0 whenever it sees a new value 
in this column, then you can combine it with

ORDER BY isbn,price

in a subselect to get the result you want.  Whether or not this is good enough 
depends on the amount of data on which the ranking filter is being applied 
(since ORDER BY currently forces everything to go through a single reducer).
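
To make the idea concrete, here is a minimal sketch of the counter logic in plain Java, kept free of Hive dependencies so it runs on its own. The class and method names are hypothetical and are not taken from the HIVE-1304 patch. In Hive the same logic would sit inside a UDF's evaluate method and would be fed by the ORDER BY isbn,price subselect. Because the sequence restarts at 0, "lowest N per isbn" becomes a filter of the form rn < N rather than rn <= N.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a partition-aware sequence UDF.
// Assumption: rows reach evaluate() already sorted by the partitioning
// column (and then by price), as the ORDER BY in the subselect would arrange.
final class PartitionedRowSequence {
    private String lastKey;          // partitioning value seen on the previous row
    private boolean started = false; // false until the first row has been seen
    private long counter = 0;

    // Returns 0 for the first row of each new key, then 1, 2, ... for the
    // following rows that carry the same key.
    long evaluate(String key) {
        if (!started || !Objects.equals(key, lastKey)) {
            counter = 0;             // new partition value: restart the sequence
            lastKey = key;
            started = true;
        } else {
            counter++;
        }
        return counter;
    }

    // Simulates the outer query: keep only rows whose sequence number is
    // below the limit. Each row is {isbn, price}; input must be pre-sorted.
    static List<String[]> topPerGroup(List<String[]> sortedRows, int limit) {
        PartitionedRowSequence seq = new PartitionedRowSequence();
        List<String[]> kept = new ArrayList<>();
        for (String[] row : sortedRows) {
            if (seq.evaluate(row[0]) < limit) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample rows (made up for illustration), sorted by isbn, then price.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"111", "3.50"});
        rows.add(new String[] {"111", "4.00"});
        rows.add(new String[] {"111", "9.99"});
        rows.add(new String[] {"222", "1.25"});

        // Keep at most the 2 cheapest rows per isbn.
        for (String[] row : topPerGroup(rows, 2)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

The counter is only correct if all rows for a given isbn arrive consecutively and in price order at the same instance of the function, which is why the approach depends on the sort happening before the function is applied.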

JVS

On May 26, 2010, at 1:36 PM, Kortni Smith wrote:

Thanks Josh. I’d like to use the LIMIT clause if I could, but I need to 
impose a limit on each group (for example, a maximum of 10 results per isbn), 
and Hive does not support subqueries in the WHERE clause, so I’m at a loss 
for how to implement this.  Any examples or further thoughts on this?

We’re currently evaluating whether to use Pig or Hive, and this is one thing 
that was easier to implement with Pig.  Am I overlooking a simple approach?
Thanks again for any assistance on this.

Kortni

From: John Sichi [mailto:[email protected]]
Sent: Tuesday, May 25, 2010 12:21 PM
To: [email protected]
Subject: Re: rownum, row_number() or looping ability with hiveql?

In your simple example, you can probably use the LIMIT clause, but for more 
advanced cases, here's a patch for a ROW_SEQUENCE UDF (not committed to trunk 
yet):

https://issues.apache.org/jira/browse/HIVE-1304

The caveat is that since we don't actually have a SQL/OLAP implementation yet, 
you have to use ORDER BY at the nested query level (rather than the OVER clause 
level where it belongs) and cross your fingers.

JVS

On May 25, 2010, at 12:13 PM, Kortni Smith wrote:


Hi,

Is there a Hive equivalent to Oracle’s rownum or row_number(), or the ability 
to loop through a result set?

I have been struggling to create a Hive query that will give me at most X 
records per group when sorted by some column.  For example, I have book data 
with multiple records for any given isbn, and I want the 5 lowest-priced 
books per isbn.

I can accomplish this in oracle with the following:

select isbn, price from
(
  select isbn, price,
         row_number() over (partition by isbn order by price asc) rn
  from kstest
)
where rn <= 5;

Any ideas would be greatly appreciated.
Thank you,

Kortni Smith | Software Developer
AbeBooks.com  <http://www.abebooks.com/> Passion for books.

[email protected]
phone: 250.412.3272  |  fax: 250.475.6014

Suite 500 - 655 Tyee Rd. Victoria, BC. Canada V9A 6X5