If you write a similar UDF which takes the partitioning column (isbn in your example) as input and restarts the sequence at 0 whenever it sees a new value in this column, then you can combine it with ORDER BY isbn,price in a subselect to get the result you want.

Whether or not this is good enough depends on the amount of data on which the ranking filter is being applied (since ORDER BY currently forces everything to go through a single reducer).

JVS

On May 26, 2010, at 1:36 PM, Kortni Smith wrote:

Thank Josh,

If I could use the LIMIT clause I’d like to, but since I need to impose a limit on each group (max 10 results PER isbn’s for example), and hive does not support subqueries in the where clause, I’m at a loss for how to implement this. Any examples or further thoughts on this?

We’re currently evaluating whether to use pig or hive – and this is one thing that was easier to implement with pig. Am I overlooking a simple approach?

Thanks again for any assistance on this.

Kortni

From: John Sichi [mailto:[email protected]]
Sent: Tuesday, May 25, 2010 12:21 PM
To: [email protected]
Subject: Re: rownum, row_number() or looping ability with hiveql?

In your simple example, you can probably use the LIMIT clause, but for more advanced cases, here's a patch for a ROW_SEQUENCE UDF (not committed to trunk yet):

https://issues.apache.org/jira/browse/HIVE-1304

The caveat is that since we don't actually have a SQL/OLAP implementation yet, you have to use ORDER BY at the nested query level (rather than the OVER clause level where it belongs) and cross your fingers.

JVS

On May 25, 2010, at 12:13 PM, Kortni Smith wrote:

Hi,

Is there a hive equivalent to Oracle’s rownum, row_number() or the ability to loop through a resultset?

I have been struggling to create a hive query that will give me max X records, per something, when sorted by something. For example, I have book data, multiple records for any given isbn, and want the lowest 5 priced books per isbn.
I can accomplish this in oracle with the following:

    select isbn, price
    from (
      select isbn, price,
             row_number() over (partition by isbn order by price asc) rn
      from kstest
    )
    where rn <= 5;

Any ideas would be greatly appreciated.

Thank you,

Kortni Smith | Software Developer
AbeBooks.com
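
For anyone following the thread, the per-group sequence idea from the top reply can be sketched outside Hive to show the logic. This is only an illustration in Python, not the HIVE-1304 patch and not Hive API: the class name GroupRowSequence, its evaluate method, and the helper lowest_priced_per_isbn are hypothetical. It also assumes the counter is reset to 0 and incremented before it is returned, so the first row of each isbn gets 1, matching the `rn <= 5` filter in the Oracle query. As in the thread, it only works if the rows arrive already ordered by isbn, then price.

```python
class GroupRowSequence:
    """Hypothetical stand-in for a per-group sequence UDF (not Hive API)."""

    def __init__(self):
        self._seen_any = False
        self._last_key = None
        self._count = 0

    def evaluate(self, key):
        # Restart the counter whenever the partitioning value changes.
        if not self._seen_any or key != self._last_key:
            self._seen_any = True
            self._last_key = key
            self._count = 0
        # Assumption: increment before returning, so numbering starts at 1.
        self._count += 1
        return self._count


def lowest_priced_per_isbn(rows, limit):
    """Keep at most `limit` cheapest (isbn, price) rows for each isbn."""
    # Stands in for ORDER BY isbn, price in the subselect.
    ordered = sorted(rows, key=lambda row: (row[0], row[1]))
    seq = GroupRowSequence()
    # Stands in for the outer filter on the sequence column.
    return [(isbn, price) for isbn, price in ordered
            if seq.evaluate(isbn) <= limit]


# Made-up sample data, for illustration only.
books = [("A", 9.0), ("B", 3.0), ("A", 4.0), ("A", 7.0), ("B", 1.0), ("A", 5.0)]
result = lowest_priced_per_isbn(books, 2)
```

Because the counter only compares each key with the previous one, unsorted input would restart the numbering every time an isbn reappears, which is why the ORDER BY in the subselect matters.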
