Thanks. I ended up writing a Scala program that uses the Hive JDBC
connector. Performance was still reasonable.
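A minimal sketch of the approach described above, assuming a HiveServer (HiveServer1-era) instance at localhost:10000 and the `pages` and `top_domains` tables from the original question. The driver class `org.apache.hadoop.hive.jdbc.HiveDriver`, the connection URL, and the names `SampleTopDomains`, `hiveString`, `sampleQuery` and `readColumn` are assumptions for illustration, not the actual program from this thread. Newer HiveServer2 installations use `org.apache.hive.jdbc.HiveDriver` with a `jdbc:hive2://` URL instead.

```scala
import java.sql.{Connection, DriverManager, ResultSet}
import scala.collection.mutable.ListBuffer

object SampleTopDomains {
  // Assumption: HiveServer1-era driver class and URL; adjust for your installation.
  val DriverClass = "org.apache.hadoop.hive.jdbc.HiveDriver"
  val Url = "jdbc:hive://localhost:10000/default"

  /** Quote a value as a HiveQL string literal, escaping backslashes and single quotes. */
  def hiveString(s: String): String =
    "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"

  /** Build the per-domain sampling query: at most `n` rows from `pages` for one domain. */
  def sampleQuery(domain: String, n: Int): String =
    s"SELECT title, domain, url FROM pages WHERE domain = ${hiveString(domain)} LIMIT $n"

  /** Read the first column of every row in a result set into a list. */
  def readColumn(rs: ResultSet): List[String] = {
    val buf = ListBuffer.empty[String]
    while (rs.next()) buf += rs.getString(1)
    rs.close()
    buf.toList
  }

  def main(args: Array[String]): Unit = {
    Class.forName(DriverClass)
    val conn: Connection = DriverManager.getConnection(Url, "", "")
    try {
      val stmt = conn.createStatement()
      // Step 1: materialize the (small) list of domains before issuing further queries.
      val domains = readColumn(stmt.executeQuery("SELECT domain FROM top_domains"))
      // Step 2: one LIMIT query per domain, like the for loop in the question.
      for (d <- domains) {
        val rs = stmt.executeQuery(sampleQuery(d, 5))
        while (rs.next())
          println(Seq(rs.getString(1), rs.getString(2), rs.getString(3)).mkString("\t"))
        rs.close()
      }
      stmt.close()
    } finally conn.close()
  }
}
```

This mirrors the for loop from the question: it runs one Hive query per domain, and on older Hive versions each filtered SELECT may launch its own MapReduce job, so it suits a modest number of domains better than a very long list.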
@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com
On 9/27/10 11:13 PM, Guru Prasad wrote:
Hi,
Please see the attachment; it might help you.
It helped me solve a similar problem.
Thanks & Regards
~guru prasad
On 09/28/2010 06:20 AM, Tommy Chheng wrote:
I have two tables:
pages( title, domain, url )
top_domains(domain)
top_domains was created by a GROUP BY domain query on the pages table.
Because the pages table is very large, I only want to sample 5 rows
for each domain in top_domains.
In a traditional programming language, I could just use a for loop to iterate
over the domain field and run a SELECT with a LIMIT 5 clause for each value.
Is there a way to express this query in Hive?