Thanks. I ended up writing a Scala program that uses the Hive JDBC connector. Performance was still reasonable.

@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com


On 9/27/10 11:13 PM, Guru Prasad wrote:
Hi,
Please see the attachment; it might help you.
It helped me solve a similar problem.


Thanks & Regards
~guru prasad

On 09/28/2010 06:20 AM, Tommy Chheng wrote:
I have two tables:
pages( title, domain, url )
top_domains(domain)

top_domains was created by a GROUP BY domain operation on the pages table.


Because the pages table is very large, I only want to sample 5 rows
for each domain in top_domains.

In a traditional programming language, I could just use a for loop to iterate
over the domain field and perform a select with a limit 5 clause.
Is there a way to express this query in Hive?
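
The for-loop idea described above can be sketched outside Hive as follows. This is only an illustration in Python over in-memory rows, not a Hive query: the function name sample_per_domain, the limit parameter, and the (title, domain, url) tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def sample_per_domain(pages, top_domains, limit=5):
    """Return up to `limit` rows per domain, for domains listed in top_domains.

    `pages` is an iterable of (title, domain, url) tuples (hypothetical layout
    mirroring the pages table); `top_domains` is an iterable of domain strings.
    """
    wanted = set(top_domains)
    sampled = defaultdict(list)
    for title, domain, url in pages:
        # Keep a row only if its domain is wanted and the quota is not yet full.
        if domain in wanted and len(sampled[domain]) < limit:
            sampled[domain].append((title, domain, url))
    return dict(sampled)
```

Called with the rows of pages and the domains from top_domains, it returns a dict mapping each domain to at most 5 of its rows, in the order the rows were read. Within Hive itself, releases that support windowing functions can express the same per-group limit by numbering rows with ROW_NUMBER() over a partition by domain and keeping only row numbers up to 5.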



