Thanks very much. However, the reducer hangs with the following warning:

WARN org.apache.hadoop.hive.ql.exec.JoinOperator: table 0 has more than joinEmitInterval rows for join key []

Both tables are large and, as Zheng mentions at 
http://www.mail-archive.com/[email protected]/msg00640.html, a large 
table 0 is a problem. Is there any way to overcome this?



Thanks,

Rakesh

________________________________
From: Peter Skomoroch [mailto:[email protected]]
Sent: Monday, June 29, 2009 4:20 PM
To: [email protected]
Subject: Re: Set difference in Hive

Here is an example of what Amr mentioned, taken from one of my Hive scripts. It 
returns the set of pages that are not in "daily_pagecounts_table":

SELECT dt.page_id, dt.dates, dt.pageviews, dt.total_pageviews
FROM daily_timelines dt
LEFT OUTER JOIN daily_pagecounts_table dp ON (dt.page_id = dp.page_id)
WHERE dp.page_id IS NULL

On Mon, Jun 29, 2009 at 7:14 PM, Amr Awadallah <[email protected]> wrote:

Do a left outer join from page_views to users on user = name, and keep only the rows where users.name is null.

-- amr
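
Applied to the query in the original question (quoted further down), the suggestion above could be sketched roughly as follows. The table and column names (page_views.user, users.name) are taken from that question; the aliases pv and u are only illustrative, and this has not been run against a real cluster.

```sql
-- Sketch only: assumes page_views(user, ...) and users(name, ...) as in the question.
-- Keep each page_views row that finds no matching users row.
SELECT pv.user
FROM page_views pv
LEFT OUTER JOIN users u ON (pv.user = u.name)
-- Unmatched rows get NULLs for every users column, so this filter
-- selects exactly the page_views rows with no partner in users.
WHERE u.name IS NULL;
```

Two caveats: unlike NOT IN, this form also returns rows where page_views.user itself is NULL, and it still returns rows when users.name contains NULLs. If your Hive version treats user as a reserved word, quoting it with backticks may be necessary.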


Rakesh Setty wrote:

Hi,



I am new to Hive. What is the easiest way to get the difference between two 
sets? For example, how can I convert the following SQL query to Hive?



select user from page_views where user not in (select name from users);



Thanks,

Rakesh



--
Peter N. Skomoroch
617.285.8348
http://www.datawrangling.com
http://delicious.com/pskomoroch
http://twitter.com/peteskomoroch
