Thanks very much. But the reducer hangs with the warning WARN org.apache.hadoop.hive.ql.exec.JoinOperator: table 0 has more than joinEmitInterval rows for join key []
Both the tables are large and as Zheng mentions at http://www.mail-archive.com/[email protected]/msg00640.html, large size for table 0 is a problem. Is there any way to overcome this? Thanks, Rakesh ________________________________ From: Peter Skomoroch [mailto:[email protected]] Sent: Monday, June 29, 2009 4:20 PM To: [email protected] Subject: Re: Set difference in Hive Here is an example of what Amr mentioned from one of my Hive scripts, returns the set of pages not in "daily_pagecounts_table" select dt.page_id, dt.dates, dt.pageviews, dt.total_pageviews FROM daily_timelines dt LEFT OUTER JOIN daily_pagecounts_table dp ON (dt.page_id = dp.page_id) where dp.page_id is NULL On Mon, Jun 29, 2009 at 7:14 PM, Amr Awadallah <[email protected]<mailto:[email protected]>> wrote: do an outer join on user and filter on name.user is null -- amr Rakesh Setty wrote: Hi, I am new to Hive. I would like to know what is the easiest way to get the difference between two sets. For example, how can I convert the following SQL query to Hive? select user from page_views where user not in (select name from users); Thanks, Rakesh -- Peter N. Skomoroch 617.285.8348 http://www.datawrangling.com http://delicious.com/pskomoroch http://twitter.com/peteskomoroch
