Try below, SELECT a.sender_id, b.actor_id, a.version, b.reference_id FROM communications a LEFT OUTER JOIN (select conversions.actor_id, conversions.reference_id from conversions where conversions.application='reference') b ON a.reference_id = b.reference_id WHERE a.hour >= 343013 AND a.hour < 343014
________________________________ From: Josh Ferguson <j...@besquared.net> Reply-To: <hive-user@hadoop.apache.org> Date: Mon, 16 Feb 2009 22:04:14 -0800 To: <hive-user@hadoop.apache.org> Subject: Yet another join issue Ok so this one I just ran into. It appears when doing a join of two tables that, when using a LEFT OUTER JOIN and specifying a WHERE condition on both tables, the resulting join is an INNER JOIN. Example follows: LEFT OUTER JOIN with WHERE condition on the LEFT table SELECT communications.sender_id, conversions.actor_id, communications.version, conversions.reference_id FROM communications LEFT OUTER JOIN conversions ON communications.reference_id = conversions.reference_id WHERE communications.hour >= 343013 AND communications.hour < 343014 1 NULL A NULL 1 2 A 2 2 NULL A NULL 3 NULL B NULL correct results, as expected LEFT OUTER JOIN with WHERE conditions on both tables SELECT communications.sender_id, conversions.actor_id, communications.version, conversions.reference_id FROM communications LEFT OUTER JOIN conversions ON communications.reference_id = conversions.reference_id WHERE communications.hour >= 343013 AND communications.hour < 343014 AND conversions.application = 'reference' 1 2 A 2 incorrect results, it simply displays the row that was matched in both tables. Any ideas what's going on here? Is this intended? How can I get a LEFT OUTER JOIN and put conditions on both tables? Josh Ferguson