Try below,

SELECT a.sender_id,
             b.actor_id,
             a.version,
             b.reference_id
FROM communications a
            LEFT OUTER JOIN (select conversions.actor_id, 
conversions.reference_id from conversions where 
conversions.application='reference')   b
            ON a.reference_id = b.reference_id
WHERE a.hour >= 343013
               AND a.hour < 343014


________________________________
From: Josh Ferguson <j...@besquared.net>
Reply-To: <hive-user@hadoop.apache.org>
Date: Mon, 16 Feb 2009 22:04:14 -0800
To: <hive-user@hadoop.apache.org>
Subject: Yet another join issue

Ok so this one I just ran into. It appears when doing a join of two
tables that, when using a LEFT OUTER JOIN and specifying a WHERE
condition on both tables, the resulting join is an INNER JOIN. Example
follows:

LEFT OUTER JOIN with WHERE condition on the LEFT table

SELECT communications.sender_id, conversions.actor_id,
communications.version, conversions.reference_id FROM communications
LEFT OUTER JOIN conversions ON communications.reference_id =
conversions.reference_id WHERE communications.hour >= 343013 AND
communications.hour < 343014

1       NULL    A       NULL
1       2       A       2
2       NULL    A       NULL
3       NULL    B       NULL

correct results, as expected

LEFT OUTER JOIN with WHERE conditions on both tables

SELECT communications.sender_id, conversions.actor_id,
communications.version, conversions.reference_id FROM communications
LEFT OUTER JOIN conversions ON communications.reference_id =
conversions.reference_id WHERE communications.hour >= 343013 AND
communications.hour < 343014 AND conversions.application = 'reference'

1       2       A       2

incorrect results, it simply displays the row that was matched in both
tables.

Any ideas what's going on here? Is this intended? How can I get a LEFT
OUTER JOIN and put conditions on both tables?

Josh Ferguson

Reply via email to