Also something Else that seems work is to specify all my conditions for conversions in the ON clause such as:

        LEFT OUTER JOIN
          conversions ON
            communications.account = conversions.account AND
            communications.application = conversions.application AND
            communications.reference_id = conversions.reference_id AND
            conversions.hour >= 343013 AND
            conversions.hour < 343014

This appears to also filter the tables before the join is performed, does this work how I think it is working?

Josh Ferguson

On Feb 16, 2009, at 10:28 PM, Zheng Shao wrote:

The following will also work. In the case of left outer join, the value of columns from the missing table will be NULL, so checking whether the value IS NULL should solve the problem.

SELECT communications.sender_id, conversions.actor_id, communications.version, conversions.reference_id FROM communications LEFT OUTER JOIN conversions ON communications.reference_id = conversions.reference_id WHERE communications.hour >= 343013 AND communications.hour < 343014 AND ( conversions.application = 'reference' OR conversions.application IS NULL);

I am not sure whether SQL standard says something different from what Hive implements. It will be great if you could point us to some references.




Zheng

On Mon, Feb 16, 2009 at 10:21 PM, Josh Ferguson <j...@besquared.net> wrote:
Will this work if this select statement is already nested in a FROM?

Josh Ferguson

On Feb 16, 2009, at 10:14 PM, Prasad Chakka wrote:

Try below,

SELECT a.sender_id,
             b.actor_id,
             a.version,
             b.reference_id
FROM communications a
LEFT OUTER JOIN (select conversions.actor_id, conversions.reference_id from conversions where conversions.application='reference') b
            ON a.reference_id = b.reference_id
WHERE a.hour >= 343013
               AND a.hour < 343014


From: Josh Ferguson <j...@besquared.net>
Reply-To: <hive-user@hadoop.apache.org>
Date: Mon, 16 Feb 2009 22:04:14 -0800
To: <hive-user@hadoop.apache.org>
Subject: Yet another join issue

Ok so this one I just ran into. It appears when doing a join of two
tables that, when using a LEFT OUTER JOIN and specifying a WHERE
condition on both tables, the resulting join is an INNER JOIN. Example
follows:

LEFT OUTER JOIN with WHERE condition on the LEFT table

SELECT communications.sender_id, conversions.actor_id,
communications.version, conversions.reference_id FROM communications
LEFT OUTER JOIN conversions ON communications.reference_id =
conversions.reference_id WHERE communications.hour >= 343013 AND
communications.hour < 343014

1       NULL    A       NULL
1       2       A       2
2       NULL    A       NULL
3       NULL    B       NULL

correct results, as expected

LEFT OUTER JOIN with WHERE conditions on both tables

SELECT communications.sender_id, conversions.actor_id,
communications.version, conversions.reference_id FROM communications
LEFT OUTER JOIN conversions ON communications.reference_id =
conversions.reference_id WHERE communications.hour >= 343013 AND
communications.hour < 343014 AND conversions.application = 'reference'

1       2       A       2

incorrect results, it simply displays the row that was matched in both
tables.

Any ideas what's going on here? Is this intended? How can I get a LEFT
OUTER JOIN and put conditions on both tables?

Josh Ferguson





--
Yours,
Zheng

Reply via email to