The thing I want the WHERE clause to do here is filter the two sets
*before* I join them not after. This is what Prasad's query was doing
unfortunately it wasn't working properly inside of a SELECT TRANSFORM
FROM(...) block. Anyone know the SQL standard involving WHERE clauses
and joins?
Josh Ferguson
On Feb 16, 2009, at 10:28 PM, Zheng Shao wrote:
The following will also work. In the case of left outer join, the
value of columns from the missing table will be NULL, so checking
whether the value IS NULL should solve the problem.
SELECT communications.sender_id, conversions.actor_id,
communications.version, conversions.reference_id FROM communications
LEFT OUTER JOIN conversions ON communications.reference_id =
conversions.reference_id WHERE communications.hour >= 343013 AND
communications.hour < 343014 AND ( conversions.application =
'reference' OR conversions.application IS NULL);
I am not sure whether SQL standard says something different from
what Hive implements. It will be great if you could point us to some
references.
Zheng
On Mon, Feb 16, 2009 at 10:21 PM, Josh Ferguson <j...@besquared.net>
wrote:
Will this work if this select statement is already nested in a FROM?
Josh Ferguson
On Feb 16, 2009, at 10:14 PM, Prasad Chakka wrote:
Try below,
SELECT a.sender_id,
b.actor_id,
a.version,
b.reference_id
FROM communications a
LEFT OUTER JOIN (select conversions.actor_id,
conversions.reference_id from conversions where
conversions.application='reference') b
ON a.reference_id = b.reference_id
WHERE a.hour >= 343013
AND a.hour < 343014
From: Josh Ferguson <j...@besquared.net>
Reply-To: <hive-user@hadoop.apache.org>
Date: Mon, 16 Feb 2009 22:04:14 -0800
To: <hive-user@hadoop.apache.org>
Subject: Yet another join issue
Ok so this one I just ran into. It appears when doing a join of two
tables that, when using a LEFT OUTER JOIN and specifying a WHERE
condition on both tables, the resulting join is an INNER JOIN.
Example
follows:
LEFT OUTER JOIN with WHERE condition on the LEFT table
SELECT communications.sender_id, conversions.actor_id,
communications.version, conversions.reference_id FROM communications
LEFT OUTER JOIN conversions ON communications.reference_id =
conversions.reference_id WHERE communications.hour >= 343013 AND
communications.hour < 343014
1 NULL A NULL
1 2 A 2
2 NULL A NULL
3 NULL B NULL
correct results, as expected
LEFT OUTER JOIN with WHERE conditions on both tables
SELECT communications.sender_id, conversions.actor_id,
communications.version, conversions.reference_id FROM communications
LEFT OUTER JOIN conversions ON communications.reference_id =
conversions.reference_id WHERE communications.hour >= 343013 AND
communications.hour < 343014 AND conversions.application =
'reference'
1 2 A 2
incorrect results, it simply displays the row that was matched in
both
tables.
Any ideas what's going on here? Is this intended? How can I get a
LEFT
OUTER JOIN and put conditions on both tables?
Josh Ferguson
--
Yours,
Zheng