Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The following page has been changed by AdamKramer:
http://wiki.apache.org/hadoop/Hive/LanguageManual/Joins

------------------------------------------------------------------------------
  {{{
    SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON 
(c.key = b.key2)
  }}}
-   is a valid join
+   is a valid join.
   * Hive converts joins over multiple tables into a single map/reduce job if 
for every table the same column is used in the join clauses e.g.
  {{{
    SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON 
(c.key = b.key1)
@@ -66, +66 @@

    SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON 
(c.key = b.key2)
  }}}
    there are two map/reduce jobs involved in computing the join. The first of 
these joins a with b and buffers the values of a while streaming the values of 
b in the reducers. The second of one of these jobs buffers the results of the 
first join while streaming the values of c through the reducers.
+  * LEFT, RIGHT, and FULL OUTER joins exist in order to provide more control 
over ON clauses for which there is no match. For example, this query:
+ {{{
+   SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key=b.key)
+ }}}
+   will return a row for every row in a. This output row will be a.val,b.val 
when there is a b.key that equals a.key, and the output row will be a.val,NULL 
when there is no corresponding b.key. Rows from b which have no corresponding 
a.key will be dropped. The syntax "FROM a LEFT OUTER JOIN b" must be written on 
one line in order to understand how it works--a is to the LEFT of b in this 
query, and so all rows from a are kept; a RIGHT OUTER JOIN will keep all rows 
from b, and a FULL OUTER JOIN will keep all rows from a and all rows from b. 
OUTER JOIN semantics should conform to standard SQL specs.
   * Joins occur BEFORE WHERE CLAUSES. So, if you want to restrict the OUTPUT 
of a join, a requirement should be in the WHERE clause, otherwise it should be 
in the JOIN clause. A big point of confusion for this issue is partitioned 
tables:
  {{{
    SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key=b.key)

Reply via email to