Hi everyone,
I am planning to do some work on join processing in Hive. Before I read
the source code, could anyone tell me which kinds of join have been
implemented in the newest version of Hive?

Here is what I know so far:
1. Symmetric join has been implemented; it is the default join.
2. Asymmetric join, a.k.a. map-side join (for joining a huge table with a
small table using only the map phase), has been implemented, but no
further optimization has been added. If that is the case, my idea is that
when we have two huge tables, we can first use a semi-join to get rid of
the non-referenced tuples in one table, making it smaller, and then do
the map-side join.
3. 3-way join (using only one map-reduce job to join 3 tables) has been
implemented, but it only applies when the tables are joined on the same
join key (A.k = B.k AND B.k = C.k). If we want to join 3 tables on
different join keys (A.k1 = B.k1 AND B.k2 = C.k2), we still need 2
map-reduce jobs.
4. When joining two tables, Hive can tell whether the join key is a
partition column and make good use of the partitioning.
5. No sort-merge join has been implemented in Hive yet, so we cannot do
non-equi joins.
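
To make the idea in item 2 concrete, here is a rough Python sketch of what
I have in mind. It is not Hive code: the function names, the toy tables,
and the in-memory lists are all made up for illustration, and a real
implementation would run these steps as map-reduce jobs. It first shrinks
one table with a semi-join (keeping only rows whose key appears in the
other table), then does a hash join that only streams over the other side,
which is roughly what the map phase would do.

```python
from collections import defaultdict


def semi_join_reduce(rows, other_keys, key):
    """Keep only the rows whose join key also appears in the other table."""
    return [row for row in rows if row[key] in other_keys]


def map_side_join(small_rows, big_rows, key):
    """Hash join: load the small side into memory, stream the big side."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    for row in big_rows:
        # .get() avoids creating empty entries for keys with no match
        for match in table.get(row[key], []):
            yield {**match, **row}


# Hypothetical toy tables A and B, joined on column "k".
a = [{"k": 1, "a_val": "x"}, {"k": 2, "a_val": "y"}, {"k": 3, "a_val": "z"}]
b = [{"k": 2, "b_val": "p"}, {"k": 3, "b_val": "q"},
     {"k": 3, "b_val": "r"}, {"k": 9, "b_val": "s"}]

# Step 1: semi-join. Only the distinct join keys of B are needed here.
b_keys = {row["k"] for row in b}
a_reduced = semi_join_reduce(a, b_keys, "k")

# Step 2: map-side join using the reduced A as the in-memory side.
result = list(map_side_join(a_reduced, b, "k"))
```

The point of step 1 is that only B's distinct keys have to be shipped
around, which may be much smaller than B itself; whether the reduced A
then fits in memory depends on the data, so this is only worthwhile when
the join is selective.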

There may be many mistakes in my understanding. Please point them out or
give me further information about joins in Hive. Thanks so much.


 Luo, Gang
---------
Department of Computer Science
Duke University
(919)316-0993
[email protected]


