Mostly correct.

2. Your idea looks interesting but I would say in reality, the percentage of tuples purged may not be that large.
4. Hive does NOT treat the partition column differently than others.
5. There is no sort-merge join yet. This would be a great feature to add onto Hive!

Zheng

2009/10/25 Gang Luo <[email protected]>

> Hi everyone,
> I am going to do some interesting things for join in hive. Before I read
> the source code, could anyone tell me what kinds of join have been
> implemented in the newest version of hive?
>
> Right now, what I know is:
> 1. symmetric join has been implemented, which is the default join.
> 2. asymmetric join, a.k.a. map-side join (for joining a huge table and a
> small table using only the map phase), has been implemented. But no
> optimization was added. If so, what I think is that when we meet two huge
> tables, we can use semi-join to first get rid of the non-referenced tuples
> in one table, making it smaller, and then do the map-side join.
> 3. 3-way join (using only one map-reduce job to join 3 tables) was
> implemented, but it only applies when joining on the same join key
> (A.k=B.k && B.k=C.k). If we want to join 3 tables on different join keys
> (A.k1=B.k1 && B.k2=C.k2), we still need 2 map-reduce jobs.
> 4. when joining two tables, hive can tell whether the join key is a
> partition column, and make good use of this partition feature.
> 5. no sort-merge join is implemented in hive right now, thus we cannot do
> the non-equi join.
>
> There may be many mistakes in my understanding. Please point them out or
> give me further information about join in hive. Thanks so much.
>
>
> Luo, Gang
> ---------
> Department of Computer Science
> Duke University
> (919)316-0993
> [email protected]

--
Yours,
Zheng
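
For anyone who wants to experiment with the idea in point 2, here is a minimal in-memory sketch of "semi-join reduction, then map-side join". It is not Hive code and does not reflect how Hive implements joins. The function names (`semi_join_reduce`, `map_side_join`, `semi_then_map_join`) and the list-of-dicts row format are assumptions made for illustration only. The sketch also reports the fraction of tuples purged, since the reply above suggests that fraction may be small in practice.

```python
from collections import defaultdict


def semi_join_reduce(rows, other_keys, key):
    """Keep only the rows whose join key appears in other_keys."""
    return [row for row in rows if row[key] in other_keys]


def map_side_join(stream_rows, small_rows, key):
    """Hash join: build an in-memory table on small_rows, then stream the other side."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)

    joined = []
    for left in stream_rows:
        # Emit one output row per matching pair; non-matching rows are dropped.
        for right in table.get(left[key], []):
            joined.append({**right, **left})
    return joined


def semi_then_map_join(a_rows, b_rows, key):
    """Shrink b_rows to the keys referenced by a_rows, then hash-join.

    Returns the joined rows and the fraction of b_rows that was purged.
    """
    a_keys = {row[key] for row in a_rows}
    b_reduced = semi_join_reduce(b_rows, a_keys, key)
    purged = 1 - len(b_reduced) / len(b_rows) if b_rows else 0.0
    return map_side_join(a_rows, b_reduced, key), purged


if __name__ == "__main__":
    a = [{"k": 1, "x": "a1"}, {"k": 2, "x": "a2"}, {"k": 2, "x": "a3"}]
    b = [{"k": 2, "y": "b2"}, {"k": 3, "y": "b3"},
         {"k": 4, "y": "b4"}, {"k": 5, "y": "b5"}]
    rows, purged = semi_then_map_join(a, b, "k")
    print(rows)
    print(f"fraction of b purged: {purged:.2f}")
```

In a real map-reduce setting the key set of one table would itself have to be computed and shipped to the mappers, which costs an extra pass. The reduction only pays off when the purged fraction is large enough for the reduced table to fit in memory.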
