Thank you, Zheng and Ning.

Luo, Gang
---------
Department of Computer Science
Duke University
(919)316-0993
[email protected]
________________________________
From: Ning Zhang <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Mon, Oct 26, 2009 3:07:49 AM
Subject: Re: join in hive

HIVE-870 is under implementation, and semijoin is designed to allow a map-side join as well as a reduce-side join.

The idea of rewriting a large (reduce-side) natural join into a semijoin first and then a map-side natural join may be better in some situations, but not always (remember that the semijoin here needs to run on the reduce side). The situation where a huge table joins a very small portion of another huge table seems to fit best into an indexed join scheme. HIVE-417 is the project for supporting indexes in Hive.

Thanks,
Ning

On Oct 25, 2009, at 11:03 PM, Namit Jain wrote:

> I wanted to add one more thing:
> Ning has started working on semi-join, so please coordinate with him on
> that.
>
> http://issues.apache.org/jira/browse/HIVE-870
>
> On 10/25/09 9:04 PM, "Zheng Shao" <[email protected]> wrote:
>
>> Mostly correct.
>>
>> 2. Your idea looks interesting, but I would say that in reality the
>> percentage of tuples purged may not be that large.
>> 4. Hive does NOT treat the partition column differently from other columns.
>> 5. There is no sort-merge join yet. This would be a great feature to add
>> to Hive!
>>
>> Zheng
>>
>> 2009/10/25 Gang Luo <[email protected]>
>>
>>> Hi everyone,
>>> I am going to do some interesting things with joins in Hive. Before I
>>> read the source code, could anyone tell me what kinds of join have been
>>> implemented in the newest version of Hive?
>>>
>>> Right now, what I know is:
>>> 1. Symmetric join has been implemented, and it is the default join.
>>> 2. Asymmetric join, a.k.a. map-side join (for joining a huge table with a
>>> small table using only the map phase), has been implemented, but no
>>> optimization has been added. If so, my thought is that when we meet two
>>> huge tables, we can use a semi-join to first get rid of the
>>> non-referenced tuples in one table, making it smaller, and then do the
>>> map-side join.
>>> 3. 3-way join (using only one map-reduce job to join 3 tables) has been
>>> implemented, but it applies only when joining on the same join key
>>> (A.k=B.k && B.k=C.k). If we want to join 3 tables on different join keys
>>> (A.k1=B.k1 && B.k2=C.k2), we still need 2 map-reduce jobs.
>>> 4. When joining two tables, Hive can tell whether the join key is a
>>> partition column, and makes good use of this partitioning.
>>> 5. No sort-merge join has been implemented in Hive right now, so we
>>> cannot do non-equi joins.
>>>
>>> There may be many mistakes in my understanding. Please point them out or
>>> give me further information about joins in Hive. Thanks so much.
>>>
>>> Luo, Gang
>>> ---------
>>> Department of Computer Science
>>> Duke University
>>> (919)316-0993
>>> [email protected]
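
To make item 2 of the original message concrete, here is a minimal sketch of the proposed two-step plan: a semi-join that purges non-referenced tuples from one table, followed by a hash join of the kind a map-side join performs. This is plain Python over in-memory lists, not Hive code; the function names `semi_join` and `map_side_join` and the sample rows are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def semi_join(table, other, key):
    """Keep only the rows of `table` whose join key also appears in `other`.

    In a distributed setting only the distinct keys of `other` would need to
    be shipped, not its full rows.
    """
    keys = {row[key] for row in other}
    return [row for row in table if row[key] in keys]


def map_side_join(big, small, key):
    """Hash join: build an in-memory hash table on `small`, stream `big` past it."""
    lookup = defaultdict(list)
    for row in small:
        lookup[row[key]].append(row)
    joined = []
    for row in big:
        # .get() avoids creating empty entries for keys with no match
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data: A and B stand in for the two huge tables.
a = [{"k": 1, "x": "a1"}, {"k": 2, "x": "a2"}, {"k": 3, "x": "a3"}]
b = [{"k": 2, "y": "b2"}, {"k": 3, "y": "b3"}, {"k": 9, "y": "b9"}]

reduced_b = semi_join(b, a, "k")           # purges the k=9 row from B
result = map_side_join(a, reduced_b, "k")  # join A against the reduced B
```

The result is the same as joining `a` against the full `b`; the semi-join only shrinks the side that has to be held in memory. As noted in the replies, the plan pays off only when the semi-join purges a large fraction of tuples, and the semi-join step has its own cost.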
