Yongqiang mentioned he was going to update the wiki with this information in the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
Yongqiang, have you gotten a chance to complete the sort merge bucket map join and the other skew join you mention in the above thread?

Thanks,
Jeff

On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada <[email protected]> wrote:

> Roberto,
>
> You may find these links useful:
>
> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551-
> Simple joins and optimizations
>
> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team -
> New kinds of joins / features of Hive
>
> Thanks,
>
> Bharath.V
> 4th year Undergraduate
> IIIT Hyderabad
>
> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto <[email protected]> wrote:
>
>> Hi,
>>
>> I cannot find any documentation about what algorithm Hive uses to
>> translate JOIN clauses into Map-Reduce tasks.
>>
>> In particular, suppose I have two tables A and B, each stored in a
>> separate file, and each file is split across Hadoop nodes. When I perform
>> a JOIN with A.column = B.column, the framework has to compare the full
>> data from the first file with the full data from the second file. How can
>> Hadoop perform a full scan of all possible combinations of values? If
>> each node contains only a portion of each file, a complete comparison
>> does not seem possible. Is one of the two files entirely replicated on
>> each node? Or does Hive use another kind of strategy/optimization?
>>
>> Thanks.
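
For readers who land on this thread with Roberto's question: as far as I understand it, Hive's default (common) join does not replicate either file. Each mapper tags its rows with the source table and emits them keyed by the join column, the shuffle sends all rows with the same key to the same reducer, and the reducer pairs them up. The map join variants mentioned above are the case where a small table is instead copied to every mapper. Below is a minimal in-memory sketch of the reduce-side idea only. It is not Hive's code; the function names (map_phase, shuffle, reduce_phase, reduce_side_join) and the "A"/"B" tags are made up for illustration, and the splits are plain Python lists standing in for file blocks.

```python
from collections import defaultdict
from itertools import product


def map_phase(rows, tag, key_index):
    """Emit (join_key, (tag, row)) for every row of one input split."""
    for row in rows:
        yield row[key_index], (tag, row)


def shuffle(pairs):
    """Group map output by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, pair every row tagged "A" with every row tagged "B"."""
    for values in groups.values():
        left = [row for tag, row in values if tag == "A"]
        right = [row for tag, row in values if tag == "B"]
        for a, b in product(left, right):
            yield a + b


def reduce_side_join(splits_a, splits_b, key_a=0, key_b=0):
    """Inner-join two tables given as lists of splits (lists of tuples)."""
    pairs = []
    for split in splits_a:  # each split is processed independently
        pairs.extend(map_phase(split, "A", key_a))
    for split in splits_b:
        pairs.extend(map_phase(split, "B", key_b))
    return sorted(reduce_phase(shuffle(pairs)))


# Hypothetical data: each inner list plays the role of a block on one node.
table_a = [[(1, "a1")], [(2, "a2"), (3, "a3")]]
table_b = [[(2, "b2")], [(1, "b1"), (2, "b2x")]]
joined = reduce_side_join(table_a, table_b)
```

Because rows are routed by key, no node ever needs the whole of either table; it only needs every row that shares the keys assigned to it.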
