Yeah. The sort merge bucket map join has been finished for some time and seems stable now. I did one skew join, but haven't had a chance to look at the other skew join Namit mentioned to me. I definitely should have updated the wiki earlier. My bad.

On Fri, Aug 6, 2010 at 8:32 PM, Jeff Hammerbacher <[email protected]> wrote:
> Yongqiang mentioned he was going to update the wiki with this information in
> the thread at http://hadoop.markmail.org/thread/hxd4uwwukuo46lgw.
>
> Yongqiang, have you gotten a chance to complete the sort merge bucket map
> join and the other skew join you mention in the above thread?
>
> Thanks,
> Jeff
>
> On Fri, Aug 6, 2010 at 3:43 AM, bharath vissapragada
> <[email protected]> wrote:
>>
>> Roberto ..
>>
>> You may find these links useful ..
>>
>> http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
>> - Simple joins and optimizations..
>>
>> http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team -
>> New kinds of joins / features of hive ..
>>
>> Thanks
>>
>> Bharath.V
>> 4th year Undergraduate..
>> IIIT Hyderabad
>>
>> On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto
>> <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I cannot find any documentation about what algorithm HIVE uses to
>>> translate JOIN clauses to Map-Reduce tasks.
>>>
>>> In particular, if I have two tables A and B, each table is written to a
>>> separate file and each file is split across hadoop nodes. When I perform a
>>> JOIN with A.column = B.column, the framework has to compare full data from
>>> the first file and full data from the second file. In order to perform a
>>> full scan of all possible combinations of values, how can hadoop perform
>>> it? If each node contains a portion of each file, it seems not possible to
>>> have a complete comparison. Is one of the two files entirely replicated on
>>> each node? Or, does HIVE use another kind of strategy/optimization?
>>>
>>> Thanks.
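
On Roberto's original question, here is a rough sketch of the general reduce-side (shuffle) join idea, in plain Python rather than anything taken from the Hive source. Each mapper tags its rows with the table they came from and emits them under the join key; the shuffle sends every row with the same key to the same reducer, so neither file has to be replicated to every node. All names below (map_phase, shuffle, reduce_phase, reduce_side_join) are made up for illustration, and the "splits" are just in-memory lists standing in for file blocks.

```python
from collections import defaultdict


def map_phase(rows, tag, key_index):
    """Emit (join_key, (tag, row)) for every row of one input split."""
    for row in rows:
        yield row[key_index], (tag, row)


def shuffle(pairs):
    """Group mapper output by key, standing in for the MapReduce shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, pair every row tagged 'A' with every row tagged 'B'."""
    for key in sorted(groups):
        a_rows = [row for tag, row in groups[key] if tag == "A"]
        b_rows = [row for tag, row in groups[key] if tag == "B"]
        for a in a_rows:
            for b in b_rows:
                yield a + b


def reduce_side_join(a_splits, b_splits, a_key=0, b_key=0):
    """Inner equi-join of two tables, each given as a list of splits."""
    pairs = []
    for split in a_splits:  # each split would be processed by its own mapper
        pairs.extend(map_phase(split, "A", a_key))
    for split in b_splits:
        pairs.extend(map_phase(split, "B", b_key))
    return list(reduce_phase(shuffle(pairs)))


if __name__ == "__main__":
    table_a = [[(1, "x")], [(2, "y")]]            # table A in two splits
    table_b = [[(2, "p"), (3, "q")], [(1, "r")]]  # table B in two splits
    for joined in reduce_side_join(table_a, table_b):
        print(joined)
```

The point of the sketch is only that rows are compared per key inside a reducer, not node against node. A map join takes the other route Roberto guessed at: the smaller table is loaded in full by each mapper, so no shuffle is needed.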
