Roberto, you may find these links useful:

http://www.slideshare.net/ragho/hive-icde-2010?src=related_normal&rel=2374551
- Simple joins and optimizations.

http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team
- New kinds of joins and other features of Hive.

Thanks,
Bharath.V
4th year Undergraduate
IIIT Hyderabad

On Fri, Aug 6, 2010 at 12:16 PM, Cappa Roberto <[email protected]> wrote:

> Hi,
>
> I cannot find any documentation about what algorithm Hive uses to
> translate JOIN clauses into Map-Reduce tasks.
>
> In particular, if I have two tables A and B, each table is written to a
> separate file and each file is split across Hadoop nodes. When I perform a
> JOIN with A.column = B.column, the framework has to compare the full data
> from the first file with the full data from the second file. How can Hadoop
> perform a full scan of all possible combinations of values? If each node
> contains only a portion of each file, a complete comparison does not seem
> possible. Is one of the two files entirely replicated on each node? Or does
> Hive use another kind of strategy/optimization?
>
> Thanks.
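
To make the question above concrete, here is a rough sketch of a reduce-side (shuffle) join, the general Map-Reduce technique for an equality join. It is plain Python, not Hive or Hadoop code, and it is not taken from the linked slides; the function names, the tags "A" and "B", and the sample rows are all made up for illustration. The point is that neither file has to be replicated: each mapper tags the rows of its own split with the join key, the shuffle sends all rows with the same key to the same reducer, and only there are the matching rows from A and B paired.

```python
from collections import defaultdict
from itertools import product


def map_phase(table_tag, rows, key_index):
    """Emit (join_key, (table_tag, row)) for every row of one input split."""
    for row in rows:
        yield row[key_index], (table_tag, row)


def shuffle(pairs):
    """Group emitted values by join key, standing in for the shuffle step."""
    groups = defaultdict(list)
    for key, tagged_row in pairs:
        groups[key].append(tagged_row)
    return groups


def reduce_phase(groups):
    """For each key, pair every row tagged "A" with every row tagged "B"."""
    for tagged_rows in groups.values():
        a_rows = [row for tag, row in tagged_rows if tag == "A"]
        b_rows = [row for tag, row in tagged_rows if tag == "B"]
        for a_row, b_row in product(a_rows, b_rows):
            yield a_row + b_row


def reduce_side_join(a_splits, b_splits, key_index=0):
    """Run the map phase over each split separately, then shuffle and reduce."""
    pairs = []
    for split in a_splits:
        pairs.extend(map_phase("A", split, key_index))
    for split in b_splits:
        pairs.extend(map_phase("B", split, key_index))
    return sorted(reduce_phase(shuffle(pairs)))


# Hypothetical data: each inner list is the portion of a table on one node.
a_splits = [[(1, "a1"), (2, "a2")], [(3, "a3")]]
b_splits = [[(2, "b2")], [(1, "b1"), (1, "b1x"), (4, "b4")]]

joined = reduce_side_join(a_splits, b_splits)
for row in joined:
    print(row)
```

Keys 3 and 4 appear in only one table, so they produce no output rows. The other strategy mentioned in the question, copying one table in full to every node, also exists as a map-side join, and it is usually only practical when that table is small enough to fit in memory.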
