How HIVE manages a join

Cappa Roberto Thu, 05 Aug 2010 23:46:50 -0700

Hi,

I cannot find any documentation about which algorithm HIVE uses to translate 
JOIN clauses into Map-Reduce tasks.


In particular, suppose I have two tables A and B, each stored in a separate 
file, and each file is split across the Hadoop nodes. When I perform a JOIN 
on A.column = B.column, the framework has to compare all of the data in the 
first file with all of the data in the second file. How can Hadoop scan all 
possible combinations of values? If each node holds only a portion of each 
file, a complete comparison does not seem possible. Is one of the two files 
entirely replicated on each node? Or does HIVE use another kind of 
strategy/optimization?
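
To make the concern concrete, here is a toy sketch in Python (not HIVE code). 
The rows, the two-node layout, and the names table_a, table_b, node1, and node2 
are all made up for illustration. It only shows why a join computed from 
node-local portions alone would miss matches; it does not claim to show what 
HIVE actually does.

```python
# Hypothetical toy data: rows of A and B as (join_key, payload) pairs.
table_a = [(1, "a1"), (2, "a2"), (3, "a3"), (4, "a4")]
table_b = [(1, "b1"), (2, "b2"), (3, "b3"), (4, "b4")]


def join(rows_a, rows_b):
    """Plain nested-loop equi-join on the first field of each row."""
    return sorted(
        (key_a, val_a, val_b)
        for key_a, val_a in rows_a
        for key_b, val_b in rows_b
        if key_a == key_b
    )


# Assumed placement: each node holds an arbitrary portion of each file.
node1 = {"a": table_a[:2], "b": table_b[2:]}  # A keys 1,2 and B keys 3,4
node2 = {"a": table_a[2:], "b": table_b[:2]}  # A keys 3,4 and B keys 1,2

# The join over the complete tables versus the union of per-node joins.
full_join = join(table_a, table_b)
local_only = sorted(
    join(node1["a"], node1["b"]) + join(node2["a"], node2["b"])
)

# Rows of the correct result that no single node could produce on its own.
missed = [row for row in full_join if row not in local_only]
print(len(full_join), len(local_only), len(missed))
```

With this placement, no node holds matching keys from both files, so the 
node-local joins find nothing even though every row of A has a partner in B.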

Thanks.
