Hi, I cannot find any documentation about which algorithm Hive uses to translate JOIN clauses into MapReduce tasks.
In particular, suppose I have two tables A and B, each stored in a separate file, and each file is split across the Hadoop nodes. When I perform a JOIN on A.column = B.column, the framework has to compare all the data from the first file with all the data from the second file. How can Hadoop scan all possible combinations of values? If each node holds only a portion of each file, a complete comparison does not seem possible. Is one of the two files entirely replicated on every node? Or does Hive use some other strategy or optimization? Thanks.
