The large table is only a problem if the number of values for a given key are 
very large - they are stored in memory.

If you dimension tables are small, you can use map-join. That way, no reduction 
is needed. You need to specify the hint in the select clause
and the list of tables which are small.


For eg:


select /*+ MAPJOIN(smalll_table1, small_table2) */  ... from small_table1 join 
big_table on ... join small_table2 on ...


Thanks,
-namit


On 7/13/09 9:44 PM, "Jason Michael" <[email protected]> wrote:

In our hive instance, we have one large fact-type table that joins to several 
dimension tables on integer keys.  I know from reading the Language Manual that 
in ordering joins it is best to join the largest table last in the sequence in 
order to minimize memory usage.  This won't work in the situation where you 
want to join the large fact table to more than one dimension.  Something like:

select ... from small_table1 join big_table on ... join small_table2 on ...

I have to imagine this is a pretty common pattern, is there any guidance for 
doing this sort of star schema join?


Reply via email to