The large table is only a problem if the number of values for a given key are very large - they are stored in memory.
If you dimension tables are small, you can use map-join. That way, no reduction is needed. You need to specify the hint in the select clause and the list of tables which are small. For eg: select /*+ MAPJOIN(smalll_table1, small_table2) */ ... from small_table1 join big_table on ... join small_table2 on ... Thanks, -namit On 7/13/09 9:44 PM, "Jason Michael" <[email protected]> wrote: In our hive instance, we have one large fact-type table that joins to several dimension tables on integer keys. I know from reading the Language Manual that in ordering joins it is best to join the largest table last in the sequence in order to minimize memory usage. This won't work in the situation where you want to join the large fact table to more than one dimension. Something like: select ... from small_table1 join big_table on ... join small_table2 on ... I have to imagine this is a pretty common pattern, is there any guidance for doing this sort of star schema join?
