Join optimization for star schemas

Jason Michael Mon, 13 Jul 2009 21:45:04 -0700

In our Hive instance, we have one large fact-type table that joins to several
dimension tables on integer keys.  I know from reading the Language Manual that,
when ordering joins, it is best to put the largest table last in the sequence
to minimize memory usage.  That won't work when you want to join the large fact
table to more than one dimension.  Something like:


select ... from small_table1 join big_table on ... join small_table2 on ...

I have to imagine this is a pretty common pattern.  Is there any guidance for
doing this sort of star schema join?
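The ordering rule the question relies on can be illustrated with a small Python model. This is a simplified sketch, not Hive's implementation: it assumes that in each join stage the rows of every table except the last are buffered per join key while the last table is streamed, and that the query above runs as two stages because the two dimensions join on different keys. The function `reduce_side_join` and the tables `dim1`, `dim2` and `fact` are hypothetical names made up for illustration. Under these assumptions, the first stage buffers only one dimension row per key, but the second stage has to buffer the fact-sized intermediate result.

```python
from collections import defaultdict


def reduce_side_join(tables, keys):
    """Simplified model of one reduce-side join stage (not Hive's code).

    tables: lists of row dicts, in the order they appear in the query.
    keys:   the join-key column name for each table.
    Rows of every table except the last are buffered per join key; rows of
    the last table are streamed one at a time and matched against the buffer.
    Returns (joined_rows, peak), where peak is the largest number of rows
    buffered for any single key.
    """
    buffers = defaultdict(lambda: [[] for _ in tables[:-1]])
    for i, (rows, key) in enumerate(zip(tables[:-1], keys[:-1])):
        for row in rows:
            buffers[row[key]][i].append(row)
    peak = max((sum(len(b) for b in bufs) for bufs in buffers.values()), default=0)

    joined = []
    last_key = keys[-1]
    for row in tables[-1]:  # streamed table: never held in memory as a whole
        bufs = buffers.get(row[last_key])  # .get does not create missing keys
        if bufs is None:
            continue
        combos = [{}]
        for buffered_rows in bufs:  # inner join: an empty buffer yields no rows
            combos = [{**c, **b} for c in combos for b in buffered_rows]
        joined.extend({**c, **row} for c in combos)
    return joined, peak


# Hypothetical star schema: two small dimensions, one large fact table.
dim1 = [{"d1_id": i, "d1_name": f"a{i}"} for i in range(3)]
dim2 = [{"d2_id": j, "d2_name": f"b{j}"} for j in range(2)]
fact = [{"d1_id": n % 3, "d2_id": n % 2, "amount": n} for n in range(1000)]

# Stage 1: small_table1 join big_table -> the big table is last, so it streams.
stage1, peak1 = reduce_side_join([dim1, fact], ["d1_id", "d1_id"])
# Stage 2: (stage 1 result) join small_table2 -> the big intermediate is buffered.
stage2, peak2 = reduce_side_join([stage1, dim2], ["d2_id", "d2_id"])
# For comparison: the same stage with the intermediate result streamed instead.
_, peak_alt = reduce_side_join([dim2, stage1], ["d2_id", "d2_id"])

print(peak1, peak2, peak_alt)  # prints "1 500 1"
```

In this model the second stage's per-key buffer grows with the fact table, while streaming the intermediate result keeps it at one row per key, which is the tension the question describes.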

