You can request this with a MAPJOIN hint in the select list:

select /*+ MAPJOIN(b) */  ...   from T a JOIN T2 b on ...


In the example above, T2 (aliased as b) is the small table, which can be cached in memory.
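
A slightly fuller sketch of the same hint may help. The table and column names here (page_views, countries, country_code and so on) are made up for illustration and are not from your application; the only assumption is that countries is small enough to fit in memory on each mapper. Note that the hint names the alias of the small table, not the table itself:

```sql
-- Hypothetical tables: page_views is the large table,
-- countries is the small lookup table loaded into memory.
SELECT /*+ MAPJOIN(c) */
       pv.user_id,
       pv.url,
       c.country_name
FROM page_views pv
JOIN countries c ON (pv.country_code = c.country_code);
```

With the hint in place, Hive should be able to do the join in the mappers rather than shuffling both tables to reducers.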




From: [email protected] [mailto:[email protected]] On Behalf Of Sudipto Das
Sent: Wednesday, September 09, 2009 2:01 PM
To: [email protected]
Subject: Directing Hive to perform Hash Join for small inner tables

Hi,

I am new to Hive, so pardon me if this is something very obvious which I might
have missed in the documentation.

I have an application where I am joining a small inner table with a really 
large outer table. The inner table is small enough to fit into memory at each 
mapper. In such a case, putting the inner table into an in-memory hash table 
and performing a hash based join is much more efficient than performing the 
sort-merge join which the JOIN operator selects. Is there a way in Hive where I
can instruct it to perform the hash-based join?

Thanks

Sudipto

PhD Candidate
CS @ UCSB
Santa Barbara, CA 93106, USA
http://www.cs.ucsb.edu/~sudipto
