You can specify it as a hint in the select list:
select /*+ MAPJOIN(b) */ ... from T a JOIN T2 b on ...

In the example above, T2 is the small table, which can be cached in memory.

From: [email protected] [mailto:[email protected]] On Behalf Of Sudipto Das
Sent: Wednesday, September 09, 2009 2:01 PM
To: [email protected]
Subject: Directing Hive to perform Hash Join for small inner tables

Hi,

I am new to Hive, so pardon me if this is something very obvious that I might have missed in the documentation.

I have an application where I am joining a small inner table with a really large outer table. The inner table is small enough to fit into memory at each mapper. In such a case, putting the inner table into an in-memory hash table and performing a hash-based join is much more efficient than performing the sort-merge join that the JOIN operator selects.

Is there a way in Hive to instruct it to perform the hash-based join?

Thanks
Sudipto

PhD Candidate
CS @ UCSB
Santa Barbara, CA 93106, USA
http://www.cs.ucsb.edu/~sudipto
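
For readers unfamiliar with the hash-based join described in the question, here is a minimal conceptual sketch in Python. It is not Hive's implementation. The function name hash_join, the tuple row layout, and the sample rows are all hypothetical, chosen only for illustration. The idea is that the small table is loaded once into an in-memory hash table keyed on the join column, and the large table is then streamed past it, so the large side never has to be sorted.

```python
from collections import defaultdict

def hash_join(small, large, small_key=0, large_key=0):
    """Inner-join two iterables of tuples using an in-memory hash table.

    Hypothetical helper for illustration only: `small` must fit in memory,
    `large` is streamed once and never sorted.
    """
    # Build phase: index every row of the small table by its join key.
    table = defaultdict(list)
    for row in small:
        table[row[small_key]].append(row)

    # Probe phase: look up each row of the large table in the hash table.
    for row in large:
        for match in table.get(row[large_key], ()):
            yield row + match

# Made-up sample data: t2 plays the role of the small table (T2 above),
# t plays the role of the large table (T above).
t2 = [(1, "red"), (2, "blue")]
t = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]

result = list(hash_join(t2, t))
for joined in result:
    print(joined)
```

The row with key 3 is dropped because it has no partner in the small table, which matches inner-join semantics. In a map-side join each mapper would hold its own copy of the small table, which is why that table has to fit in memory.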
