Re: MapSide join in Hive

Amr Awadallah Sat, 26 Jun 2010 00:58:53 -0700

Viraj,

1. No

2. Yes, the smaller table needs to fit in JVM memory (typically, more than 1GB for the small table is too large).
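To make answers 1 and 2 concrete, here is a minimal Python sketch of the idea behind a map-side (broadcast hash) join. It is a conceptual model only, not Hive's implementation, and it assumes rows are tuples whose first field is the join key ("id"). The smaller table is loaded whole into an in-memory hash table, which is why it has to fit in memory. The large table is then streamed past it row by row in whatever order it arrives, which is why neither input needs to be sorted.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple  # assumption: a row is a tuple whose first field is the join key

def build_hash_table(small: Iterable[Row]) -> Dict[object, List[Row]]:
    """Load the entire smaller table into memory, keyed on the join column.
    This is the step that requires the small table to fit in the heap."""
    table: Dict[object, List[Row]] = {}
    for row in small:
        table.setdefault(row[0], []).append(row)
    return table

def map_side_join(large: Iterable[Row],
                  small_table: Dict[object, List[Row]]) -> Iterator[Tuple[Row, Row]]:
    """Stream the large table one row at a time (as each mapper would) and
    probe the in-memory hash table. Input order does not matter."""
    for big_row in large:
        for small_row in small_table.get(big_row[0], []):
            yield big_row, small_row
```

For example, `list(map_side_join([(2, "x"), (3, "y"), (1, "z")], build_hash_table([(1, "a"), (2, "b")])))` pairs `(2, "x")` with `(2, "b")` and `(1, "z")` with `(1, "a")`, while `(3, "y")` has no match and is dropped. Only the small side is ever held in memory; the large side passes through once.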

See slide 7 onward in this preso for different join strategies that can help if the tables are bucketed and sorted.


http://www.slideshare.net/zshao/hive-user-meeting-march-2010-hive-team
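As a rough illustration of why bucketing and sorting help, the Python sketch below models a sort-merge join over bucketed inputs. It is a simplified model, not Hive's code, and it assumes (hypothetically) that both tables are bucketed on "id" into the same number of buckets using the same hash, and that each bucket is sorted on "id". Under those assumptions bucket k of one table only has to be joined with bucket k of the other, and each pair of buckets can be merged in key order without building a hash table of a whole table.

```python
from typing import Iterator, List, Tuple

Row = Tuple  # assumption: the first field of each row is the join key

def sort_merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[Row, Row]]:
    """Join two inputs already sorted on the join key (field 0) by walking
    both in key order; no hash table is built."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    yield left[i], r
                i += 1
            j = j_end

def bucketed_join(left_buckets: List[List[Row]],
                  right_buckets: List[List[Row]]) -> Iterator[Tuple[Row, Row]]:
    """With matching bucketing on the key, bucket k joins only with bucket k."""
    if len(left_buckets) != len(right_buckets):
        raise ValueError("this sketch assumes equal bucket counts")
    for lb, rb in zip(left_buckets, right_buckets):
        yield from sort_merge_join(lb, rb)
```

Each bucket pair is independent, so in a real cluster they can be processed by separate tasks, and a streaming implementation would only need to buffer the current run of equal keys.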

There is also the /*+STREAMTABLE(tablealias)*/ hint, which you should use for very large tables (or make sure the very large table is the rightmost table in the join clause).
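The reason for streaming the largest table can be seen in a simplified model of a reduce-side join. The Python sketch below is not Hive's code; it assumes (hypothetically) a two-table join where one reducer receives all rows for a single key as `(alias, row)` pairs, with the streamed table's rows arriving last. Rows from the other table are buffered in memory, while rows from the streamed table are joined against that buffer and then discarded, so only the non-streamed side has to fit in memory for each key.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple  # assumption: a row is any tuple; all rows here share one key

def reduce_side_join(tagged_rows: Iterable[Tuple[str, Row]],
                     stream_alias: str) -> Iterator[Tuple[Row, Row]]:
    """Join all rows for one key inside a single reducer (two-table case).
    Assumes rows tagged with `stream_alias` arrive after all other rows."""
    buffered: List[Row] = []
    for alias, row in tagged_rows:
        if alias != stream_alias:
            buffered.append(row)  # non-streamed rows are held in memory
        else:
            for other in buffered:
                yield other, row  # streamed rows are never buffered
```

With the 1.5TB table in the streamed position, only the 350MB table's rows for a given key would need to be buffered, rather than the other way around.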


-- amr

On 6/24/2010 10:43 AM, Viraj Bhat wrote:

Hi all,
I am joining 2 datasets; one is around 1.5TB in size and the other is around 350MB in size.
I wanted to do a map-side join between the two tables using "id" as the join column. I read about the map-side join in Hive here:
http://wiki.apache.org/hadoop/Hive/LanguageManual/Joins. Are there some technical specs on the map-side join on a wiki/JIRA?
Here are some questions:

1) Do the tables need to be sorted on "id"?

2) Is there a restriction on the smaller table size?
Are there other join optimizations that Hive provides which I can apply here?
Viraj
