Re: MapSide Join and left outer or right outer joins?

Jason Venner Wed, 02 Jul 2008 21:56:15 -0700

For the data joins, I let the framework do it - which means one partition per split - so I have to choose my partition count carefully to fill the machines.

I had an error in my initial outer join mapper; the join map code now runs about 40x faster than the old brute-force read-it-all shuffle & sort.


Chris Douglas wrote:

Hi Jason-
It only seems like full outer or full inner joins are supported. I was hoping to just do a left outer join.
Is this supported or planned?
The full inner/outer joins are examples, really. You can define your own operations by extending o.a.h.mapred.join.JoinRecordReader or o.a.h.mapred.join.MultiFilterRecordReader and registering your new identifier with the parser by defining a property "mapred.join.define.<ident>" as your class.
For a left outer join, JoinRecordReader is the correct base. InnerJoinRecordReader and OuterJoinRecordReader should make its use clear.
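To make the difference between the three join types concrete, here is a minimal standalone sketch of the per-key decision each one makes. It is a plain-Java model, not Hadoop code: the class JoinPredicateSketch, its method names, and the boolean "present" array are all hypothetical stand-ins for whatever the real reader inspects. In the Hadoop source the equivalent decision appears to live in the reader's combine method, so check InnerJoinRecordReader and OuterJoinRecordReader for the exact signature before porting this.

```java
public class JoinPredicateSketch {
    // present[i] is true when source i has a record for the current key.
    // This array is an assumption made for illustration only.

    /** Inner join: emit the key only if every source has a record. */
    public static boolean inner(boolean[] present) {
        for (boolean p : present) {
            if (!p) {
                return false;
            }
        }
        return true;
    }

    /** Full outer join: emit the key if at least one source has a record. */
    public static boolean outer(boolean[] present) {
        for (boolean p : present) {
            if (p) {
                return true;
            }
        }
        return false;
    }

    /** Left outer join: emit the key whenever the first (left) source has a record. */
    public static boolean leftOuter(boolean[] present) {
        return present.length > 0 && present[0];
    }

    public static void main(String[] args) {
        boolean[] leftOnly = {true, false};
        boolean[] rightOnly = {false, true};
        System.out.println("left only:  leftOuter=" + leftOuter(leftOnly));
        System.out.println("right only: leftOuter=" + leftOuter(rightOnly));
    }
}
```

Under these assumptions a left outer join is just the outer join with the "any source" test narrowed to "the first source", which is why JoinRecordReader is the natural base for it.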
On the flip side, doing the Outer Join is about 8x faster than doing a map/reduce over our dataset.
Cool! Out of curiosity, how are you managing your splits? -C
