Just in case anyone comes across this again, I figured out that it was a bug in the local job runner.
https://issues.apache.org/jira/browse/MAPREDUCE-1223

On 11/19/09 7:37 AM, Jason Venner wrote:
Are you certain that your records are being split into key and value the way
you expect? That is the usual reason for odd join behavior.
I haven't used the join code past 19.1, however.
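
A quick way to sanity-check the split outside the job is sketched below. This is only an illustration: it assumes the inputs are text lines with a tab between key and value, which the thread does not actually state, and split_record is a hypothetical helper, not part of Hadoop.

```python
def split_record(line, separator="\t"):
    """Split a text record into (key, value) at the first separator.

    Assumption: records are text lines with a tab between key and value.
    If the separator is missing, the whole line becomes the key and the
    value is empty, which makes every record look like a distinct key.
    """
    key, _found, value = line.partition(separator)
    return key, value


if __name__ == "__main__":
    # The first line is tab-separated; the second uses a space by mistake.
    for line in ["k1\tc", "k1 c"]:
        print(repr(split_record(line)))
```

If the keys printed for both tables are not byte-for-byte identical, the join will not match them.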

On Wed, Nov 18, 2009 at 12:42 PM, Edmund Kohlwey <[email protected]> wrote:

I'm using Cloudera's distribution for Hadoop 0.20.1+133.

The javadocs for package org.apache.hadoop.mapred.join state: "For a given
key, each operation will consider the cross product of all values for all
sources at that node."

I'm doing an inner join between two tables with a text key. One table has
multiple values for the same key. From the documentation, I would expect the
output to contain the cross product of the values for that key. Instead I'm
getting only a single row. Does anyone know whether this is a bug or whether
it's the intended behavior (and the documentation is wrong)?

table 1
k1 -> a

table 2
k1 -> c
k1 -> d

I should get:
table 1 inner join table 2
k1 -> ac
k1 -> ad

Instead I'm getting:
table 1 inner join table 2
k1 -> ac
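
To make the expected result concrete, here is a minimal sketch of the cross-product semantics the javadoc describes. It is a plain in-memory illustration, not Hadoop's join implementation; inner_join and the list-of-pairs table layout are assumptions made for the example.

```python
from collections import defaultdict
from itertools import product


def inner_join(*tables):
    """Inner-join tables given as lists of (key, value) pairs.

    For each key present in every table, emit one row per element of
    the cross product of that key's values across the tables.
    """
    grouped = []
    for table in tables:
        by_key = defaultdict(list)
        for key, value in table:
            by_key[key].append(value)
        grouped.append(by_key)

    # Only keys that appear in every source survive an inner join.
    common = set(grouped[0])
    for by_key in grouped[1:]:
        common &= set(by_key)

    rows = []
    for key in sorted(common):
        for combo in product(*(by_key[key] for by_key in grouped)):
            rows.append((key, combo))
    return rows


if __name__ == "__main__":
    table1 = [("k1", "a")]
    table2 = [("k1", "c"), ("k1", "d")]
    for key, values in inner_join(table1, table2):
        print(key, "->", "".join(values))
```

With the two tables above this prints one row for each of the two table 2 values, which is the output I expected from the join.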



