Just in case anyone comes across this again, I figured out that it was a
bug in the local job runner.
https://issues.apache.org/jira/browse/MAPREDUCE-1223
On 11/19/09 7:37 AM, Jason Venner wrote:
Are you certain that your records are being split into key and value the way
you expect? That is the usual reason for odd join behavior.
I haven't used the join code past 0.19.1, however.
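To illustrate the kind of splitting problem Jason means, here is a plain-Java sketch. It assumes the inputs are read the way KeyValueTextInputFormat reads them by default (split at the first tab; a line with no tab becomes all key and an empty value). That input format and the class name below are assumptions for illustration. This is not Hadoop's code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class KeyValueSplitSketch {
    // Hypothetical helper mimicking a "split at first separator" record reader.
    // If the separator is absent, the whole line is the key and the value is "".
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        // Tab-separated: key "k1", value "c"
        System.out.println(split("k1\tc", '\t'));
        // Space-separated: no tab, so "k1 c" becomes the key and the value is empty
        System.out.println(split("k1 c", '\t'));
    }
}
```

If a file is space-separated while the reader expects tabs, every row gets a distinct key. The join then behaves oddly, even though the data looks correct.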
On Wed, Nov 18, 2009 at 12:42 PM, Edmund Kohlwey wrote:
I'm using Cloudera's distribution of Hadoop, 0.20.1+133.
The javadocs for package org.apache.hadoop.mapred.join state: "For a given
key, each operation will consider the cross product of all values for all
sources at that node."
I'm doing an inner join between two tables with a text key. One table has
multiple values for the same key. Based on the documentation, I would expect
the output to contain the cross product of the values for a given key.
Instead, I'm getting only a single row. Does anyone know whether this is
a bug, or whether it's the intended behavior (and the documentation is
wrong)?
table 1
k1 -> a

table 2
k1 -> c
k1 -> d

I should get:
table 1 inner join table 2
k1 -> ac
k1 -> ad

Instead I'm getting:
table 1 inner join table 2
k1 -> ac
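To make the expected semantics concrete, here is a minimal in-memory sketch of the cross-product inner join the javadoc describes. It is written in plain Java with maps from key to value lists. It is not Hadoop's CompositeInputFormat or join implementation, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CrossProductJoinSketch {
    // For each key present in BOTH inputs, emit every pairing of a left value
    // with a right value (the cross product), concatenated as in the example.
    static List<String> innerJoin(Map<String, List<String>> left,
                                  Map<String, List<String>> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : left.entrySet()) {
            List<String> rightValues = right.get(e.getKey());
            if (rightValues == null) {
                continue; // inner join: skip keys missing from either side
            }
            for (String l : e.getValue()) {
                for (String r : rightValues) {
                    out.add(e.getKey() + " -> " + l + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table1 = new TreeMap<>();
        table1.put("k1", List.of("a"));
        Map<String, List<String>> table2 = new TreeMap<>();
        table2.put("k1", List.of("c", "d"));
        System.out.println(innerJoin(table1, table2)); // [k1 -> ac, k1 -> ad]
    }
}
```

One left value times two right values should yield two rows. A single `k1 -> ac` row means one of table 2's values was dropped before the join saw it.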