Map-side joins always empty when an input has NullWritable value
----------------------------------------------------------------

                 Key: MAPREDUCE-2597
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2597
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.20.2
         Environment: Linux cluster
            Reporter: Michael White


It is not uncommon to use a sorted list of keys with no associated values as
an input to a map-side join, e.g. as an exact-match filter. The natural value
class for such an input is NullWritable. However, when performing a map-side
join in Hadoop 0.20.2, I have found that if any input has a value class of
NullWritable, the Mapper is never called and the join output is empty. I hit
this with a 3-way map-side join, and a colleague tells me he ran into the same
issue. I have not tested a 2-way join, so the bug may be specific to n-way
joins with n > 2 (though I suspect not).

The current workaround is to use some other value type (e.g. IntWritable) and
store an arbitrary placeholder value in it.

For a join, the value class should have no bearing on the set of keys that are 
considered matching.
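
To make the expected behaviour concrete, here is a minimal sketch in plain
Java of an inner join over sorted key lists. It is an illustration only and
not Hadoop's join implementation: the names SortedKeyJoin, Source and
innerJoinKeys are hypothetical, and a null value stands in for NullWritable.
The matching logic never looks at the values, so an input with no values
should still contribute its keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the expected semantics; this is NOT Hadoop code.
class SortedKeyJoin {

    /** One join input: sorted, distinct keys with a parallel list of values. */
    static final class Source {
        final List<String> keys;
        final List<Object> values; // entries may be null (stand-in for NullWritable)

        Source(List<String> keys, List<Object> values) {
            this.keys = keys;
            this.values = values;
        }
    }

    /** Returns, in sorted order, every key that appears in all sources. */
    static List<String> innerJoinKeys(List<Source> sources) {
        List<String> matched = new ArrayList<>();
        if (sources.isEmpty()) {
            return matched;
        }
        int[] pos = new int[sources.size()]; // current cursor into each source
        while (true) {
            // Find the largest key currently under any cursor.
            String max = null;
            for (int i = 0; i < pos.length; i++) {
                List<String> keys = sources.get(i).keys;
                if (pos[i] >= keys.size()) {
                    return matched; // one input is exhausted: no more matches
                }
                String k = keys.get(pos[i]);
                if (max == null || k.compareTo(max) > 0) {
                    max = k;
                }
            }
            // Advance every cursor up to that key; only keys are compared.
            boolean allEqual = true;
            for (int i = 0; i < pos.length; i++) {
                List<String> keys = sources.get(i).keys;
                while (pos[i] < keys.size() && keys.get(pos[i]).compareTo(max) < 0) {
                    pos[i]++;
                }
                if (pos[i] >= keys.size()) {
                    return matched;
                }
                if (!keys.get(pos[i]).equals(max)) {
                    allEqual = false;
                }
            }
            if (allEqual) {
                matched.add(max);
                for (int i = 0; i < pos.length; i++) {
                    pos[i]++;
                }
            }
        }
    }

    public static void main(String[] args) {
        Source a = new Source(Arrays.asList("a", "b", "d"),
                              Arrays.<Object>asList(1, 2, 4));
        Source b = new Source(Arrays.asList("b", "c", "d"),
                              Arrays.<Object>asList("x", "y", "z"));
        // Filter input: keys only, every value is null.
        Source filter = new Source(Arrays.asList("b", "d", "e"),
                                   Collections.<Object>nCopies(3, null));
        System.out.println(innerJoinKeys(Arrays.asList(a, b, filter))); // prints [b, d]
    }
}
```

In this sketch the 3-way join with a value-less filter still yields the keys
b and d, which is the behaviour I would expect from the map-side join.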

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
