Bryan Jacobs created MAPREDUCE-5936:
---------------------------------------

             Summary: MultipleInputs incorrect output with copied Path
                 Key: MAPREDUCE-5936
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5936
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv1, mrv2
            Reporter: Bryan Jacobs


The MultipleInputs class builds a Map with Path objects as keys and 
mapper-inputformat combinations as values.

This is incorrect, because a Map holds only one value per key. If 
MultipleInputs.addInputPath is called twice with the same Path and (for 
example) two different Mapper classes, the second addition silently overwrites 
the first.
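
The overwrite can be reproduced without Hadoop. The sketch below is not the 
MultipleInputs source: PathKeyedRegistry and its methods are hypothetical 
names, and plain strings stand in for Path and Mapper classes. It only shows 
how a map keyed by path loses the first registration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the registration logic described above; not Hadoop code.
class PathKeyedRegistry {
    // One mapper name per path: a second put() for the same key replaces the first.
    private final Map<String, String> mapperByPath = new HashMap<>();

    void addInputPath(String path, String mapperName) {
        mapperByPath.put(path, mapperName);
    }

    Map<String, String> registrations() {
        return mapperByPath;
    }

    public static void main(String[] args) {
        PathKeyedRegistry registry = new PathKeyedRegistry();
        registry.addInputPath("/data/users", "LeftSideMapper");
        registry.addInputPath("/data/users", "RightSideMapper");
        // Only the second registration survives; LeftSideMapper is gone.
        System.out.println(registry.registrations());
    }
}
```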

The expected behavior is that the input file is processed once for each call 
to addInputPath.

This matters for applications that perform join-like operations: joining a 
file with itself is valid, and the application developer should not have to 
detect that the same Path was added twice in order to work around this bug.

MultipleInputs should use a multimap, or a map with List values, so that a 
single Path can carry more than one mapper-inputformat combination.
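
A minimal sketch of the map-with-List-values idea, again using hypothetical 
names (ListValuedRegistry, mappersFor) and strings instead of Hadoop types. It 
illustrates the data structure only and is not a proposed patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a list-valued map; not Hadoop code.
class ListValuedRegistry {
    // Each path maps to every mapper registered for it, in call order.
    private final Map<String, List<String>> mappersByPath = new LinkedHashMap<>();

    void addInputPath(String path, String mapperName) {
        // Create the list on first use, then append instead of replacing.
        mappersByPath.computeIfAbsent(path, p -> new ArrayList<>()).add(mapperName);
    }

    List<String> mappersFor(String path) {
        return mappersByPath.getOrDefault(path, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        ListValuedRegistry registry = new ListValuedRegistry();
        registry.addInputPath("/data/users", "LeftSideMapper");
        registry.addInputPath("/data/users", "RightSideMapper");
        // Both registrations are kept, so the path can be processed once per call.
        System.out.println(registry.mappersFor("/data/users"));
    }
}
```

With this shape, a self-join simply yields two entries under the same key, and 
the caller does not need to special-case repeated paths.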



--
This message was sent by Atlassian JIRA
(v6.2#6252)
