[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772108#action_12772108
 ] 

Jay Booth commented on MAPREDUCE-1170:
--------------------------------------

Turns out the test only passes because it doesn't try to actually execute the 
job.  It just uses MultipleInputs to add the inputs, then checks that they were 
added to the appropriate structures in memory.

Running an actual job with TextInputFormat fails with:

java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.FileSplit
        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:55)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:582)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)

This probably affects 0.21 as well, based on my brief reading of the code.  
Any suggestions?  It seems hard to work around without changing the 
signature of InputSplit, which would be pretty disruptive.

One (very hacky) option would be to have LineRecordReader do something along 
the lines of:

    if (split instanceof TaggedInputSplit)
      split = ((TaggedInputSplit) split).getInnerSplit();
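
As a rough illustration of that unwrap, here is a minimal, self-contained 
sketch. The InputSplit, FileSplit and TaggedInputSplit classes below are 
simplified stand-ins defined only so the example compiles without Hadoop on 
the classpath; they are not the real org.apache.hadoop types. getInnerSplit() 
is simply the accessor name used above and the real accessor may be named 
differently, and toFileSplit() is a hypothetical helper.

```java
// Sketch only: stand-in types, not the real Hadoop classes.
class UnwrapSketch {

    // Hypothetical helper: return the FileSplit behind a possibly tagged split.
    static FileSplit toFileSplit(InputSplit split) {
        if (split instanceof TaggedInputSplit) {
            // Unwrap the tagged split before casting.
            split = ((TaggedInputSplit) split).getInnerSplit();
        }
        return (FileSplit) split;
    }

    public static void main(String[] args) {
        InputSplit tagged = new TaggedInputSplit(new FileSplit("/data/part-00000"));

        // The wrapper is not a FileSplit, so a direct cast fails at runtime,
        // which is the same kind of failure as in the stack trace.
        try {
            FileSplit bad = (FileSplit) tagged;
            System.out.println("unexpected: " + bad.getPath());
        } catch (ClassCastException e) {
            System.out.println("direct cast failed");
        }

        // Unwrapping first makes the cast safe.
        System.out.println(toFileSplit(tagged).getPath());
    }
}

// Stand-in for the abstract split type.
abstract class InputSplit {}

// Stand-in for a file-backed split; only carries a path here.
class FileSplit extends InputSplit {
    private final String path;
    FileSplit(String path) { this.path = path; }
    String getPath() { return path; }
}

// Stand-in for the wrapper that MultipleInputs puts around each split.
class TaggedInputSplit extends InputSplit {
    private final InputSplit inner;
    TaggedInputSplit(InputSplit inner) { this.inner = inner; }
    InputSplit getInnerSplit() { return inner; }
}
```

The downside is visible even in the sketch: every reader that casts to 
FileSplit would need the same instanceof check, which is why this only 
counts as a workaround.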

Any other ideas?

> MultipleInputs doesn't work with new API in 0.20 branch
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-1170
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1170
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 0.20.1
>            Reporter: Jay Booth
>             Fix For: 0.20.2
>
>         Attachments: multipleInputs.patch
>
>
> This patch adds support for MultipleInputs (and KeyValueTextInputFormat) in 
> o.a.h.mapreduce.lib.input, working with the new API.  Included passing unit 
> test.  Include for 0.20.2?
