[ https://issues.apache.org/jira/browse/HADOOP-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HADOOP-5368:
---------------------------------

    Attachment: hadoop-5368-2009-06-08.patch

Attached a quick fix. Users can cast the passed RecordReader to FilterRecordReader or NewFilterRecordReader and get the raw, user-customized reader out.

> more user control on customized RecordReader
> --------------------------------------------
>
>                 Key: HADOOP-5368
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5368
>             Project: Hadoop Core
>          Issue Type: Wish
>            Reporter: He Yongqiang
>         Attachments: hadoop-5368-2009-06-08.patch
>
>
> Currently users can define their own InputFormat and RecordReader, but they have little control over them.
> For example, suppose we feed multiple files into the mapper and want to handle each one differently depending on which file the mapper is working on.
> This can easily be done as follows:
> {code}
> public class BlockMapRunner implements MapRunnable {
>   private BlockMapper mapper;
>
>   @Override
>   public void run(RecordReader input, OutputCollector output,
>       Reporter reporter) throws IOException {
>     if (mapper == null)
>       return;
>     BlockReader blkReader = (BlockReader) input;
>     this.mapper.initialize(input);
>     ...........
>   }
>
>   @Override
>   public void configure(JobConf job) {
>     JobConf work = new JobConf(job);
>     Class<? extends BlockMapper> mapCls = work.getBlockMapperClass();
>     if (mapCls != null) {
>       this.mapper = (BlockMapper) ReflectionUtils.newInstance(mapCls, job);
>     }
>   }
> }
>
> /*
> BlockMapper implements Mapper and is initialized from the RecordReader, from which we learn which file this mapper is working on and pick the right strategy for it.
> */
> public class ExtendedMapper extends BlockMapper {
>   private Strategy strategy;
>   private Configuration work;
>
>   @Override
>   public void configure(Configuration job) {
>     this.work = job;
>   }
>
>   @Override
>   public void initialize(RecordReader reader) throws IOException {
>     String path = ((UserDefinedRecordReader) reader).which_File_We_Are_Working_On(); // ((UserDefinedRecordReader) reader) is wrong!
>     this.strategy = this.work.getStrategy(path);
>   }
>
>   @Override
>   public void map(Key k, V value, OutputCollector output, Reporter reporter)
>       throws IOException {
>     strategy.handle(k, value);
>   }
> }
> {code}
> {color:red}
> However, the above code does not work. The reader passed into the mapper is wrapped by MapTask and is either a SkippingRecordReader or a TrackedRecordReader. We cannot cast it back, and we cannot pass any information through the user-defined RecordReader. If SkippingRecordReader and TrackedRecordReader had a method for getting the raw reader, this problem would not exist.
> {color}
> The problem could be worked around by launching many map-reduce jobs, one per file, but that is clearly not what we want.
> Or do other solutions exist?
> Any comments are appreciated.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
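The unwrapping idea in the attached patch can be sketched in miniature. This is a simplified, self-contained model, not Hadoop's actual API: the interface and class names (RecordReader, UserDefinedRecordReader, FilterRecordReader, getRawReader) follow the patch description but are assumptions here, and the reader logic is stubbed out.

```java
// Simplified model of the proposed fix: the framework-side wrapper
// (analogous to SkippingRecordReader / TrackedRecordReader) exposes
// the raw user reader via getRawReader(), so the mapper can cast it
// back to its own type. Names are illustrative, not Hadoop's API.

interface RecordReader {
    boolean next(StringBuilder key, StringBuilder value);
}

// A user-defined reader that knows which file it is reading.
class UserDefinedRecordReader implements RecordReader {
    private final String path;
    UserDefinedRecordReader(String path) { this.path = path; }
    String whichFileWeAreWorkingOn() { return path; }
    public boolean next(StringBuilder k, StringBuilder v) { return false; }
}

// Framework-side wrapper with the proposed unwrapping hook.
class FilterRecordReader implements RecordReader {
    private final RecordReader raw;
    FilterRecordReader(RecordReader raw) { this.raw = raw; }
    RecordReader getRawReader() { return raw; }  // the proposed hook
    public boolean next(StringBuilder k, StringBuilder v) { return raw.next(k, v); }
}

public class UnwrapDemo {
    public static void main(String[] args) {
        RecordReader input = new FilterRecordReader(
                new UserDefinedRecordReader("/data/part-00000"));
        // With getRawReader(), the mapper can recover its own reader
        // even though the framework handed it a wrapper:
        UserDefinedRecordReader user =
                (UserDefinedRecordReader) ((FilterRecordReader) input).getRawReader();
        System.out.println(user.whichFileWeAreWorkingOn());
    }
}
```

Without such a hook, the cast `(UserDefinedRecordReader) input` throws a ClassCastException, which is exactly the problem described in the issue.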