[ 
https://issues.apache.org/jira/browse/HADOOP-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HADOOP-5368:
---------------------------------

    Attachment: hadoop-5368-2009-06-08.patch

Attaching a quick fix.
Users can cast the RecordReader passed in to FilterRecordReader or
NewFilterRecordReader, and then retrieve the raw, user-customized reader from it.
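The unwrap pattern this fix relies on can be sketched with minimal stand-in types. The names below (RecordReaderLike, UserReader, FilterReader, getRawReader) are illustrative only, not the actual classes in the attached patch:

```java
// Minimal stand-in for the RecordReader interface (illustrative only).
interface RecordReaderLike {
    String currentFile();
}

// A user-defined reader that knows which file it is reading.
class UserReader implements RecordReaderLike {
    public String currentFile() { return "/input/part-00000"; }
}

// Stand-in for the framework wrapper (e.g. TrackedRecordReader);
// with the patch, the wrapper exposes the raw reader it delegates to.
class FilterReader implements RecordReaderLike {
    private final RecordReaderLike raw;
    FilterReader(RecordReaderLike raw) { this.raw = raw; }
    public String currentFile() { return raw.currentFile(); }
    // The accessor the patch adds: hand back the user's original reader.
    public RecordReaderLike getRawReader() { return raw; }
}

public class UnwrapSketch {
    public static void main(String[] args) {
        RecordReaderLike input = new FilterReader(new UserReader());
        // Casting input directly to UserReader would fail; instead,
        // cast to the wrapper and unwrap:
        UserReader user = (UserReader) ((FilterReader) input).getRawReader();
        System.out.println(user.currentFile());
    }
}
```

With such an accessor on the wrapper, the cast in the example below would succeed instead of throwing a ClassCastException.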

> more user control on customized RecordReader
> --------------------------------------------
>
>                 Key: HADOOP-5368
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5368
>             Project: Hadoop Core
>          Issue Type: Wish
>            Reporter: He Yongqiang
>         Attachments: hadoop-5368-2009-06-08.patch
>
>
> Currently a user can define their own InputFormat and RecordReader, but has
> little control over them.
> For example, we feed multiple files into the mapper and want to handle them in
> different ways depending on which file the mapper is working on.
> This can be easily done as follows:
> {code}
> public class BlockMapRunner implements MapRunnable {
>       private BlockMapper mapper;
>
>       @Override
>       public void run(RecordReader input, OutputCollector output,
>                       Reporter reporter) throws IOException {
>               if (mapper == null)
>                       return;
>               BlockReader blkReader = (BlockReader) input;
>               this.mapper.initialize(blkReader);
>               ...
>       }
>
>       @Override
>       public void configure(JobConf job) {
>               JobConf work = new JobConf(job);
>               Class<? extends BlockMapper> mapCls = work.getBlockMapperClass();
>               if (mapCls != null) {
>                       this.mapper = (BlockMapper) ReflectionUtils
>                                       .newInstance(mapCls, job);
>               }
>       }
> }
> /*
> BlockMapper implements Mapper and is initialized from the RecordReader, from
> which we learn which file this mapper is working on and pick the right
> strategy for it.
> */
> public class ExtendedMapper extends BlockMapper {
>       private Strategy strategy;
>       private Configuration work;
>
>       @Override
>       public void configure(Configuration job) {
>               this.work = job;
>       }
>
>       @Override
>       public void initialize(RecordReader reader) throws IOException {
>               // This cast is wrong: the framework hands us a wrapped reader,
>               // not our own UserDefinedRecordReader.
>               String path = ((UserDefinedRecordReader) reader).which_File_We_Are_Working_On();
>               this.strategy = this.work.getStrategy(path);
>       }
>
>       @Override
>       public void map(Key k, V value, OutputCollector output, Reporter reporter)
>                       throws IOException {
>               strategy.handle(k, value);
>       }
> }
> {code}
> {color:red}
> However, the above code does not work. The reader passed into the mapper is
> wrapped by MapTask, and is either a SkippingRecordReader or a
> TrackedRecordReader. We cannot cast it back, so we cannot pass any
> information through the user-defined RecordReader. If SkippingRecordReader
> and TrackedRecordReader had a method for getting the raw reader, this
> problem would not exist.
> {color}
> This problem could be resolved by launching many map-reduce jobs, one job for
> each file. But that is clearly not what we want.
> Or do other solutions exist?
> Any comments are appreciated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
