[ http://issues.apache.org/jira/browse/HADOOP-433?page=comments#action_12427610 ]

Benjamin Reed commented on HADOOP-433:
--------------------------------------

getSplit() would address my need. However, I can imagine that in the future we
would want access to the RecordReader itself, especially for more
sophisticated things like record skipping using indexed files, or random
sampling. The MapRunner interface makes these kinds of access easy if the
RecordReader you are handed happens to implement a more fully featured
interface.
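
As a rough illustration of the kind of access meant here, the sketch below
shows a runner that drives its own read loop and uses skipping when the reader
supports it. All type names (SimpleRecordReader, SkippableRecordReader,
ListRecordReader, SamplingRunner) are hypothetical stand-ins, not Hadoop
classes, and the sketch assumes the runner is handed the real reader rather
than a wrapper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a record reader; not the real Hadoop interface.
interface SimpleRecordReader {
    /** Returns the next record, or null when the input is exhausted. */
    String next();
}

// Hypothetical richer interface that a reader might additionally implement.
interface SkippableRecordReader extends SimpleRecordReader {
    /** Skips up to n records and returns how many were actually skipped. */
    int skip(int n);
}

// Toy in-memory reader that supports skipping.
class ListRecordReader implements SkippableRecordReader {
    private final List<String> records;
    private int pos = 0;

    ListRecordReader(List<String> records) {
        this.records = records;
    }

    public String next() {
        return pos < records.size() ? records.get(pos++) : null;
    }

    public int skip(int n) {
        int skipped = Math.min(n, records.size() - pos);
        pos += skipped;
        return skipped;
    }
}

// Runner in the spirit of MapRunner: it owns the read loop.
class SamplingRunner {
    /** Collects every stride-th record, skipping when the reader allows it. */
    static List<String> run(SimpleRecordReader reader, int stride) {
        List<String> out = new ArrayList<>();
        String rec;
        while ((rec = reader.next()) != null) {
            out.add(rec);
            if (reader instanceof SkippableRecordReader) {
                // Fast path: let the reader jump over unwanted records.
                ((SkippableRecordReader) reader).skip(stride - 1);
            } else {
                // Fallback: read and discard the unwanted records.
                for (int i = 0; i < stride - 1 && reader.next() != null; i++) {
                    // intentionally empty
                }
            }
        }
        return out;
    }
}
```

The instanceof check only helps if the object passed to the runner is the real
reader; a wrapper that hides it forces the fallback path.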

A more generic Split interface would be nice as well :)

> Better access to the RecordReader
> ---------------------------------
>
>                 Key: HADOOP-433
>                 URL: http://issues.apache.org/jira/browse/HADOOP-433
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.5.0
>            Reporter: Benjamin Reed
>            Priority: Minor
>
> The record reader has access to the FileSplit, which can in turn carry
> information that is useful to the Mapper. For example, map processing may
> vary according to the file name or attributes associated with a file.
> Unfortunately, even with a MapRunner you only have access to the progress
> wrapper around the RecordReader. To get at the real record reader I had to
> use a thread-local variable, which I set in RecordReader.getNext(). It would
> be much nicer if you could get a reference to the real RecordReader from the
> RecordReader passed to MapRunner.
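
For readers unfamiliar with the workaround described above, here is a minimal
sketch of both approaches: publishing the real reader through a thread-local,
and the requested alternative of an accessor on the wrapper. The names
(LineReader, SplitFileReader, ProgressReader, getWrapped) are hypothetical and
chosen for illustration; they are not the Hadoop API.

```java
// Hypothetical stand-in for a record reader; not the real Hadoop interface.
interface LineReader {
    /** Returns the next record, or null when the input is exhausted. */
    String next();
}

// The "real" reader, which knows which file its split came from.
class SplitFileReader implements LineReader {
    // Workaround: publish the reader currently in use on this thread.
    static final ThreadLocal<SplitFileReader> CURRENT = new ThreadLocal<>();

    private final String fileName;
    private final String[] lines;
    private int pos = 0;

    SplitFileReader(String fileName, String... lines) {
        this.fileName = fileName;
        this.lines = lines;
    }

    String getFileName() {
        return fileName;
    }

    public String next() {
        // Set on every read so code on the same thread can find this reader.
        CURRENT.set(this);
        return pos < lines.length ? lines[pos++] : null;
    }
}

// Stand-in for the progress wrapper that the runner actually receives.
class ProgressReader implements LineReader {
    private final LineReader delegate;
    private int recordsRead = 0;

    ProgressReader(LineReader delegate) {
        this.delegate = delegate;
    }

    public String next() {
        String rec = delegate.next();
        if (rec != null) {
            recordsRead++;
        }
        return rec;
    }

    int getRecordsRead() {
        return recordsRead;
    }

    // The kind of accessor the issue asks for: expose the wrapped reader.
    LineReader getWrapped() {
        return delegate;
    }
}
```

With the thread-local, a mapper would call SplitFileReader.CURRENT.get() after
at least one record has been read on its thread; with an accessor such as
getWrapped(), the runner can reach the real reader directly and before any
read.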

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
