[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606996#comment-13606996
 ] 

Sangjin Lee commented on MAPREDUCE-5069:
----------------------------------------

Those are great questions. There is some method to the madness. :) They have to 
do with the way CombineFileRecordReader works. First of all, it requires a 
specific constructor from the record reader:

* old: (CombineFileSplit, Configuration, Reporter, Integer)
* new: (CombineFileSplit, TaskAttemptContext, Integer)

So the record reader that gets passed into the CombineFileRecordReader needs to 
deal with the CombineFileSplit. That's why the relevant record readers cannot 
be used directly.
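
As a rough illustration of that constructor requirement, here is a minimal sketch in plain Java. It assumes the reader is located reflectively by its parameter types. The classes Split, Context, SingleFileReader and WrapperReader are hypothetical stand-ins, not the Hadoop types, and the lookup is a simplification rather than the actual CombineFileRecordReader code.

```java
import java.lang.reflect.Constructor;

public class ConstructorLookupSketch {
    // Hypothetical stand-ins for CombineFileSplit and TaskAttemptContext.
    static class Split {
        final String[] paths;
        Split(String... paths) { this.paths = paths; }
    }
    static class Context {}

    // A reader that only knows how to handle one file; it has no
    // (Split, Context, Integer) constructor, so it cannot be plugged in directly.
    static class SingleFileReader {
        final String path;
        SingleFileReader(String path) { this.path = path; }
    }

    // Thin wrapper: exposes the required constructor and delegates to the
    // single-file reader for the file at the given index of the combined split.
    static class WrapperReader {
        final SingleFileReader delegate;
        WrapperReader(Split split, Context context, Integer index) {
            this.delegate = new SingleFileReader(split.paths[index]);
        }
    }

    // True if the class declares a constructor with exactly the required signature.
    static boolean hasRequiredConstructor(Class<?> cls) {
        try {
            cls.getDeclaredConstructor(Split.class, Context.class, Integer.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Instantiates a reader for one file of the split via the required constructor.
    static <T> T create(Class<T> cls, Split split, Context context, int index) {
        try {
            Constructor<T> c =
                cls.getDeclaredConstructor(Split.class, Context.class, Integer.class);
            c.setAccessible(true);
            return c.newInstance(split, context, index);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("could not instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        Split split = new Split("a.txt", "b.txt");
        WrapperReader reader = create(WrapperReader.class, split, new Context(), 1);
        System.out.println(reader.delegate.path);                           // b.txt
        System.out.println(hasRequiredConstructor(SingleFileReader.class)); // false
    }
}
```

The wrapper carries no logic of its own beyond picking one file out of the combined split, which is why such classes end up so thin.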

The second question also has something to do with the constructor requirement. 
I suppose they could be made private *static* classes (but not a non-static 
inner class, due to the constructor requirement). Initially I created them as 
public in case someone might want to subclass them. But I acknowledge that the 
probability is pretty remote, considering these are really thin.
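
To see why a private static nested class works but a non-static inner class does not, consider the sketch below. It again uses hypothetical stand-in types (Split, Context, StaticReader, InnerReader) and assumes the constructor is looked up by its exact parameter types. The Java compiler adds the enclosing instance as a hidden first parameter to an inner class constructor, so the three-argument lookup fails.

```java
public class NestedReaderSketch {
    // Hypothetical stand-ins for CombineFileSplit and TaskAttemptContext.
    static class Split {}
    static class Context {}

    // Private static nested class: its constructor takes exactly three parameters.
    private static class StaticReader {
        StaticReader(Split split, Context context, Integer index) {}
    }

    // Non-static inner class: the compiled constructor also takes the
    // enclosing NestedReaderSketch instance as a hidden first parameter.
    private class InnerReader {
        InnerReader(Split split, Context context, Integer index) {}
    }

    // True if the class declares a constructor with exactly the required signature.
    static boolean hasRequiredConstructor(Class<?> cls) {
        try {
            cls.getDeclaredConstructor(Split.class, Context.class, Integer.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Number of parameters on the only declared constructor of the class.
    static int constructorParameterCount(Class<?> cls) {
        return cls.getDeclaredConstructors()[0].getParameterCount();
    }

    static boolean staticNestedQualifies() {
        return hasRequiredConstructor(StaticReader.class);
    }

    static boolean innerQualifies() {
        return hasRequiredConstructor(InnerReader.class);
    }

    static int staticNestedParameterCount() {
        return constructorParameterCount(StaticReader.class);
    }

    static int innerParameterCount() {
        return constructorParameterCount(InnerReader.class);
    }

    public static void main(String[] args) {
        System.out.println(staticNestedQualifies());      // true
        System.out.println(innerQualifies());             // false
        System.out.println(staticNestedParameterCount()); // 3
        System.out.println(innerParameterCount());        // 4
    }
}
```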

Let me tinker with it a little bit and fold them into the input format classes.
                
> add concrete common implementations of CombineFileInputFormat
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-5069
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5069
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1, mrv2
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sangjin Lee
>            Priority: Minor
>         Attachments: MAPREDUCE-5069-1.patch, MAPREDUCE-5069-2.patch, 
> MAPREDUCE-5069.patch
>
>
> CombineFileInputFormat is abstract, and its specific equivalents to 
> TextInputFormat, SequenceFileInputFormat, etc. are currently not in the 
> hadoop code base.
> This sounds like a very common need wherever CombineFileInputFormat is used, 
> and different folks would write the same code over and over to achieve the 
> same goal. It sounds very natural for hadoop to provide at least the text and 
> sequence file implementations of the CombineFileInputFormat class.

