[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639107#comment-13639107
 ] 

Robert Joseph Evans commented on MAPREDUCE-5069:
------------------------------------------------

I have a few minor comments.  Nothing major and nothing that I am very tied to 
so if you want to push back on them I would be fine checking the code in as is.
 
I understand why you wrote CombineFileRecordReaderWrapper instead of extending 
CombineFileRecordReader, but I am not really sure the name is that clear.  
CombineFileRecordReaderWrapper is not a wrapper for a CombineFileRecordReader, 
but is a wrapper for a FileRecordReader that can be used by 
CombineFileRecordReader.  I am not really sure of a clearer name for it though 
so I think it is OK.

I think I would prefer to have checkFileSplit be turned into asserts, or 
possibly remove it entirely.  If we have gotten that far we should be 
controlled by the CombineFileRecordReader, and we are just validating that it 
has set the values that it should have set already and does currently set.  I 
just don't see many situations out side of testing where we would want to check 
this.  Which is why I would change them to asserts so that for tests we can 
validate that everything is correct, but there is no overhead when we are not 
testing.  

I am also not too happy with having Random used in tests.  Random makes it very 
hard to reproduce an error, I have had too many of them fail sporadically, and 
have to dig through the logs to find the seed, then hard code the seed... 
Granted that means that Random found a bug that hard codeing it would not, and 
I realize that, so I'll leave it up to you. I would just rather have the test 
with a hard coded seed, or not be random at all.  But that is just me.  


                
> add concrete common implementations of CombineFileInputFormat
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-5069
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5069
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1, mrv2
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sangjin Lee
>            Priority: Minor
>         Attachments: MAPREDUCE-5069-1.patch, MAPREDUCE-5069-2.patch, 
> MAPREDUCE-5069-3.patch, MAPREDUCE-5069-4.patch, MAPREDUCE-5069-5.patch, 
> MAPREDUCE-5069.patch
>
>
> CombineFileInputFormat is abstract, and its specific equivalents to 
> TextInputFormat, SequenceFileInputFormat, etc. are currently not in the 
> hadoop code base.
> These sound like very common need wherever CombineFileInputFormat is used, 
> and different folks would write the same code over and over to achieve the 
> same goal. It sounds very natural for hadoop to provide at least the text and 
> sequence file implementations of the CombineFileInputFormat class.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to