[ 
https://issues.apache.org/jira/browse/HADOOP-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828993#comment-13828993
 ] 

Jason Lowe commented on HADOOP-9622:
------------------------------------

bq. There are already two Test classes TestLineRecordReader in mapred and 
mapreduce.lib.input packages in hadoop-mapreduce-client-jobclient project. It 
will be better to move included tests to these classes instead of creating 
multiple classes.

I'd much rather keep the unit tests for LineRecordReader in the same package as 
the code; that way, when the code is updated, Jenkins will run the tests to 
catch errors.  If we move these unit tests to the jobclient module, then a 
patch that touches only LineRecordReader in the core module won't run them, 
since they're in a different module.

Instead, I'd rather rename the TestLineRecordReader tests in the jobclient 
module to something like TestLineRecordReaderJobs.  Those tests are really 
integration tests rather than unit tests, since each one runs a job rather 
than exercising just the LineRecordReader in isolation.

> bzip2 codec can drop records when reading data in splits
> --------------------------------------------------------
>
>                 Key: HADOOP-9622
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9622
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 2.0.4-alpha, 0.23.8
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch, 
> HADOOP-9622.patch, blockEndingInCR.txt.bz2, blockEndingInCRThenLF.txt.bz2
>
>
> Bzip2Codec.BZip2CompressionInputStream can cause records to be dropped when 
> reading them in splits based on where record delimiters occur relative to 
> compression block boundaries.
> Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
