[ https://issues.apache.org/jira/browse/HADOOP-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828993#comment-13828993 ]
Jason Lowe commented on HADOOP-9622:
------------------------------------
bq. There are already two test classes named TestLineRecordReader, in the mapred and
mapreduce.lib.input packages of the hadoop-mapreduce-client-jobclient project. It
would be better to move the included tests into these classes instead of creating
multiple classes.
I'd much rather keep the unit tests for LineRecordReader in the same package (and
module) as the code, so that when the code is updated, Jenkins runs the tests and
catches errors. If we move these unit tests to the jobclient module, a patch that
touches only LineRecordReader in the core module won't run them, since they live
in a different module.
Instead, I'd rather rename the TestLineRecordReader tests in the jobclient
module to something like TestLineRecordReaderJobs. Those tests are really
integration tests rather than unit tests, since each test runs a full job
rather than exercising LineRecordReader in isolation.
> bzip2 codec can drop records when reading data in splits
> --------------------------------------------------------
>
> Key: HADOOP-9622
> URL: https://issues.apache.org/jira/browse/HADOOP-9622
> Project: Hadoop Common
> Issue Type: Bug
> Components: io
> Affects Versions: 2.0.4-alpha, 0.23.8
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch,
> HADOOP-9622.patch, blockEndingInCR.txt.bz2, blockEndingInCRThenLF.txt.bz2
>
>
> Bzip2Codec.BZip2CompressionInputStream can cause records to be dropped when
> reading them in splits based on where record delimiters occur relative to
> compression block boundaries.
> Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.
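To make "dropping records when reading in splits" concrete, here is a minimal sketch of the split-ownership convention that line-based readers follow on uncompressed text: a split that does not start at offset 0 discards its first (possibly partial) record, and a split keeps reading while the next record begins at an offset no greater than its end. It also shows the kind of check that can run against a reader in isolation, without launching a job. The class and method names (SplitReadModel, readSplit, everySplitPointIsLossless) are hypothetical, and the model deliberately ignores compression: it does not reproduce how Bzip2Codec reports positions at block boundaries, it only demonstrates the invariant the bug breaks (every record is read exactly once, whatever the split offset).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of split reading over uncompressed '\n'-delimited
// data. It is not Hadoop's LineRecordReader and does not model bzip2 blocks.
public class SplitReadModel {

    // Returns the records "owned" by the split [start, end) of data.
    public static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Discard everything up to and including the first '\n' at or after
            // start: that record belongs to the previous split.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the delimiter itself
        }
        // Keep reading while the next record begins at or before the split end,
        // so a record straddling (or starting exactly at) the boundary is read here.
        while (pos <= end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1; // next record starts after the delimiter
        }
        return records;
    }

    // Splits the data in two at every possible offset and checks that the two
    // halves together yield exactly the records of an unsplit read.
    public static boolean everySplitPointIsLossless(byte[] data) {
        List<String> whole = readSplit(data, 0, data.length);
        for (int split = 1; split < data.length; split++) {
            List<String> combined = new ArrayList<>(readSplit(data, 0, split));
            combined.addAll(readSplit(data, split, data.length));
            if (!combined.equals(whole)) {
                System.out.println("Records lost or duplicated at split offset " + split);
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\n\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readSplit(data, 0, data.length)); // prints [alpha, beta, , gamma, delta]
        System.out.println(everySplitPointIsLossless(data)); // prints true
    }
}
```

A split-aware decompressing stream has to preserve the same invariant using the positions it reports; per the issue description, the bzip2 problem depends on where a delimiter falls relative to a compression block boundary, which this uncompressed model cannot exhibit.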
--
This message was sent by Atlassian JIRA
(v6.1#6144)