[ https://issues.apache.org/jira/browse/MAPREDUCE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated MAPREDUCE-5656:
----------------------------------

    Attachment: MAPREDUCE-5656-2.patch

Slightly updated patch to fix the spacing issue in SplitLineReader.

> bzip2 codec can drop records when reading data in splits
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-5656
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5656
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.0.4-alpha, 0.23.8
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch,
>                      HADOOP-9622.patch, MAPREDUCE-5656-2.patch, MAPREDUCE-5656.patch,
>                      blockEndingInCR.txt.bz2, blockEndingInCRThenLF.txt.bz2
>
>
> Bzip2Codec.BZip2CompressionInputStream can cause records to be dropped when
> reading them in splits, depending on where record delimiters occur relative to
> compression block boundaries.
> Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
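For context on what "dropped records" means here: line-oriented split readers generally follow a convention in which every split except the first discards its first (possibly partial) line, and every split keeps reading while the next record starts at or before the split's end. Together these rules ensure each record is read by exactly one split, and that is the invariant the reported bug breaks. The Java sketch below is a simplified, hypothetical model of that convention over an in-memory byte array with exact byte positions and '\n' delimiters only. The class and method names (`SplitLineSketch`, `readSplit`, `skipLine`) are illustrative assumptions, not Hadoop's API, and the sketch does not model bzip2 block positions or the patch itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the "each record read by exactly one split" convention.
// Not Hadoop code: positions here are exact byte offsets, not compressed-block offsets.
public class SplitLineSketch {

    // Returns the records owned by the split [start, end).
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Discard the first (possibly partial) line; the previous split reads it.
            pos = skipLine(data, pos);
        }
        // A record belongs to this split if it starts at or before 'end'.
        while (pos <= end && pos < data.length) {
            int next = skipLine(data, pos);
            int stop = next;
            if (stop > pos && data[stop - 1] == '\n') {
                stop--; // do not include the delimiter in the record
            }
            records.add(new String(data, pos, stop - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return records;
    }

    // Returns the index just past the next '\n' at or after 'pos', or data.length if none.
    static int skipLine(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') {
            pos++;
        }
        return Math.min(pos + 1, data.length);
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\n\ngamma\ndelta".getBytes(StandardCharsets.UTF_8);
        // Try every split size; each run should yield every record exactly once.
        for (int splitSize = 1; splitSize <= data.length; splitSize++) {
            List<String> all = new ArrayList<>();
            for (int start = 0; start < data.length; start += splitSize) {
                int end = Math.min(start + splitSize, data.length);
                all.addAll(readSplit(data, start, end));
            }
            System.out.println(splitSize + " -> " + all);
        }
    }
}
```

The `pos <= end` test (rather than `pos < end`) is what lets a split claim a record that begins exactly at its end boundary, which the next split then skips. If a reader's view of its own position relative to `end` is wrong at that boundary, one split stops early while the next still skips, and a record disappears.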