[ https://issues.apache.org/jira/browse/MAPREDUCE-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15028039#comment-15028039 ]
Hudson commented on MAPREDUCE-6549:
-----------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #8894 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8894/])
MAPREDUCE-6549. multibyte delimiters with LineRecordReader cause (rkanter: rev 7fd00b3db4b7d73afd41276ba9a06ec06a0e1762)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/UncompressedSplitLineReader.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestLineRecordReader.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestLineRecordReader.java


> multibyte delimiters with LineRecordReader cause duplicate records
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6549
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6549
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1, mrv2
>    Affects Versions: 2.7.2
>            Reporter: Dustin Cote
>            Assignee: Wilfred Spiegelenburg
>             Fix For: 2.8.0
>
>         Attachments: MAPREDUCE-6549-1.patch, MAPREDUCE-6549-2.patch, MAPREDUCE-6549.3.patch
>
>
> LineRecordReader currently produces duplicate records under certain
> scenarios such as:
> 1) input string: "abc+++def++ghi++"
> delimiter string: "+++"
> test passes with all sizes of the split
> 2) input string: "abc++def+++ghi++"
> delimiter string: "+++"
> test fails with a split size of 4
> 3) input string: "abc+++def++ghi++"
> delimiter string: "++"
> test fails with a split size of 5
> 4) input string: "abc+++defg++hij++"
> delimiter string: "++"
> test fails with a split size of 4
> 5) input string: "abc++def+++ghi++"
> delimiter string: "++"
> test fails with a split size of 9



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
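The scenarios in the issue all hinge on where a split boundary falls relative to a multibyte delimiter. As a rough illustration, the Java sketch below models split-by-split reading in memory and checks whether it returns the same records as a single pass over the whole input, for every split size. It is not Hadoop's `LineRecordReader` or `LineReader` code; the class `DelimitedSplitModel` and all of its methods are hypothetical. It assumes a convention similar to Hadoop's line reading (each split after the first discards its first, possibly partial, record, and a split reads past its end to finish its last record), plus an assumed extra rule: a split whose byte range ends inside a delimiter reads one more record, because the next split cannot recognize a delimiter that began before its start.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified in-memory model (not Hadoop code) of dividing
 * records that end in a custom, possibly multibyte, delimiter among splits.
 */
final class DelimitedSplitModel {

    /** First offset i >= from at which delim occurs in data, or -1. */
    static int indexOf(byte[] data, byte[] delim, int from) {
        for (int i = from; i + delim.length <= data.length; i++) {
            int j = 0;
            while (j < delim.length && data[i + j] == delim[j]) {
                j++;
            }
            if (j == delim.length) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Records produced by the split [start, end). Assumed convention: a split
     * other than the first discards everything up to and including the first
     * complete delimiter it finds (the previous split reads that record). It
     * reads records starting at or before end, plus one more if the delimiter
     * it consumed last began before end (the split ended inside a delimiter).
     */
    static List<String> readSplit(byte[] data, byte[] delim, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        int lastDelimStart = -1; // offset of the most recently consumed delimiter
        if (start != 0) {
            int d = indexOf(data, delim, start);
            if (d < 0) {
                return records; // no complete delimiter at or after start
            }
            lastDelimStart = d;
            pos = d + delim.length;
        }
        while (pos < data.length && (pos <= end || lastDelimStart < end)) {
            int d = indexOf(data, delim, pos);
            int recordEnd = (d < 0) ? data.length : d;
            records.add(new String(data, pos, recordEnd - pos, StandardCharsets.UTF_8));
            lastDelimStart = recordEnd;
            pos = (d < 0) ? data.length : d + delim.length;
        }
        return records;
    }

    /** Concatenated records of all splits of the given size, in split order. */
    static List<String> splitRecords(String input, String delimiter, int splitSize) {
        byte[] data = input.getBytes(StandardCharsets.UTF_8);
        byte[] delim = delimiter.getBytes(StandardCharsets.UTF_8);
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            int end = Math.min(start + splitSize, data.length);
            all.addAll(readSplit(data, delim, start, end));
        }
        return all;
    }

    /** Records from one pass over the whole input (the reference result). */
    static List<String> wholeRecords(String input, String delimiter) {
        byte[] data = input.getBytes(StandardCharsets.UTF_8);
        return readSplit(data, delimiter.getBytes(StandardCharsets.UTF_8), 0, data.length);
    }

    /** True if every split size from 1 to the input length matches wholeRecords. */
    static boolean consistentForAllSplitSizes(String input, String delimiter) {
        List<String> expected = wholeRecords(input, delimiter);
        int length = input.getBytes(StandardCharsets.UTF_8).length;
        for (int size = 1; size <= length; size++) {
            if (!splitRecords(input, delimiter, size).equals(expected)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Inputs and delimiters taken from the issue description.
        System.out.println(consistentForAllSplitSizes("abc+++def++ghi++", "+++")); // true
        System.out.println(consistentForAllSplitSizes("abc++def+++ghi++", "+++")); // true
        System.out.println(splitRecords("abc+++def++ghi++", "++", 5)); // [abc, +def, ghi]
        System.out.println(splitRecords("abc+++def++ghi++", "++", 4)); // [abc, +def, def, ghi]
    }
}
```

Under these assumptions the two `+++` inputs from the issue come out consistent for every split size. The last call shows where the model itself breaks: `++` can overlap itself inside the `+++` run, so a split beginning at offset 4 finds a delimiter at 4 rather than at 3 and emits a duplicate `def`. The model's failing split sizes need not match those reported against Hadoop, and this is not offered as the root cause of MAPREDUCE-6549; it only shows why a reader that starts in the middle of a delimiter needs care.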