[ https://issues.apache.org/jira/browse/MAPREDUCE-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033565#comment-15033565 ]

Junping Du commented on MAPREDUCE-6549:
---------------------------------------

Hi [~wilfreds], [~rkanter] and [~jlowe],
I noticed that MAPREDUCE-6558 (compressed version) is still open. Should this patch go into 2.6.3 on its own, or would it be better to combine it with MAPREDUCE-6558 and target 2.6.4?

> multibyte delimiters with LineRecordReader cause duplicate records
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6549
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6549
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1, mrv2
>    Affects Versions: 2.7.2
>            Reporter: Dustin Cote
>            Assignee: Wilfred Spiegelenburg
>             Fix For: 2.8.0, 2.6.3, 2.7.3
>
>         Attachments: MAPREDUCE-6549-1.patch, MAPREDUCE-6549-2.patch,
>                      MAPREDUCE-6549.3.patch
>
>
> LineRecordReader currently produces duplicate records under certain
> scenarios such as:
> 1) input string: "abc+++def++ghi++"
>    delimiter string: "+++"
>    test passes with all split sizes
> 2) input string: "abc++def+++ghi++"
>    delimiter string: "+++"
>    test fails with a split size of 4
> 3) input string: "abc+++def++ghi++"
>    delimiter string: "++"
>    test fails with a split size of 5
> 4) input string: "abc+++defg++hij++"
>    delimiter string: "++"
>    test fails with a split size of 4
> 5) input string: "abc++def+++ghi++"
>    delimiter string: "++"
>    test fails with a split size of 9

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
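
For reference, the scenarios quoted above can be checked against a split-independent baseline. The following is a minimal sketch, not code from any of the attached patches: the class `DelimiterScenarios` and its `split` helper are hypothetical names, and the sketch assumes leftmost, non-overlapping delimiter matching over the whole input. It computes the records a correct reader would be expected to return regardless of split size, so any duplicate produced at a split boundary would show up as an extra entry relative to this list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: computes the records expected
// from an unsplit read of the input with a custom multibyte delimiter.
class DelimiterScenarios {

    // Assumption: delimiters are matched leftmost-first and do not overlap.
    static List<String> split(String input, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        while (start < input.length()) {
            int idx = input.indexOf(delimiter, start);
            if (idx < 0) {
                // No further delimiter: the remainder is the last record.
                records.add(input.substring(start));
                break;
            }
            records.add(input.substring(start, idx));
            start = idx + delimiter.length();
        }
        return records;
    }

    public static void main(String[] args) {
        // Input/delimiter pairs taken from the issue description.
        String[][] scenarios = {
            {"abc+++def++ghi++", "+++"},
            {"abc++def+++ghi++", "+++"},
            {"abc+++def++ghi++", "++"},
            {"abc+++defg++hij++", "++"},
            {"abc++def+++ghi++", "++"},
        };
        for (String[] s : scenarios) {
            System.out.println(s[0] + " / " + s[1] + " -> " + split(s[0], s[1]));
        }
    }
}
```

A test for the fix could then read the same input with each split size and compare the concatenated records from all splits against this baseline.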