[ https://issues.apache.org/jira/browse/MAPREDUCE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
bc Wong updated MAPREDUCE-5862:
-------------------------------
    Attachment: recordSpanningMultipleSplits.txt.bz2

The test failure is an NPE failing to find a new .bz2 test fixture. I'm not sure whether the patch process applied the binary file correctly. I'm attaching the .bz2 file directly to the jira.

The release audit warning is about the lack of ASL header in test fixtures. All tests are passing locally for me.

> Line records longer than 2x split size aren't handled correctly
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-5862
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5862
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: bc Wong
>            Assignee: bc Wong
>         Attachments: 0001-Handle-records-larger-than-2x-split-size.patch, 0001-Handle-records-larger-than-2x-split-size.patch, recordSpanningMultipleSplits.txt.bz2
>
>
> Suppose this split (100-200) is in the middle of a record (90-240):
> {noformat}
> 0               100            200             300
> |---- split ----|---- curr ----|---- split ----|
>               <------- record ------->
>               90                   240
> {noformat}
>
> Currently, the first split would read the entire record, up to offset 240, which is good. But the 2nd split has a bug in producing a phantom record of (200, 240).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
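
For illustration, the offsets in the description can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not Hadoop's actual reader: `read_split`, `skip_cap` and `DATA` are hypothetical names, records are newline-terminated, a record is assumed to belong to the split in which it begins, and every split except the first is assumed to discard the partial line it starts in. Capping the skip at the split length is an assumed mechanism, chosen only because it reproduces the (200, 240) symptom described in the issue.

```python
def read_split(data: bytes, start: int, end: int, skip_cap=None):
    """Return (begin, end) byte offsets of the records emitted for one split.

    Simplified model (an assumption, not Hadoop's code): a split other than
    the first discards the partial line it starts in, then emits every line
    that begins at or before `end`, reading past `end` to finish the last one.
    """
    def line_end(pos, cap=None):
        # Offset just past the next newline (or EOF), optionally capped
        # at `cap` bytes from `pos`.
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl + 1
        return stop if cap is None else min(stop, pos + cap)

    pos = start
    if start != 0:
        # Discard the partial first line; the previous split owns it.
        pos = line_end(start, skip_cap)

    records = []
    while pos <= end and pos < len(data):
        nxt = line_end(pos)
        records.append((pos, nxt))
        pos = nxt
    return records


# Three records laid out as in the issue: 0-90, 90-240, 240-300.
DATA = b"a" * 89 + b"\n" + b"b" * 149 + b"\n" + b"c" * 59 + b"\n"

first = read_split(DATA, 0, 100)                   # owns the 90-240 record
middle = read_split(DATA, 100, 200)                # should emit nothing
phantom = read_split(DATA, 100, 200, skip_cap=100) # skip stops early at 200
last = read_split(DATA, 200, 300)
```

With an uncapped skip, the middle split skips to offset 240 and emits nothing, because the first split has already read the whole record. With the skip capped at the split length (100 bytes), the skip stops at offset 200 and the reader then treats the tail of the same record as a new line.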