[jira] [Updated] (MAPREDUCE-5862) Line records longer than 2x split size aren't handled correctly

bc Wong (JIRA) Sat, 26 Apr 2014 21:48:12 -0700

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


bc Wong updated MAPREDUCE-5862:
-------------------------------

    Attachment: recordSpanningMultipleSplits.txt.bz2

The test failure is an NPE caused by the test not finding the new .bz2 test
fixture. I'm not sure whether the patch process applied the binary file
correctly, so I'm attaching the .bz2 file directly to the JIRA.

The release audit warning is about the missing ASL header in the test fixtures.

All tests are passing locally for me.

> Line records longer than 2x split size aren't handled correctly
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-5862
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5862
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: bc Wong
>            Assignee: bc Wong
>         Attachments: 0001-Handle-records-larger-than-2x-split-size.patch, 
> 0001-Handle-records-larger-than-2x-split-size.patch, 
> recordSpanningMultipleSplits.txt.bz2
>
>
> Suppose this split (100-200) is in the middle of a record (90-240):
> {noformat}
>    0              100            200             300
>    |---- split ----|---- curr ----|---- split ----|
>                  <------- record ------->
>                  90                     240
> {noformat}
>
> Currently, the first split (0-100) correctly reads the entire record, up to
> offset 240. But the 2nd split (100-200) has a bug: it produces a phantom
> record of (200, 240).
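To make the expected behaviour in the scenario above concrete, here is a small Python sketch of the usual line-splitting convention, under stated assumptions: a split that does not start at offset 0 discards the (possibly partial) line it starts in, and every split keeps reading whole lines while the next line starts at or before the split's end. The helper names (`next_line_end`, `read_split`) and the `bounded_skip` flag are hypothetical and are not Hadoop's LineRecordReader API. Setting `bounded_skip=True` is only one hypothetical way to reproduce the (200, 240) phantom record in this toy model, not a claim about where the actual defect lies.

```python
# Toy model of line-record splitting (NOT Hadoop's LineRecordReader API).
# Records are (start, stop) byte ranges; stop is one past the record's newline.

def next_line_end(data: bytes, pos: int) -> int:
    """Offset just past the first newline at or after pos (or EOF)."""
    nl = data.find(b"\n", pos)
    return len(data) if nl == -1 else nl + 1


def read_split(data: bytes, start: int, end: int, bounded_skip: bool = False):
    """Return the records a reader for the split (start, end) would emit.

    A split not starting at offset 0 discards the line it starts in; then
    whole lines are read while the next line starts at or before `end`.
    """
    pos = start
    if start != 0:
        skip_to = next_line_end(data, start)
        if bounded_skip:
            # Hypothetical faulty variant: the skip may not run past the
            # split's end, so it stops mid-record at `end`.
            skip_to = min(skip_to, end)
        pos = skip_to
    records = []
    while pos <= end and pos < len(data):
        stop = next_line_end(data, pos)
        records.append((pos, stop))
        pos = stop
    return records


# File laid out like the diagram: one record spans bytes 90..240.
data = b"a" * 89 + b"\n" + b"b" * 149 + b"\n" + b"c" * 59 + b"\n"
splits = [(0, 100), (100, 200), (200, 300)]

for s, e in splits:
    print((s, e), read_split(data, s, e), read_split(data, s, e, bounded_skip=True))
# (0, 100) [(0, 90), (90, 240)] [(0, 90), (90, 240)]
# (100, 200) [] [(200, 240)]
# (200, 300) [(240, 300)] [(240, 300)]
```

With the unbounded skip, the record (90, 240) is emitted exactly once, by the first split, and the middle split emits nothing; the bounded variant makes the middle split emit the phantom (200, 240).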



--
This message was sent by Atlassian JIRA
(v6.2#6252)
