[jira] [Commented] (MAPREDUCE-5862) Line records longer than 2x split size aren't handled correctly

bc Wong (JIRA) Tue, 29 Apr 2014 09:58:41 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984495#comment-13984495 ]


bc Wong commented on MAPREDUCE-5862:
------------------------------------

Does that mean I should add the ASL header to the test fixture? Or is there 
some other way to address the release audit problem? In this particular case, 
the test doesn't care whether the plaintext input has an ASL header.

> Line records longer than 2x split size aren't handled correctly
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-5862
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5862
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: bc Wong
>            Assignee: bc Wong
>            Priority: Critical
>         Attachments: 0001-Handle-records-larger-than-2x-split-size.1.patch, 
> 0001-Handle-records-larger-than-2x-split-size.patch, 
> 0001-Handle-records-larger-than-2x-split-size.patch, 
> recordSpanningMultipleSplits.txt.bz2
>
>
> Suppose this split (100-200) is in the middle of a record (90-240):
> {noformat}
>    0              100            200             300
>    |---- split ----|---- curr ----|---- split ----|
>                  <------- record ------->
>                  90                     240
> {noformat}
>       
> Currently, the first split reads the entire record, up to offset 240, 
> which is correct. The 2nd split, however, has a bug: it produces a phantom 
> record of (200, 240).
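
For illustration only, the record-ownership rule implied by the description
above can be sketched in Java. This is not the LineRecordReader code and not
the attached patch. The class SplitLineSketch and its helpers are hypothetical
names, and the rule it assumes is a simplification: a split that does not
start at offset 0 discards everything up to the first newline at or after its
start, and a split keeps emitting lines that begin at or before its end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from Hadoop.
public class SplitLineSketch {

    // Builds 300 bytes with newlines at 89, 239 and 299, so the
    // middle record occupies [90, 240) as in the diagram above.
    public static byte[] buildInput() {
        byte[] data = new byte[300];
        Arrays.fill(data, (byte) 'x');
        data[89] = '\n';
        data[239] = '\n';
        data[299] = '\n';
        return data;
    }

    // Returns the offset just past the next newline at or after pos,
    // or data.length if there is no further newline.
    public static int endOfLine(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') {
            pos++;
        }
        return Math.min(pos + 1, data.length);
    }

    // Returns the [from, to) byte ranges of the records owned by
    // the split [start, end) under the assumed rule.
    public static List<int[]> readSplit(byte[] data, int start, int end) {
        List<int[]> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Assumption: the previous split owns the line that covers
            // or begins at `start`, so skip past it.
            pos = endOfLine(data, start);
        }
        // Assumption: a line belongs to this split if it begins at or
        // before `end`, even when it finishes far beyond `end`.
        while (pos <= end && pos < data.length) {
            int next = endOfLine(data, pos);
            records.add(new int[] {pos, next});
            pos = next;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = buildInput();
        int[][] splits = {{0, 100}, {100, 200}, {200, 300}};
        for (int[] s : splits) {
            StringBuilder sb = new StringBuilder();
            for (int[] r : readSplit(data, s[0], s[1])) {
                sb.append(" [").append(r[0]).append(", ").append(r[1]).append(")");
            }
            System.out.println("split " + s[0] + "-" + s[1] + ":" + sb);
        }
    }
}
```

Under these assumptions, the 0-100 split emits the 90-240 record, the 100-200
split emits nothing, and the 200-300 split starts its first record at 240, so
no (200, 240) record appears.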



--
This message was sent by Atlassian JIRA
(v6.2#6252)
