[ https://issues.apache.org/jira/browse/MAPREDUCE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984274#comment-13984274 ]
Jason Lowe commented on MAPREDUCE-5862:
---------------------------------------
The release audit warning needs to be addressed. I'm assuming the test
failures occur because the binary file in the patch isn't being applied
correctly. Looks good to me once the release audit problem is fixed and the
two tests pass with the patch properly applied.
> Line records longer than 2x split size aren't handled correctly
> ---------------------------------------------------------------
>
> Key: MAPREDUCE-5862
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5862
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: bc Wong
> Assignee: bc Wong
> Priority: Critical
> Attachments: 0001-Handle-records-larger-than-2x-split-size.1.patch,
> 0001-Handle-records-larger-than-2x-split-size.patch,
> 0001-Handle-records-larger-than-2x-split-size.patch,
> recordSpanningMultipleSplits.txt.bz2
>
>
> Suppose this split (100-200) is in the middle of a record (90-240):
> {noformat}
> 0               100            200             300
> |---- split ----|---- curr ----|---- split ----|
>               <------- record ------->
>              90                    240
> {noformat}
>
> Currently, the first split (0-100) reads the entire record, up to offset 240,
> which is correct. But the second split (curr) has a bug: it produces a
> phantom record (200, 240).
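To make the splitting rule concrete, here is a hypothetical Java model. It is not Hadoop's LineRecordReader, and the names `SplitLineReaderSketch`, `skipLine`, `readSplit`, and `capSkipAtSplitEnd` are illustrative only. It assumes each record is a newline-terminated byte range, that every split except the first discards the partial record it starts inside, and that a split keeps reading while the next record starts at or before its end offset, so a record crossing the boundary is finished by the split in which it starts. The `capSkipAtSplitEnd` flag is an assumed way of reproducing the symptom described above: if the initial skip is not allowed to go past the split end, split 100-200 emits the phantom record (200, 240).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of split-based line reading (NOT Hadoop's LineRecordReader).
// A record is a byte range [start, endExclusive) whose last byte is '\n'.
final class SplitLineReaderSketch {

    // Returns the offset just past the next '\n' at or after 'from',
    // scanning no further than 'limit' (returns 'limit' if none is found).
    static int skipLine(byte[] data, int from, int limit) {
        int i = from;
        while (i < limit) {
            if (data[i++] == '\n') {
                return i;
            }
        }
        return limit;
    }

    // Reads the records owned by the split [start, end).
    // A split with start != 0 owns records starting in (start, end];
    // the first split owns records starting in [0, end].
    // capSkipAtSplitEnd = true models the assumed buggy variant, where the
    // skip over the leading partial record stops at the split end.
    static List<int[]> readSplit(byte[] data, int start, int end, boolean capSkipAtSplitEnd) {
        List<int[]> records = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            int limit = capSkipAtSplitEnd ? Math.min(end, data.length) : data.length;
            pos = skipLine(data, start, limit);
        }
        while (pos <= end && pos < data.length) {
            int next = skipLine(data, pos, data.length);
            records.add(new int[] {pos, next});
            pos = next;
        }
        return records;
    }

    // Formats records as "[(a, b), (c, d)]".
    static String describe(List<int[]> records) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < records.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('(').append(records.get(i)[0]).append(", ")
              .append(records.get(i)[1]).append(')');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // 300 bytes holding records [0,90), [90,240), [240,300), as in the diagram.
        byte[] data = new byte[300];
        Arrays.fill(data, (byte) 'x');
        data[89] = '\n';
        data[239] = '\n';
        data[299] = '\n';

        int[][] splits = {{0, 100}, {100, 200}, {200, 300}};
        for (int[] s : splits) {
            System.out.println("split " + s[0] + "-" + s[1]
                + ": fixed=" + describe(readSplit(data, s[0], s[1], false))
                + " capped=" + describe(readSplit(data, s[0], s[1], true)));
        }
    }
}
```

With this sample data, both variants give `[(0, 90), (90, 240)]` for split 0-100 and `[(240, 300)]` for split 200-300, but for split 100-200 the uncapped skip yields `fixed=[]` while the capped skip yields `capped=[(200, 240)]`, the phantom record.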
--
This message was sent by Atlassian JIRA
(v6.2#6252)