[
https://issues.apache.org/jira/browse/MAPREDUCE-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605659#comment-13605659
]
Sandy Ryza commented on MAPREDUCE-5076:
---------------------------------------
This turns out to be less severe than I first thought. The last 16 MB were
being added to the second split, so the max split size was exceeded, but no
data was lost.
> CombineFileInputFormat can create splits that exceed maxSplitSize
> -----------------------------------------------------------------
>
> Key: MAPREDUCE-5076
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5076
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
>
> I ran a local job with CombineFileInputFormat using an 80 MB file and a max
> split size of 32 MB (the default local FS block size). The job ran with two
> splits of 32 MB, and the last 16 MB were just omitted.
> This appears to be caused by a subtle bug in getMoreSplits: the code that
> generates the splits from the blocks expects the 16 MB block to be at the
> end of the block list, but the code that generates the blocks does not
> guarantee that ordering.
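
The ordering dependence described above can be illustrated with a small,
simplified model. This is only a sketch and not the actual getMoreSplits code:
the class name SplitPackingSketch, the pack method, and the block orderings
are hypothetical, and it assumes a packer that walks the block list in order
and closes a split as soon as its running total reaches the maximum.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only; this is not the Hadoop getMoreSplits implementation.
final class SplitPackingSketch {

    // Packs block sizes (in MB) into splits in list order, closing a split
    // as soon as its running total reaches maxSplitMb.
    static List<Long> pack(long[] blockSizesMb, long maxSplitMb) {
        List<Long> splits = new ArrayList<>();
        long current = 0;
        for (long size : blockSizesMb) {
            current += size;
            if (current >= maxSplitMb) {
                splits.add(current);
                current = 0;
            }
        }
        if (current > 0) {
            splits.add(current); // leftover blocks form a final, smaller split
        }
        return splits;
    }

    public static void main(String[] args) {
        // Short block last: every split stays within the 32 MB limit.
        System.out.println(pack(new long[] {32, 32, 16}, 32)); // [32, 32, 16]
        // Short block in the middle: it is merged with the next full block.
        System.out.println(pack(new long[] {32, 16, 32}, 32)); // [32, 48]
    }
}
```

In this model an 80 MB file is fully covered in both cases, but when the
16 MB block is not last, one split grows to 48 MB and exceeds the 32 MB
maximum, which matches the behaviour reported in the comment.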