[jira] [Commented] (MAPREDUCE-5076) CombineFileInputFormat can create splits that exceed maxSplitSize

Sandy Ryza (JIRA) Mon, 18 Mar 2013 14:23:17 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605659#comment-13605659 ]


Sandy Ryza commented on MAPREDUCE-5076:
---------------------------------------

It looks like this is not nearly as bad as I first thought.  The last 16 MB 
were being added to the second split (making it 48 MB rather than 32 MB), so 
the max split size was being exceeded, but no data was lost.
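
As a rough illustration of how block order can produce an oversized split, 
here is a simplified model. It is not the actual getMoreSplits code: the class 
name SplitSketch, the pack method, and the block order {32, 16, 32} are all 
hypothetical, and sizes are given in MB. The sketch only assumes that a split 
is closed once its accumulated size reaches the max split size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of greedy split packing (sizes in MB).
// This is an illustration only, not Hadoop's CombineFileInputFormat logic.
class SplitSketch {

    // Adds blocks to the current split in list order and closes the split
    // as soon as its accumulated size reaches or exceeds maxSize.
    static List<Long> pack(long[] blockSizes, long maxSize) {
        List<Long> splits = new ArrayList<>();
        long current = 0;
        for (long block : blockSizes) {
            current += block;
            if (current >= maxSize) {
                splits.add(current);
                current = 0;
            }
        }
        // Any leftover blocks form a final, smaller split.
        if (current > 0) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Short block last: every split stays within the 32 MB limit.
        System.out.println(pack(new long[] {32, 32, 16}, 32)); // [32, 32, 16]
        // Short block in the middle (assumed order): second split is 48 MB.
        System.out.println(pack(new long[] {32, 16, 32}, 32)); // [32, 48]
    }
}
```

In this model all 80 MB are assigned to a split in both cases; only the 
position of the 16 MB block decides whether a split exceeds the limit.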
                
> CombineFileInputFormat can create splits that exceed maxSplitSize
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5076
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5076
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> I ran a local job with CombineFileInputFormat using an 80 MB file and a max 
> split size of 32 MB (the default local FS block size).  The job ran with two 
> splits of 32 MB, and the last 16 MB were just omitted.
> This appears to be caused by a subtle bug in getMoreSplits, in which the code 
> that generates the splits from the blocks expects the 16 MB block to be at 
> the end of the block list. But the code that generates the blocks does not 
> respect this.

