[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5076:
----------------------------------

    Summary: CombineFileInputFormat can create splits that exceed 
maxSplitSize (was: CombineFileInputFormat with maxSplitSize can omit data)
    
> CombineFileInputFormat can create splits that exceed maxSplitSize
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5076
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5076
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>
> I ran a local job with CombineFileInputFormat using an 80 MB file and a max 
> split size of 32 MB (the default local FS block size).  The job ran with two 
> 32 MB splits, and the last 16 MB were omitted.
> This appears to be caused by a subtle bug in getMoreSplits: the code that 
> generates the splits from the blocks expects the 16 MB block to be at the 
> end of the block list, but the code that generates the blocks does not 
> guarantee this ordering.
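
For reference, the arithmetic in the report can be sketched as follows. This is
not the getMoreSplits code from Hadoop; SplitSizeSketch and both of its methods
are hypothetical names used only for illustration. The sketch assumes one valid
layout simply cuts the file into consecutive pieces of at most the max split
size, and it checks the two properties the issue is about: no split exceeds the
maximum, and the splits together cover the whole file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
class SplitSizeSketch {

    // One valid layout: consecutive pieces of at most maxSplitSize bytes.
    static List<Long> expectedSplitLengths(long fileLength, long maxSplitSize) {
        if (maxSplitSize <= 0) {
            throw new IllegalArgumentException("maxSplitSize must be positive");
        }
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        while (remaining > 0) {
            long len = Math.min(remaining, maxSplitSize);
            lengths.add(len);
            remaining -= len;
        }
        return lengths;
    }

    // True only if no split exceeds maxSplitSize and no bytes are dropped.
    static boolean coversFile(List<Long> lengths, long fileLength, long maxSplitSize) {
        long total = 0;
        for (long len : lengths) {
            if (len > maxSplitSize) {
                return false;
            }
            total += len;
        }
        return total == fileLength;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Long> splits = expectedSplitLengths(80 * mb, 32 * mb);
        System.out.println(splits.size() + " splits, valid="
                + coversFile(splits, 80 * mb, 32 * mb));
    }
}
```

With the numbers from the report (80 MB file, 32 MB max split size), this layout
has three pieces: 32 MB, 32 MB, and 16 MB. The observed result of two 32 MB
splits fails the coverage check because the trailing 16 MB is missing. The real
input format also groups blocks by node and rack, so its layout may differ, but
the same two properties should hold.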

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
