Sandy Ryza created MAPREDUCE-5076:
-------------------------------------
Summary: CombineFileInputFormat with maxSplitSize can omit data
Key: MAPREDUCE-5076
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5076
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Sandy Ryza
Assignee: Sandy Ryza
I ran a local job with CombineFileInputFormat using an 80 MB file and a max
split size of 32 MB (the default local FS block size). The job ran with only
two 32 MB splits, and the last 16 MB of the file were omitted.
This appears to be caused by a subtle bug in getMoreSplits: the code that
generates the splits from the blocks expects the 16 MB block to be at the end
of the block list, but the code that generates the blocks does not guarantee
that ordering.
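
To make the arithmetic concrete, here is a minimal stand-alone sketch of a coverage check. It is not taken from Hadoop and does not reproduce getMoreSplits; the class and method names (SplitCoverageCheck, expectedChunks, missingBytes) are hypothetical, and the "observed" split lengths are just the two 32 MB splits described above. It only shows that an 80 MB file with a 32 MB max split size should yield three chunks (32, 32, 16), and that two 32 MB splits leave 16 MB uncovered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
class SplitCoverageCheck {
    static final long MB = 1024L * 1024L;

    // Cut a file of fileLength bytes into consecutive chunks of at most
    // maxSplitSize bytes each. The last chunk holds the remainder.
    static List<Long> expectedChunks(long fileLength, long maxSplitSize) {
        if (maxSplitSize <= 0) {
            throw new IllegalArgumentException("maxSplitSize must be positive");
        }
        List<Long> chunks = new ArrayList<>();
        long remaining = fileLength;
        while (remaining > 0) {
            long len = Math.min(remaining, maxSplitSize);
            chunks.add(len);
            remaining -= len;
        }
        return chunks;
    }

    // Number of bytes of the file not covered by the given split lengths.
    // A correct set of splits should make this zero.
    static long missingBytes(long fileLength, List<Long> splitLengths) {
        long covered = 0;
        for (long len : splitLengths) {
            covered += len;
        }
        return fileLength - covered;
    }

    public static void main(String[] args) {
        long fileLength = 80 * MB;
        long maxSplitSize = 32 * MB;

        // Expected: three chunks of 32 MB, 32 MB and 16 MB.
        List<Long> expected = expectedChunks(fileLength, maxSplitSize);
        System.out.println("expected chunks: " + expected.size());

        // Observed in the report: only two 32 MB splits.
        List<Long> observed = Arrays.asList(32 * MB, 32 * MB);
        System.out.println("missing MB: " + missingBytes(fileLength, observed) / MB);
    }
}
```

A check along these lines (sum of split lengths equals file length) could serve as a regression assertion in a test for this issue, assuming the split lengths are collected from whatever getSplits returns.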