[ https://issues.apache.org/jira/browse/MAPREDUCE-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth updated MAPREDUCE-5352:
--------------------------------------
Attachment: MAPREDUCE-5352.4.txt
Updated the patch to make the test more consistent. Previously, the ordering
depended on the order in which the hashmap is walked.
Ready for review and commit now.
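
For readers unfamiliar with the problem mentioned above: java.util.HashMap makes
no guarantee about iteration order, so a test that compares results in walk
order can be fragile. The following is a minimal sketch of one common way to
avoid that, by sorting before comparing. The class and method names are
hypothetical, and this is not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of MAPREDUCE-5352.4.txt. */
final class OrderInsensitiveCheck {

  /**
   * Returns the map's keys in sorted order, so that an assertion on the
   * result does not depend on the iteration order of the underlying map.
   */
  static List<String> sortedKeys(Map<String, ?> map) {
    List<String> keys = new ArrayList<>(map.keySet());
    Collections.sort(keys);
    return keys;
  }
}
```

A test would then compare sortedKeys(actual) against a sorted list of expected
host names rather than against the raw walk order.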
> Optimize node local splits generated by CombineFileInputFormat
> ---------------------------------------------------------------
>
> Key: MAPREDUCE-5352
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5352
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 2.0.5-alpha
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: MAPREDUCE-5352.1.txt, MAPREDUCE-5352.2.txt,
> MAPREDUCE-5352.3.txt, MAPREDUCE-5352.4.txt
>
>
> CombineFileInputFormat currently walks through all available nodes and
> generates multiple (maxSplitsPerNode) splits on a single node before
> attempting to generate splits on subsequent nodes. This reduces the chance
> of generating node-local splits for subsequent nodes, since the blocks
> already consumed are no longer available to them. Allowing splits to go 1
> block above the max-split-size makes this worse.
> Allocating a single split per node in each iteration should improve the
> distribution of splits across nodes, since subsequent nodes will have more
> blocks to choose from.
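
The idea in the description above can be sketched roughly as follows. This is
a simplified model, not the CombineFileInputFormat code or the attached patch:
the class name, the node-to-blocks map, and the blocksPerSplit parameter
(a stand-in for the max split size, counted in blocks) are all assumptions
made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; names and data model are hypothetical. */
final class RoundRobinSplitSketch {

  /**
   * Creates at most one split per node in each pass over the nodes, and
   * repeats until a full pass creates no split.
   *
   * @param nodeToBlocks node name to ids of blocks with a replica on that node
   * @param blocksPerSplit number of blocks in a full split (must be >= 1)
   * @return splits as "node:[blocks]" strings, in creation order
   */
  static List<String> allocate(Map<String, List<String>> nodeToBlocks,
                               int blocksPerSplit) {
    if (blocksPerSplit < 1) {
      throw new IllegalArgumentException("blocksPerSplit must be >= 1");
    }
    Set<String> used = new HashSet<>();      // blocks already placed in a split
    List<String> splits = new ArrayList<>();
    boolean progress = true;
    while (progress) {
      progress = false;
      for (Map.Entry<String, List<String>> e : nodeToBlocks.entrySet()) {
        // Collect unused local blocks for this node, up to one split's worth.
        List<String> split = new ArrayList<>();
        for (String block : e.getValue()) {
          if (split.size() == blocksPerSplit) {
            break;
          }
          if (!used.contains(block) && !split.contains(block)) {
            split.add(block);
          }
        }
        // Emit only full splits; leftover blocks stay unassigned in this sketch.
        if (split.size() == blocksPerSplit) {
          used.addAll(split);
          splits.add(e.getKey() + ":" + split);
          progress = true;
        }
      }
    }
    return splits;
  }

  public static void main(String[] args) {
    Map<String, List<String>> nodes = new LinkedHashMap<>();
    nodes.put("n1", Arrays.asList("b1", "b2", "b3", "b4"));
    nodes.put("n2", Arrays.asList("b1", "b2", "b3", "b4"));
    System.out.println(allocate(nodes, 2));
  }
}
```

In this example both nodes hold replicas of the same four blocks. With one
split per node per pass, n1 gets b1 and b2 and n2 gets b3 and b4. Filling n1
with several splits first would leave no local blocks for n2.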