[
https://issues.apache.org/jira/browse/MAPREDUCE-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Allen Wittenauer resolved MAPREDUCE-286.
----------------------------------------
Resolution: Fixed
Stale.
> Optimize the last merge of the map output files
> -----------------------------------------------
>
> Key: MAPREDUCE-286
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-286
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Devaraj Das
>
> In ReduceTask, every merge today combines io.sort.factor files and writes
> the result back to disk. The last merge can probably be done better. For
> example, with io.sort.factor = 100 and io.sort.factor + 10 = 110 files left
> at the end, today we merge 100 files into one and then return an iterator
> over the remaining 11 files (the 10 untouched files plus the merged one).
> This can be improved (in terms of disk I/O) by merging only the smallest 11
> files and then returning an iterator over the resulting 100 files. Another
> option is to skip the single-level merge entirely when we have
> io.sort.factor + n files remaining (where n << io.sort.factor) and just
> return the iterator directly. Thoughts?
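
The arithmetic in the description can be checked with a small model. This is only a sketch under stated assumptions, not Hadoop's merge code: the function names current_plan and proposed_plan are hypothetical, file sizes are abstract units, and the intermediate merge cost is taken to be the total size of the files rewritten to disk.

```python
def current_plan(sizes, factor):
    """Model of today's behaviour as described in the issue (hypothetical):
    merge `factor` files into one, then iterate over the rest plus the result.
    Returns (units rewritten to disk, segments handed to the final iterator)."""
    if len(sizes) <= factor:
        return 0, len(sizes)  # no intermediate merge needed
    merged, rest = sizes[:factor], sizes[factor:]
    return sum(merged), len(rest) + 1


def proposed_plan(sizes, factor):
    """Model of the proposal (hypothetical): merge only the smallest files,
    just enough that exactly `factor` segments remain for the final iterator."""
    extra = len(sizes) - factor
    if extra <= 0:
        return 0, len(sizes)  # no intermediate merge needed
    smallest = sorted(sizes)[:extra + 1]  # extra + 1 files collapse into one
    return sum(smallest), len(sizes) - len(smallest) + 1


# The example from the description: io.sort.factor = 100, 110 equal-sized files.
sizes = [1] * 110
print(current_plan(sizes, 100))   # (100, 11)
print(proposed_plan(sizes, 100))  # (11, 100)
```

In this model the proposal rewrites 11 units instead of 100 before the final pass, at the price of a wider final iterator (100 segments instead of 11), which still fits within io.sort.factor.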