[jira] [Resolved] (MAPREDUCE-286) Optimize the last merge of the map output files

Allen Wittenauer (JIRA) Thu, 17 Jul 2014 14:22:33 -0700

     [ https://issues.apache.org/jira/browse/MAPREDUCE-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Allen Wittenauer resolved MAPREDUCE-286.
----------------------------------------

    Resolution: Fixed

Stale.

> Optimize the last merge of the map output files
> -----------------------------------------------
>
>                 Key: MAPREDUCE-286
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-286
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Devaraj Das
>
> In ReduceTask, each merge pass currently merges io.sort.factor files and 
> writes the result back to disk. The last merge can probably be done better. 
> For example, with io.sort.factor = 100 and io.sort.factor + 10 files left at 
> the end, we currently merge 100 files into one and then return an iterator 
> over the remaining 11 files. This could be improved (in terms of disk I/O) by 
> merging the smallest 11 files instead and returning an iterator over the 100 
> remaining files. Another option is to skip the single-level merge entirely 
> when io.sort.factor + n files remain (where n << io.sort.factor) and return 
> the iterator directly. Thoughts?
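
To make the disk I/O difference in the description concrete, here is a rough
sketch that counts the bytes written by the intermediate merge under both
approaches. It is not Hadoop code: the function names, the uniform file sizes,
and the assumption that the current behaviour merges the first io.sort.factor
files are all hypothetical and only serve the illustration.

```python
def current_strategy(sizes, factor):
    """Assumed current behaviour: merge the first `factor` files into one.

    Returns (bytes written by the intermediate merge, sizes of the files
    left for the final iterator).
    """
    if len(sizes) <= factor:
        return 0, list(sizes)  # no intermediate merge needed
    merged = sum(sizes[:factor])
    return merged, [merged] + list(sizes[factor:])


def proposed_strategy(sizes, factor):
    """Proposed behaviour: merge only the smallest files, just enough of
    them that exactly `factor` files remain for the final iterator.
    """
    if len(sizes) <= factor:
        return 0, list(sizes)  # no intermediate merge needed
    ordered = sorted(sizes)
    k = len(ordered) - factor + 1  # k files collapse into 1
    merged = sum(ordered[:k])
    return merged, [merged] + ordered[k:]


if __name__ == "__main__":
    # Hypothetical input: io.sort.factor = 100, 110 files of 10 units each.
    sizes = [10] * 110
    written, remaining = current_strategy(sizes, 100)
    print(written, len(remaining))  # 1000 11
    written, remaining = proposed_strategy(sizes, 100)
    print(written, len(remaining))  # 110 100
```

With these made-up sizes the intermediate merge writes 1000 units in the first
case and 110 in the second, while the final iterator still covers at most
io.sort.factor files.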



--
This message was sent by Atlassian JIRA
(v6.2#6252)
