[ https://issues.apache.org/jira/browse/HADOOP-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482261 ]

Runping Qi commented on HADOOP-910:
-----------------------------------


The observation I made was this: the map phase took 6+ hours (I cannot recall
the exact number). During the map phase, each reducer copied the 8400+ map
output files to its local directory, which overlapped well with the map phase.
However, merging these 8400+ files into 100 runs that could feed the reduce
phase did not start until all the files were copied (which was after all the
mappers were done). That merge phase could have started much earlier, working
concurrently with the copying phase, and thus could have completed much
earlier (in this specific case, the merge phase took about 8 hours!).



> Reduces can do merges for the on-disk map output files in parallel with their 
> copying
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-910
>                 URL: https://issues.apache.org/jira/browse/HADOOP-910
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>         Assigned To: Gautam Kowshik
>
> Proposal to extend the parallel in-memory-merge/copying, that is being done
> as part of HADOOP-830, to the on-disk files.
> Today, the Reduces dump the map output files to disk, and the final merge
> happens only after all the map outputs have been collected. It might make
> sense to parallelize this part. That is, whenever a Reduce has collected
> io.sort.factor number of segments on disk, it initiates a merge of those and
> creates one big segment. If the rate of copying is faster than the merge, we
> can probably have multiple threads doing parallel merges of independent sets
> of io.sort.factor segments. If the rate of copying is not as fast as the
> merge, we stand to gain a lot - at the end of copying all the map outputs,
> we will be left with a small number of segments for the final merge, which
> hopefully will feed the reduce directly (via the RawKeyValueIterator)
> without having to hit the disk for writing additional output segments.
> If the disk bandwidth is higher than the network bandwidth, we have a good
> case, I guess, for doing such a thing.

