[
https://issues.apache.org/jira/browse/HADOOP-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466060
]
Runping Qi commented on HADOOP-910:
-----------------------------------
This proposal will definitely improve the sorting phase on the reducer side.
I have one suggestion. Instead of starting a merge greedily as soon as a
Reduce has collected io.sort.factor segments on disk, you should wait until
the Reducer has collected close to 2 * io.sort.factor - 1 segments.
This way, you can always choose the io.sort.factor smallest segments to
merge and avoid unnecessarily re-merging large segments.
This will also result in more balanced final segments.
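A toy simulation can illustrate the difference between the two policies. The
segment sizes, the io.sort.factor value, and both cost functions below are
hypothetical, and merge cost is modeled simply as the number of bytes read and
rewritten by intermediate merges:

```python
import heapq

# Hypothetical segment sizes (in MB) in arrival order, and a small
# io.sort.factor for illustration.
SIZES = [10, 20, 30, 5, 5, 5, 100]
FACTOR = 3

def greedy_cost(sizes, factor):
    """Merge as soon as `factor` segments are on disk, in arrival order.
    Previously merged (large) segments get re-merged over and over."""
    segments, cost = [], 0
    for size in sizes:
        segments.append(size)
        if len(segments) == factor:
            merged = sum(segments)
            cost += merged
            segments = [merged]
    return cost

def patient_cost(sizes, factor):
    """Wait until 2 * factor - 1 segments are on disk, then merge only
    the `factor` smallest, leaving the large segments alone."""
    heap, cost = [], 0
    for size in sizes:
        heapq.heappush(heap, size)
        if len(heap) >= 2 * factor - 1:
            merged = sum(heapq.heappop(heap) for _ in range(factor))
            cost += merged
            heapq.heappush(heap, merged)
    return cost

print(greedy_cost(SIZES, FACTOR))   # 305 MB merged
print(patient_cost(SIZES, FACTOR))  # 65 MB merged
```

In this run the greedy policy does end with a single segment versus three, but
since the final merge can feed the reduce directly (via the
RawKeyValueIterator, per the issue description), the extra intermediate
merging of the large segments buys nothing.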
I suspect multiple merging threads may not buy you much, since disk I/O will
be the bottleneck of a merging thread, unless a single merge cannot fully
utilize all the available disk resources.
> Reduces can do merges for the on-disk map output files in parallel with their
> copying
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-910
> URL: https://issues.apache.org/jira/browse/HADOOP-910
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Reporter: Devaraj Das
>
> Proposal to extend the parallel in-memory merge/copying, which is being done
> as part of HADOOP-830, to the on-disk files.
> Today, the Reduces dump the map output files to disk, and the final merge
> happens only after all the map outputs have been collected. It might make
> sense to parallelize this part: whenever a Reduce has collected
> io.sort.factor segments on disk, it initiates a merge of those and creates
> one big segment. If the rate of copying is faster than the merge, we can
> probably have multiple threads doing parallel merges of independent sets of
> io.sort.factor segments. If the rate of copying is not as fast as the
> merge, we stand to gain a lot: at the end of copying all the map outputs,
> we will be left with a small number of segments for the final merge, which
> hopefully will feed the reduce directly (via the RawKeyValueIterator)
> without having to hit the disk for writing additional output segments.
> If the disk bandwidth is higher than the network bandwidth, we have a good
> story, I guess, for doing such a thing.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira