[ https://issues.apache.org/jira/browse/HADOOP-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572762#action_12572762 ]
Hadoop QA commented on HADOOP-910:
----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12376230/HADOOP-910.patch
against trunk revision 619744.
@author +1. The patch does not contain any @author tags.
tests included -1. The patch doesn't appear to include any new or modified
tests.
Please justify why no tests are needed for this patch.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler
warnings.
release audit +1. The applied patch does not generate any new release
audit warnings.
findbugs -1. The patch appears to introduce 1 new Findbugs warning.
core tests +1. The patch passed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1835/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1835/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1835/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1835/console
This message is automatically generated.
> Reduces can do merges for the on-disk map output files in parallel with their
> copying
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-910
> URL: https://issues.apache.org/jira/browse/HADOOP-910
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Devaraj Das
> Assignee: Amar Kamat
> Attachments: HADOOP-910-review.patch, HADOOP-910.patch,
> HADOOP-910.patch
>
>
> This is a proposal to extend the parallel in-memory merge/copy, which is
> being done as part of HADOOP-830, to the on-disk files.
> Today, the Reduces dump the map output files to disk, and the final merge
> happens only after all the map outputs have been collected. It might make
> sense to parallelize this part: whenever a Reduce has collected
> io.sort.factor segments on disk, it initiates a merge of those segments and
> creates one big segment. If copying is faster than merging, we can probably
> have multiple threads doing parallel merges of independent sets of
> io.sort.factor segments. If copying is slower than merging, we stand to gain
> a lot: once all the map outputs have been copied, we will be left with a
> small number of segments for the final merge, which will hopefully feed the
> reduce directly (via the RawKeyValueIterator) without having to hit the disk
> to write additional output segments.
> If the disk bandwidth is higher than the network bandwidth, there is a good
> case for doing this.
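The merge trigger described in the issue can be illustrated with a small, single-threaded Java sketch. It is not the code from HADOOP-910.patch: the class `OnDiskMergeSketch`, its methods, and the use of in-memory sorted integer lists in place of on-disk segments are all assumptions made for illustration. It also assumes that an already-merged segment is set aside until the final merge rather than merged again, which the issue text does not specify, and it does not model the parallel merge threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of threshold-triggered merging; not Hadoop's ReduceTask code. */
class OnDiskMergeSketch {
    private final int sortFactor;                                   // stands in for io.sort.factor
    private final List<List<Integer>> pending = new ArrayList<>();  // copied, not yet merged
    private final List<List<Integer>> merged = new ArrayList<>();   // results of earlier merges
    private int mergesDone = 0;

    OnDiskMergeSketch(int sortFactor) {
        if (sortFactor < 2) {
            throw new IllegalArgumentException("sortFactor must be >= 2");
        }
        this.sortFactor = sortFactor;
    }

    /** Called once per copied map output; merges as soon as sortFactor segments are pending. */
    void addSegment(List<Integer> sortedSegment) {
        pending.add(sortedSegment);
        if (pending.size() >= sortFactor) {
            merged.add(kWayMerge(new ArrayList<>(pending)));
            pending.clear();
            mergesDone++;
        }
    }

    /** Final merge over whatever is left once copying has finished. */
    List<Integer> finalMerge() {
        List<List<Integer>> rest = new ArrayList<>(merged);
        rest.addAll(pending);
        return kWayMerge(rest);
    }

    int remainingSegments() { return merged.size() + pending.size(); }

    int mergesDone() { return mergesDone; }

    /** Standard heap-based k-way merge of sorted inputs. */
    static List<Integer> kWayMerge(List<List<Integer>> inputs) {
        // Each heap entry is {value, index of source segment, position in that segment}.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int s = 0; s < inputs.size(); s++) {
            if (!inputs.get(s).isEmpty()) {
                heap.add(new int[] {inputs.get(s).get(0), s, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            out.add(head[0]);
            List<Integer> src = inputs.get(head[1]);
            int next = head[2] + 1;
            if (next < src.size()) {
                heap.add(new int[] {src.get(next), head[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        OnDiskMergeSketch sketch = new OnDiskMergeSketch(3);
        for (int i = 0; i < 7; i++) {
            sketch.addSegment(List.of(i, i + 10, i + 20));
        }
        System.out.println("segments left for final merge: " + sketch.remainingSegments());
        System.out.println("final output: " + sketch.finalMerge());
    }
}
```

With a factor of 3 and seven copied segments, two intermediate merges run during copying, so the final merge sees three segments instead of seven. In a real reducer the intermediate merges would run on separate threads and write to disk, which is where the copy-rate versus merge-rate trade-off discussed in the issue comes in.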