[ 
https://issues.apache.org/jira/browse/HADOOP-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated HADOOP-4683:
--------------------------------------

    Attachment: hadoop-4683.patch

Attaching a patch. 

A loadgen run on 100 nodes, with 100K maps writing 100 bytes each, showed roughly a 3x performance improvement
(~800 seconds with the patch, ~2500 seconds without it):
{noformat}
bin/hadoop jar hadoop-$BUILD-test.jar loadgen \
-D test.randomtextwrite.bytes_per_map=$((100)) \
-D test.randomtextwrite.total_bytes=$((100*100000)) \
-D mapred.compress.map.output=false \
-r 1 \
-outKey org.apache.hadoop.io.Text \
-outValue org.apache.hadoop.io.Text \
-outFormat org.apache.hadoop.mapred.lib.NullOutputFormat \
-outdir fakeout
{noformat}

Testpatch results:

     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec] 


> Move the call to getMapCompletionEvents in 
> ReduceTask.ReduceCopier.fetchOutputs to a separate thread
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4683
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4683
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.20.0
>
>         Attachments: hadoop-4683.patch
>
>
> The method ReduceTask.ReduceCopier.fetchOutputs calls 
> getMapCompletionEvents on every iteration of its loop. Because there is a 
> sleep inside getMapCompletionEvents, this call might slow down the shuffle 
> scheduler in some cases. The call should be moved out to a separate thread.
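
The idea in the description above can be sketched roughly as follows. This is not the attached patch and does not use Hadoop's real classes: EventSource, EventFetcher and runDemo are hypothetical names made up for illustration, the 10 ms sleep merely stands in for the sleep inside getMapCompletionEvents, and the code is written in current Java rather than the Java of the 0.20 era. It only shows the general shape: a background thread makes the slow call and queues the results, so the main loop never blocks on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class EventFetcherSketch {

    /** Hypothetical stand-in for the slow, sleeping call; not Hadoop's API. */
    interface EventSource {
        List<String> poll() throws InterruptedException;
    }

    /** Background thread that polls the source and queues whatever it returns. */
    static class EventFetcher extends Thread {
        private final EventSource source;
        private final ConcurrentLinkedQueue<String> queue;

        EventFetcher(EventSource source, ConcurrentLinkedQueue<String> queue) {
            this.source = source;
            this.queue = queue;
            setDaemon(true);
        }

        @Override
        public void run() {
            try {
                while (!isInterrupted()) {
                    // Any sleep inside poll() happens here, off the main loop.
                    queue.addAll(source.poll());
                }
            } catch (InterruptedException e) {
                // Interrupted while polling: fall through and let the thread exit.
            }
        }
    }

    /** Runs the main loop until totalEvents events have been picked up. */
    static int runDemo(int totalEvents) throws InterruptedException {
        AtomicInteger next = new AtomicInteger();
        // Assumed behaviour: each poll sleeps briefly and yields at most one event.
        EventSource slowSource = () -> {
            Thread.sleep(10);
            List<String> batch = new ArrayList<>();
            int id = next.getAndIncrement();
            if (id < totalEvents) {
                batch.add("map_" + id);
            }
            return batch;
        };

        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        EventFetcher fetcher = new EventFetcher(slowSource, queue);
        fetcher.start();

        int scheduled = 0;
        while (scheduled < totalEvents) {
            String event = queue.poll(); // non-blocking; returns null if empty
            if (event != null) {
                scheduled++;             // a real loop would schedule a copy here
            } else {
                Thread.yield();          // a real loop would do other work here
            }
        }

        fetcher.interrupt();
        fetcher.join();
        return scheduled;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("scheduled " + runDemo(5) + " events");
    }
}
```

The queue is the only state shared between the two threads, which keeps the sketch free of explicit locking; a real implementation would also need to handle fetch failures and task shutdown.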

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
