[jira] Commented: (HADOOP-2568) Pin reduces with consecutive IDs to nodes and have a single shuffle task per job per node

Arun C Murthy (JIRA) Fri, 01 Feb 2008 15:15:31 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564972#action_12564972 ]


Arun C Murthy commented on HADOOP-2568:
---------------------------------------

bq.  Pin reduces with consecutive IDs to nodes and have a single shuffle task 
per job per node

We can get similar characteristics by assigning more than one task per 
heartbeat. For example, if on a heartbeat we assign maps {0-1} to TT1 and maps 
{2-3} to TT2, and likewise assign reduces {0-1} to TT3 and reduces {2-3} to 
TT4, we can group map-outputs 0-3 and do a single shuffle.
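
To make the example concrete, here is a rough sketch of handing out consecutive task IDs in batches, one batch per heartbeat. It is not the JobTracker code: the function name, the round-robin order of TaskTrackers and the batch size of 2 are all assumptions made for illustration.

```python
def assign_in_batches(num_tasks, trackers, batch_size):
    """Hypothetical scheduler: each heartbeat hands one batch of consecutive
    task IDs to the next tracker, visiting trackers round-robin."""
    assignment = {}
    for start in range(0, num_tasks, batch_size):
        # Assumption: trackers heartbeat in a fixed round-robin order.
        tracker = trackers[(start // batch_size) % len(trackers)]
        end = min(start + batch_size, num_tasks)
        assignment.setdefault(tracker, []).extend(range(start, end))
    return assignment


maps = assign_in_batches(4, ["TT1", "TT2"], batch_size=2)
reduces = assign_in_batches(4, ["TT3", "TT4"], batch_size=2)
print(maps)     # maps {0-1} on TT1, maps {2-3} on TT2
print(reduces)  # reduces {0-1} on TT3, reduces {2-3} on TT4
```

With batches like these, each TaskTracker ends up holding consecutive reduce IDs without any explicit pinning.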

This would also help cluster utilization, since today we assign only one task 
per heartbeat. Allocating more tasks up front seems attractive from more than 
one angle.

Thoughts?

> Pin reduces with consecutive IDs to nodes and have a single shuffle task per 
> job per node
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2568
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2568
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.17.0
>
>
> The idea is to reduce disk seeks while fetching the map outputs. If we 
> opportunistically pin reduces with consecutive IDs (like 5, 6, 7 .. 
> max-reduce-tasks on that node) to a node and run a single shuffle task there, 
> we should benefit, provided that on every fetch the shuffle task fetches all 
> the outputs for the reduces it is shuffling for. With 2 reduces per node, this 
> cuts the number of seeks in the map output files on the map nodes by 50%. 
> Memory usage by the shuffle task would be proportional to the number of 
> reduces it is shuffling for (to account for the ramfs instances, one per 
> reduce). But overall it should help. 
> Thoughts?
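
The 50% figure in the description can be checked with a back-of-the-envelope model. The sketch below assumes one seek per fetch of a contiguous range of partitions in a map output file, and that the partitions for consecutive reduce IDs sit next to each other in that file. The function name and the job sizes (100 maps, 10 reduces) are made up for illustration.

```python
import math


def seeks_per_job(num_maps, num_reduces, reduces_per_node=1):
    """Estimated seeks into map output files for the whole shuffle.

    Assumption: each fetch of a contiguous run of partitions costs one seek,
    and one shuffle task per node fetches for all of its pinned reduces."""
    fetch_groups = math.ceil(num_reduces / reduces_per_node)
    return num_maps * fetch_groups


baseline = seeks_per_job(num_maps=100, num_reduces=10)
pinned = seeks_per_job(num_maps=100, num_reduces=10, reduces_per_node=2)
reduction = 1 - pinned / baseline
print(baseline, pinned, reduction)  # 1000 500 0.5
```

Under the same model, k consecutive reduces per node would leave roughly 1/k of the original seeks, at the cost of k ramfs instances in the shuffle task.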

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
