[ https://issues.apache.org/jira/browse/HADOOP-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631665#action_12631665 ]
Arun C Murthy commented on HADOOP-3136:
---------------------------------------

To clarify: in the 'Pre-Allocation' scheme, each per-TaskTracker list holds tasks sorted by the priority of their jobs. It is _ok_ not to update all lists simultaneously; we just need to mark the TaskInProgress on allocation, and it can subsequently be deleted from the other lists after checking that it has already been allocated. Similarly, we need to add the TIP back to all lists on task failure. Essentially, this moves the current job-specific caches into global per-TaskTracker lists.

Clearly we need further thought and discussion along these lines. Should we target anything quick/dirty for 0.19.0?

> Assign multiple tasks per TaskTracker heartbeat
> -----------------------------------------------
>
>                 Key: HADOOP-3136
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3136
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-3136_0_20080805.patch, HADOOP-3136_1_20080809.patch, HADOOP-3136_2_20080911.patch
>
>
> In today's logic of finding a new task, we assign only one task per heartbeat.
> We probably could give the tasktracker multiple tasks subject to the max
> number of free slots it has - for maps we could assign it data local tasks.
> We could probably run some logic to decide what to give it if we run out of
> data local tasks (e.g., tasks from overloaded racks, tasks that have least
> locality, etc.). In addition to maps, if it has reduce slots free, we could
> give it reduce task(s) as well. Again for reduces we could probably run some
> logic to give more tasks to nodes that are closer to nodes running most maps
> (assuming data generated is proportional to the number of maps). For example, if
> rack1 has 70% of the input splits, and we know that most maps are data/rack
> local, we try to schedule ~70% of the reducers there.
> Thoughts?
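
For discussion, here is a rough sketch of how the 'Pre-Allocation' idea from the comment above might look: one priority-sorted list per TaskTracker, a mark on the TaskInProgress at allocation time, lazy removal of stale entries from the other lists, and re-insertion on failure. It also hands out several tasks per heartbeat, bounded by free slots, as the issue proposes. All names here (`PreAllocationSketch`, `Tip`, `addCandidate`, `assign`, `taskFailed`) are hypothetical and are not the JobTracker's real classes. The convention that a smaller number means a higher priority is also an assumption, and locking is left out entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names are Hadoop's real classes.
class PreAllocationSketch {

    /** Stand-in for a TaskInProgress: an id, its job's priority, and a mark. */
    static final class Tip {
        final String id;
        final int jobPriority;      // assumption: smaller value = higher priority
        boolean allocated = false;  // set once when the task is handed out

        Tip(String id, int jobPriority) {
            this.id = id;
            this.jobPriority = jobPriority;
        }
    }

    // One candidate list per TaskTracker, kept sorted by job priority.
    private final Map<String, List<Tip>> perTracker = new HashMap<>();

    /** Add a TIP to the list of every tracker that could run it. */
    void addCandidate(Tip tip, List<String> trackers) {
        for (String tracker : trackers) {
            List<Tip> list = perTracker.computeIfAbsent(tracker, k -> new ArrayList<>());
            list.add(tip);
            list.sort(Comparator.comparingInt((Tip t) -> t.jobPriority));
        }
    }

    /** On heartbeat: hand out up to freeSlots tasks, dropping stale entries lazily. */
    List<Tip> assign(String tracker, int freeSlots) {
        List<Tip> assigned = new ArrayList<>();
        List<Tip> list = perTracker.getOrDefault(tracker, new ArrayList<>());
        Iterator<Tip> it = list.iterator();
        while (it.hasNext() && assigned.size() < freeSlots) {
            Tip tip = it.next();
            it.remove();               // entry is either stale or about to be allocated
            if (!tip.allocated) {
                tip.allocated = true;  // mark only; other trackers' lists are untouched
                assigned.add(tip);
            }
        }
        return assigned;
    }

    /** On task failure: clear the mark and put the TIP back on all its lists. */
    void taskFailed(Tip tip, List<String> trackers) {
        tip.allocated = false;
        for (List<Tip> list : perTracker.values()) {
            list.remove(tip);          // drop any stale copy to avoid duplicates
        }
        addCandidate(tip, trackers);
    }
}
```

The point of the mark is that an allocation touches only the list of the tracker that sent the heartbeat; every other list discovers the stale entry the next time it is scanned.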
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.