[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758851#action_12758851
 ] 

Devaraj Das commented on MAPREDUCE-1028:
----------------------------------------

Hmm.. It should be okay to have a JVM occupying a higher number of slots run a 
task that requires fewer slots. However, we need to fix the 
TaskTracker.TaskInProgress.releaseSlot. I am thinking that it might make sense 
to keep track of the slot count per JVM (long term, we anyway should be 
monitoring the resources being used by the JVM per se). Today, in releaseSlot, 
we release #slots equal to the number of slots that the task took to run. 
Instead it could just decrement the slot count by the number of slots the JVM 
took to run the task. Also, when the task is assigned to the TT, the 
JobInProgress.{obtainTaskCleanupTask,obtainJobCleanupTask,obtainJobSetupTask} 
methods should explicitly set the #slots required to 1 (today that's the 
only way to let the TT know that the task would require 1 slot).
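
A rough sketch of the per-JVM slot accounting described above. All names here 
(JvmSlotLedger, reserve, release, freeSlots) are hypothetical and are not 
existing TaskTracker APIs; the point is only that the release path looks up 
what the JVM holds instead of trusting the slot count of the task that ran.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual TaskTracker code.
class JvmSlotLedger {
    private final int totalSlots;
    private int usedSlots = 0;
    // Slots currently held by each JVM, keyed by an assumed JVM id string.
    private final Map<String, Integer> slotsByJvm = new HashMap<>();

    JvmSlotLedger(int totalSlots) {
        this.totalSlots = totalSlots;
    }

    // Record that a JVM was launched holding `slots` slots.
    // Returns false if the request is invalid or would oversubscribe.
    synchronized boolean reserve(String jvmId, int slots) {
        if (slots <= 0 || slotsByJvm.containsKey(jvmId)
                || usedSlots + slots > totalSlots) {
            return false;
        }
        slotsByJvm.put(jvmId, slots);
        usedSlots += slots;
        return true;
    }

    // Release whatever the JVM holds, regardless of how many slots the
    // task that just finished in it claimed to need.
    synchronized int release(String jvmId) {
        Integer held = slotsByJvm.remove(jvmId);
        if (held == null) {
            return 0; // unknown JVM: nothing to release
        }
        usedSlots -= held;
        return held;
    }

    synchronized int freeSlots() {
        return totalSlots - usedSlots;
    }
}
```

With this shape, a 1-slot cleanup task that runs in a JVM holding 2 slots 
gives back both slots when it finishes, so the count does not drift.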

The other option is to have the JobTracker be aware of slot counts for the 
special tasks. Since the special tasks are scheduled directly by the 
JobTracker, it would have to do that accounting itself. 
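
For this second option, the check on the JobTracker side could be as small as 
the sketch below. SpecialTaskSlotCheck and canAssign are made-up names, and 
the three integers stand in for whatever the TT reports in its status; this 
is an illustration of the idea, not the existing JobTracker logic.

```java
// Hypothetical sketch of a JobTracker-side guard for special tasks.
class SpecialTaskSlotCheck {
    // True only if the tracker has enough free slots for the task.
    // maxSlots and occupiedSlots are assumed to come from the TT's report.
    static boolean canAssign(int maxSlots, int occupiedSlots, int slotsRequired) {
        if (slotsRequired <= 0) {
            return false; // a task must need at least one slot
        }
        int freeSlots = maxSlots - occupiedSlots;
        return freeSlots >= slotsRequired;
    }
}
```

For example, a TT with 2 slots and 1 occupied would be refused a 2-slot 
cleanup task but would still accept a 1-slot one.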

> Cleanup tasks are scheduled using high memory configuration, leaving tasks in 
> unassigned state.
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1028
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1028
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0
>            Reporter: Hemanth Yamijala
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.21.0
>
>
> A cleanup task is launched for a failed task of a job. This task is created 
> based on the TIP of the failed task, and so is marked as requiring as many 
> slots to run as the original task itself. For instance, if a high RAM job 
> requires 2 slots per task, a cleanup task of that high RAM job requires 2 
> slots as well.
> Further, a cleanup task is scheduled to a tasktracker by the jobtracker 
> itself and not the scheduler. While doing so, the JT doesn't check if the TT 
> has enough slots free to run a high RAM cleanup task - always assuming 1 slot 
> is enough. Thus, a task is oversubscribed to the TT.
> However, on the TT, before launch, we check that the task can actually run, 
> and wait for that many slots to become available. If the slots don't get 
> freed quickly, we will have tasks stuck in an unassigned state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
