[ https://issues.apache.org/jira/browse/MAPREDUCE-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated MAPREDUCE-1028:
------------------------------------

    Attachment: MR-1028.patch

Attaching a patch with the fix. Writing a testcase is in progress.

> Cleanup tasks are scheduled using high memory configuration, leaving tasks in unassigned state.
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1028
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1028
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0
>            Reporter: Hemanth Yamijala
>            Assignee: Ravi Gummadi
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: MR-1028.patch
>
>
> A cleanup task is launched for a failed task of a job. This task is created 
> based on the TIP of the failed task, and so is marked as requiring as many 
> slots to run as the original task itself. For instance, if a high RAM job 
> requires 2 slots per task, a cleanup task of that high RAM job requires 2 
> slots as well.
> Further, a cleanup task is scheduled on a tasktracker by the jobtracker 
> itself, not by the scheduler. While doing so, the JT doesn't check whether 
> the TT has enough free slots to run a high RAM cleanup task - it always 
> assumes 1 slot is enough. Thus, the TT can be oversubscribed.
> However, on the TT, before launch, we check whether the task can actually 
> run and wait for that many slots to become available. If the slots don't 
> get freed quickly, tasks remain stuck in an unassigned state.
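
As a rough, self-contained illustration of the mismatch described above (the class and method names here are hypothetical stand-ins, not the actual JobTracker/TaskTracker code and not the attached patch): a cleanup attempt derived from a failed high RAM task could advertise a single slot, so the JT's direct assignment matches what the TT will actually reserve before launching it.

    public class CleanupSlotExample {

        /** Minimal stand-in for a task attempt and the slots it asks for. */
        static class TaskAttempt {
            final String id;
            final int slotsRequired;
            final boolean isCleanup;

            TaskAttempt(String id, int slotsRequired, boolean isCleanup) {
                this.id = id;
                this.slotsRequired = slotsRequired;
                this.isCleanup = isCleanup;
            }
        }

        /**
         * Creates a cleanup attempt for a failed task. Without a fix the cleanup
         * attempt inherits the original slot requirement (e.g. 2 for a high RAM
         * job); with a fix of this flavour it always asks for a single slot.
         */
        static TaskAttempt createCleanupAttempt(TaskAttempt failed, boolean applyFix) {
            int slots = applyFix ? 1 : failed.slotsRequired;
            return new TaskAttempt(failed.id + "_cleanup", slots, true);
        }

        /** Mirrors the TT-side check: only launch if enough slots are free. */
        static boolean canLaunch(TaskAttempt attempt, int freeSlots) {
            return freeSlots >= attempt.slotsRequired;
        }

        public static void main(String[] args) {
            TaskAttempt failedHighRam = new TaskAttempt("attempt_001", 2, false);
            int freeSlotsOnTT = 1; // the JT assigned the cleanup assuming 1 slot is enough

            TaskAttempt before = createCleanupAttempt(failedHighRam, false);
            TaskAttempt after = createCleanupAttempt(failedHighRam, true);

            // Without the fix: the attempt waits for 2 slots that may never free up.
            System.out.println("without fix, launchable = " + canLaunch(before, freeSlotsOnTT));
            // With the fix: 1 slot is sufficient and the cleanup attempt can launch.
            System.out.println("with fix,    launchable = " + canLaunch(after, freeSlotsOnTT));
        }
    }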

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
