[jira] [Commented] (MAPREDUCE-5877) Inconsistency between JT/TT for tasks taking a long time to launch

Hadoop QA (JIRA) Mon, 05 May 2014 21:43:33 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990268#comment-13990268
 ]


Hadoop QA commented on MAPREDUCE-5877:
--------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643432/mr-5877-1.patch
  against trunk revision .

    {color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4583//console

This message is automatically generated.

> Inconsistency between JT/TT for tasks taking a long time to launch
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5877
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5877
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker, tasktracker
>    Affects Versions: 1.2.1
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>            Priority: Critical
>         Attachments: mr-5877-1.patch, repro-mr-5877.patch
>
>
> For tasks that take too long to launch (for genuine reasons such as large 
> distributed caches), the JT expires the task. Depending on whether job 
> recovery is enabled and on the JT's restart state, another attempt may or 
> may not be launched, even when the JT has not been restarted. The status of 
> the attempt changes to "Error launching task". Meanwhile, the TT is not 
> informed of this task expiry and eventually launches the task. Also, the 
> "new" attempt might be assigned to the same TT, leading to more inconsistent 
> behavior. 
> To avoid this, one can bump up mapred.tasktracker.expiry.interval, but that 
> leads to long TT failure discovery times. 
> We should have a per-job timeout for task launches/heartbeats, and the JT 
> and TT should be consistent in what they report.
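
The proposal in the description above (a per-job launch timeout, with the TT told when the JT expires an attempt) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch and not Hadoop code: the class LaunchExpirySketch and all of its methods are hypothetical names, and the real JobTracker/TaskTracker heartbeat protocol is reduced to plain method calls.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical JT-side bookkeeping; none of these names are Hadoop APIs. */
class LaunchExpirySketch {

    /** An attempt handed to a tracker that has not yet reported as running. */
    static final class Pending {
        final String tracker;
        final long assignedAtMs;
        final long launchTimeoutMs; // per-job value, not one global interval

        Pending(String tracker, long assignedAtMs, long launchTimeoutMs) {
            this.tracker = tracker;
            this.assignedAtMs = assignedAtMs;
            this.launchTimeoutMs = launchTimeoutMs;
        }
    }

    private final Map<String, Pending> launching =
        new LinkedHashMap<String, Pending>();
    private final Map<String, List<String>> killsByTracker =
        new LinkedHashMap<String, List<String>>();

    /** Record that an attempt was assigned, using its job's own timeout. */
    void assigned(String attemptId, String tracker, long nowMs,
                  long jobLaunchTimeoutMs) {
        launching.put(attemptId,
                      new Pending(tracker, nowMs, jobLaunchTimeoutMs));
    }

    /** The tracker reported the attempt as running; stop the launch clock. */
    void reportedRunning(String attemptId) {
        launching.remove(attemptId);
    }

    /**
     * Expire attempts whose per-job launch timeout has elapsed and queue a
     * kill for the owning tracker, so both sides agree the attempt is dead.
     */
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<String>();
        Iterator<Map.Entry<String, Pending>> it =
            launching.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Pending> entry = it.next();
            String attemptId = entry.getKey();
            Pending p = entry.getValue();
            if (nowMs - p.assignedAtMs > p.launchTimeoutMs) {
                it.remove();
                expired.add(attemptId);
                List<String> kills = killsByTracker.get(p.tracker);
                if (kills == null) {
                    kills = new ArrayList<String>();
                    killsByTracker.put(p.tracker, kills);
                }
                kills.add(attemptId);
            }
        }
        return expired;
    }

    /** Drain the kills to piggyback on the tracker's next heartbeat reply. */
    List<String> killActionsFor(String tracker) {
        List<String> kills = killsByTracker.remove(tracker);
        return kills == null ? new ArrayList<String>() : kills;
    }
}
```

In this sketch a job with a large distributed cache would pass a larger jobLaunchTimeoutMs to assigned(), leaving the tracker-failure interval untouched, and the tracker would learn about each expired attempt from killActionsFor() on its next heartbeat instead of launching it later.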



--
This message was sent by Atlassian JIRA
(v6.2#6252)
