[ https://issues.apache.org/jira/browse/MAPREDUCE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990268#comment-13990268 ]
Hadoop QA commented on MAPREDUCE-5877:
--------------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12643432/mr-5877-1.patch
against trunk revision .
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4583//console
This message is automatically generated.
> Inconsistency between JT/TT for tasks taking a long time to launch
> ------------------------------------------------------------------
>
> Key: MAPREDUCE-5877
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5877
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobtracker, tasktracker
> Affects Versions: 1.2.1
> Reporter: Karthik Kambatla
> Assignee: Karthik Kambatla
> Priority: Critical
> Attachments: mr-5877-1.patch, repro-mr-5877.patch
>
>
> For tasks that take too long to launch (for genuine reasons, such as large
> distributed caches), the JT expires the task. Depending on whether job recovery
> is enabled and on the JT's restart state, another attempt may or may not be
> launched, even when the JT has not been restarted. The status of the attempt
> changes to "Error launching task". Meanwhile, the TT is not informed of this
> task expiry and eventually launches the task. Also, the "new" attempt might be
> assigned to the same TT, leading to more inconsistent behavior.
> To avoid this, one can bump up mapred.tasktracker.expiry.interval, but
> that leads to long TT failure discovery times.
> We should have a per-job timeout for task launches/heartbeats, and the JT and
> TT should be consistent in what they report.
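
The per-job launch timeout proposed in the description could look roughly like the sketch below. It is a standalone illustration only, not the attached patch and not Hadoop code: LaunchTracker, setJobTimeout, addLaunching and isExpired are hypothetical names, and the 10-minute cluster-wide default is assumed for the example. The idea is that a job with a large distributed cache gets a longer launch window, while the cluster-wide expiry stays short so that TT failures are still detected quickly.

```java
import java.util.HashMap;
import java.util.Map;

public class LaunchExpirySketch {

    /** Hypothetical tracker of launching attempts; none of these names are Hadoop APIs. */
    static final class LaunchTracker {
        private final long defaultTimeoutMs;
        private final Map<String, Long> jobTimeoutMs = new HashMap<String, Long>();
        private final Map<String, String> attemptJob = new HashMap<String, String>();
        private final Map<String, Long> attemptStartMs = new HashMap<String, Long>();

        LaunchTracker(long defaultTimeoutMs) {
            this.defaultTimeoutMs = defaultTimeoutMs;
        }

        /** Override the launch timeout for a single job. */
        void setJobTimeout(String jobId, long timeoutMs) {
            jobTimeoutMs.put(jobId, timeoutMs);
        }

        /** Record that an attempt was handed to a TT at time nowMs. */
        void addLaunching(String jobId, String attemptId, long nowMs) {
            attemptJob.put(attemptId, jobId);
            attemptStartMs.put(attemptId, nowMs);
        }

        /** True if the attempt has been launching longer than its job's timeout. */
        boolean isExpired(String attemptId, long nowMs) {
            Long start = attemptStartMs.get(attemptId);
            if (start == null) {
                return false; // attempt is not being tracked
            }
            Long timeout = jobTimeoutMs.get(attemptJob.get(attemptId));
            long limit = (timeout != null) ? timeout : defaultTimeoutMs;
            return nowMs - start > limit;
        }
    }

    public static void main(String[] args) {
        // Assumed cluster-wide default of 10 minutes, for illustration only.
        LaunchTracker tracker = new LaunchTracker(600000L);
        // The job with a large distributed cache gets 30 minutes to launch.
        tracker.setJobTimeout("job_big_cache", 1800000L);
        tracker.addLaunching("job_big_cache", "attempt_a", 0L);
        tracker.addLaunching("job_small", "attempt_b", 0L);

        long now = 900000L; // 15 minutes after both launches started
        System.out.println("attempt_a expired: " + tracker.isExpired("attempt_a", now));
        System.out.println("attempt_b expired: " + tracker.isExpired("attempt_b", now));
    }
}
```

The sketch covers only the timeout half of the proposal. Keeping the JT and TT consistent would additionally require the JT to tell the TT about an expired attempt, which is not modeled here.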
--
This message was sent by Atlassian JIRA
(v6.2#6252)