[ 
https://issues.apache.org/jira/browse/TEZ-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194355#comment-15194355
 ] 

Siddharth Seth commented on TEZ-2954:
-------------------------------------

[~ozawa] - the problem is highlighted in 
https://issues.apache.org/jira/browse/TEZ-925?focusedCommentId=13932292&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13932292

If we receive a container timeout, we would have received a task timeout as 
well, and that is already factored in. The problem is that a launch failure on 
the NM is reported back via the RM. When that happens, we lose track of the 
fact that the launch failed. If there's a timeout while talking to the NM, 
that registers as a task failure.
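
To illustrate the gap, here is a minimal sketch of per-node failure counting. 
It is not Tez code: NodeFailureTracker, FailureKind, the threshold, and the 
countLaunchFailures switch are all hypothetical names made up for this 
example. It only shows how a launch failure reported via the RM can be dropped 
from the per-node count while task failures and NM communication timeouts are 
counted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Tez codebase.
class NodeFailureTracker {
    enum FailureKind {
        TASK_FAILURE,          // counted towards blacklisting today
        NM_COMM_TIMEOUT,       // registers as a task failure, so also counted
        LAUNCH_FAILURE_VIA_RM  // reported back via the RM; currently lost
    }

    private final int threshold;
    private final boolean countLaunchFailures;
    private final Map<String, Integer> failuresByNode = new HashMap<>();

    NodeFailureTracker(int threshold, boolean countLaunchFailures) {
        this.threshold = threshold;
        this.countLaunchFailures = countLaunchFailures;
    }

    // Record a failure against a node, skipping RM-reported launch
    // failures unless the (assumed) switch is turned on.
    void record(String node, FailureKind kind) {
        if (kind == FailureKind.LAUNCH_FAILURE_VIA_RM && !countLaunchFailures) {
            return;
        }
        failuresByNode.merge(node, 1, Integer::sum);
    }

    // A node is blacklisted once its failure count reaches the threshold.
    boolean isBlacklisted(String node) {
        return failuresByNode.getOrDefault(node, 0) >= threshold;
    }
}
```

With countLaunchFailures set to false, repeated launch failures on the same 
node never blacklist it, which is the behaviour this issue is about. Setting 
it to true treats them like any other failure on that node.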

The JIRA description should have been clearer.

> Container launch timeouts should count towards node blacklisting
> ----------------------------------------------------------------
>
>                 Key: TEZ-2954
>                 URL: https://issues.apache.org/jira/browse/TEZ-2954
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Tsuyoshi Ozawa
>         Attachments: TEZ-2954.001.patch
>
>
> Currently, only task failures count towards blacklisting. A container timing 
> out should do the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
