[
https://issues.apache.org/jira/browse/TEZ-2954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194355#comment-15194355
]
Siddharth Seth commented on TEZ-2954:
-------------------------------------
[~ozawa] - the problem is highlighted in
https://issues.apache.org/jira/browse/TEZ-925?focusedCommentId=13932292&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13932292
If we receive a container timeout, we would have received a task timeout as
well, and that is already factored in. The problem is that a launch failure on
the NM is reported back via the RM. When that happens, we lose track of the
fact that the launch failed. If there's a timeout while talking to the NM, that
will register as a task failure.
The jira description should have been better.
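
To make the intent concrete, here is a rough sketch of the accounting I have in
mind. This is not the Tez code and not what the attached patch does: the class
NodeFailureTracker, the FailureKind enum and the threshold parameter are all
hypothetical names used only for illustration. The point is that a launch
failure reported back via the RM is attributed to the node and counted the same
way a task failure is.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in Tez.
class NodeFailureTracker {
    // The two failure kinds discussed above.
    enum FailureKind { TASK_FAILURE, CONTAINER_LAUNCH_FAILURE }

    // Assumed threshold: total failures on a node before it is blacklisted.
    private final int maxFailuresPerNode;
    private final Map<String, EnumMap<FailureKind, Integer>> failuresByNode =
        new HashMap<>();

    NodeFailureTracker(int maxFailuresPerNode) {
        this.maxFailuresPerNode = maxFailuresPerNode;
    }

    // Record one failure of the given kind against the node.
    void recordFailure(String nodeId, FailureKind kind) {
        failuresByNode
            .computeIfAbsent(nodeId, k -> new EnumMap<>(FailureKind.class))
            .merge(kind, 1, Integer::sum);
    }

    // Failures of a single kind recorded for the node.
    int failureCount(String nodeId, FailureKind kind) {
        EnumMap<FailureKind, Integer> counts = failuresByNode.get(nodeId);
        return counts == null ? 0 : counts.getOrDefault(kind, 0);
    }

    // Both kinds count towards the same per-node threshold.
    boolean isBlacklisted(String nodeId) {
        int total = 0;
        for (FailureKind kind : FailureKind.values()) {
            total += failureCount(nodeId, kind);
        }
        return total >= maxFailuresPerNode;
    }

    public static void main(String[] args) {
        NodeFailureTracker tracker = new NodeFailureTracker(2);
        tracker.recordFailure("node-1", FailureKind.TASK_FAILURE);
        // A launch failure reported via the RM, attributed to the same node.
        tracker.recordFailure("node-1", FailureKind.CONTAINER_LAUNCH_FAILURE);
        System.out.println("node-1 blacklisted: " + tracker.isBlacklisted("node-1"));
    }
}
```

Keeping the counts per kind is a choice made for the sketch; it would allow a
separate threshold for launch failures later if that turns out to be useful.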
> Container launch timeouts should count towards node blacklisting
> ----------------------------------------------------------------
>
> Key: TEZ-2954
> URL: https://issues.apache.org/jira/browse/TEZ-2954
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Tsuyoshi Ozawa
> Attachments: TEZ-2954.001.patch
>
>
> Currently, only task failures count towards blacklisting. A container timing
> out should do the same.