Jason Lowe created TEZ-3072:
-------------------------------
Summary: Node blacklisting always reruns completed non-leaf tasks
Key: TEZ-3072
URL: https://issues.apache.org/jira/browse/TEZ-3072
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jason Lowe
Recently a user ran a job with many vertices, and a bug in the user's code
caused a problem in one of the trailing vertices of the job. On some nodes
enough tasks failed that the AM decided it needed to blacklist those nodes.
The blacklisting then caused many completed vertices to re-run, because the AM
thought it needed to re-execute the non-leaf tasks that had completed on those
nodes. This wasted a lot of cluster resources and job time for no benefit.
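
To make the reported behavior concrete, here is a toy model of it. This is
only an illustrative sketch, not Tez code: the Task class, its fields, the
task and node names, and the tasks_rerun_on_blacklist function are all
hypothetical, and the selection rule simply restates the description above
(every completed non-leaf task on a blacklisted node is chosen for re-run).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a task attempt; not a Tez class."""
    name: str
    node: str
    is_leaf: bool    # True if no downstream vertex consumes this task's output
    completed: bool


def tasks_rerun_on_blacklist(tasks, blacklisted_node):
    """Return the names of tasks this toy model re-runs for a blacklisted node.

    Mirrors the behavior described in the report: every completed non-leaf
    task that ran on the blacklisted node is selected, unconditionally.
    """
    return [
        t.name
        for t in tasks
        if t.node == blacklisted_node and t.completed and not t.is_leaf
    ]


# Assumed example: two upstream vertices finished, a trailing vertex is failing.
tasks = [
    Task("v1_t0", "node-a", is_leaf=False, completed=True),
    Task("v1_t1", "node-b", is_leaf=False, completed=True),
    Task("v2_t0", "node-a", is_leaf=False, completed=True),
    Task("v3_t0", "node-a", is_leaf=True, completed=False),
]

rerun = tasks_rerun_on_blacklist(tasks, "node-a")
print(rerun)
```

In this sketch, blacklisting node-a because of failures in the trailing vertex
also selects the already completed upstream tasks v1_t0 and v2_t0 for re-run,
which is the wasted work the report describes.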