Jason Lowe created TEZ-3191:
-------------------------------
Summary: NM container diagnostics for excess resource usage can be
lost if task fails while being killed
Key: TEZ-3191
URL: https://issues.apache.org/jira/browse/TEZ-3191
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Jason Lowe
This is the Tez version of MAPREDUCE-4955. I saw a misconfigured Tez job
report a task attempt as failed due to a "filesystem closed" error when the
real cause was the NM killing the container for excess memory usage.
Unfortunately, the SIGTERM sent by the NM caused the filesystem shutdown hook
to close the filesystems, and that triggered a failure in the main thread. If
that failure is reported to the AM via the umbilical before the NM container
status is received via the RM, then the useful container diagnostics from the
NM are lost and never reach the job history.
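
The ordering problem above can be illustrated with a minimal sketch. It assumes
the attempt keeps a list of diagnostics and appends the container status even
when the attempt was already marked failed by the task's own report. The class
AttemptDiagnosticsSketch, its method names, and the message strings are all
hypothetical placeholders; this is not Tez code and not necessarily how the
issue would be fixed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not Tez classes or APIs.
class AttemptDiagnosticsSketch {
    private final List<String> diagnostics = new ArrayList<>();
    private boolean failed = false;

    // Failure reported by the task itself over the umbilical.
    synchronized void onTaskReportedFailure(String message) {
        failed = true;
        diagnostics.add("Task error: " + message);
    }

    // Container completion status relayed from the NM via the RM.
    // Appended even if the attempt is already marked failed, so the
    // arrival order of the two events does not decide what is kept.
    synchronized void onContainerCompleted(String nmDiagnostics) {
        failed = true;
        if (nmDiagnostics != null && !nmDiagnostics.isEmpty()) {
            diagnostics.add("Container status: " + nmDiagnostics);
        }
    }

    synchronized boolean isFailed() {
        return failed;
    }

    // Text that would end up in the attempt's history record.
    synchronized String render() {
        return String.join("\n", diagnostics);
    }

    public static void main(String[] args) {
        AttemptDiagnosticsSketch attempt = new AttemptDiagnosticsSketch();
        // The secondary failure arrives first, as in the reported race.
        attempt.onTaskReportedFailure("filesystem closed (placeholder text)");
        // The NM's explanation arrives later and is still recorded.
        attempt.onContainerCompleted("killed for excess memory (placeholder text)");
        System.out.println(attempt.render());
    }
}
```

With this shape, the history would show both the secondary "filesystem closed"
failure and the NM's explanation, whichever event the AM sees first.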