Jason Lowe created TEZ-3191:
-------------------------------

             Summary: NM container diagnostics for excess resource usage can be lost if task fails while being killed
                 Key: TEZ-3191
                 URL: https://issues.apache.org/jira/browse/TEZ-3191
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Jason Lowe


This is the Tez version of MAPREDUCE-4955.  I saw a misconfigured Tez job 
report a task attempt as failed due to a filesystem closed error, when the 
real cause was that the NM killed the container for excess memory usage.  The 
SIGTERM sent by the NM caused the filesystem shutdown hook to close the 
filesystems, and that triggered a failure in the main thread.  If that failure 
is reported to the AM via the umbilical before the NM container status is 
received via the RM, then the useful container diagnostics from the NM are 
lost in the job history.
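
To illustrate the ordering problem, here is a minimal sketch of bookkeeping that keeps both sets of diagnostics regardless of which event arrives first. It is not Tez code: the class AttemptDiagnostics and its methods are hypothetical names made up for this example, and the real AM state machine is considerably more involved. The only point is that a container-completed event arriving after the attempt is already marked failed should still contribute its diagnostics.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-attempt diagnostics bookkeeping; not Tez code. */
public class AttemptDiagnostics {
    private final List<String> diagnostics = new ArrayList<>();
    private boolean terminal = false;

    /** Failure reported by the task itself, e.g. over the umbilical. */
    public synchronized void onTaskFailure(String message) {
        diagnostics.add("task: " + message);
        terminal = true;
    }

    /** Container exit status from the NM, relayed via the RM. */
    public synchronized void onContainerCompleted(String nmDiagnostics) {
        // Record the NM diagnostics even if the attempt is already terminal,
        // so a late container status is not silently dropped.
        diagnostics.add("container: " + nmDiagnostics);
        terminal = true;
    }

    public synchronized boolean isTerminal() {
        return terminal;
    }

    /** All diagnostics seen so far, in arrival order. */
    public synchronized String report() {
        return String.join("; ", diagnostics);
    }

    public static void main(String[] args) {
        AttemptDiagnostics attempt = new AttemptDiagnostics();
        // The task failure wins the race...
        attempt.onTaskFailure("filesystem closed");
        // ...but the container status that follows is still recorded.
        attempt.onContainerCompleted("killed for exceeding memory limits");
        System.out.println(attempt.report());
    }
}
```

In this sketch the report for the scenario above contains both the task-side error and the container-side reason, so the excess memory kill would remain visible in whatever is written to history.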



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
