[jira] [Commented] (TEZ-3462) Task attempt failure during container shutdown loses useful container diagnostics

Eric Badger (JIRA) Tue, 17 Jan 2017 11:19:45 -0800

    [ 
https://issues.apache.org/jira/browse/TEZ-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826649#comment-15826649
 ]


Eric Badger commented on TEZ-3462:
----------------------------------

[~hitesh], [~sseth], do you have any follow-up on this? I share [~jlowe]'s 
opinion that we don't want to advertise the shutdown failure as the final 
status of the task. Shutdown errors will still be logged, but they won't 
confuse users (e.g. when a killed task errors out while closing I/O). 

> Task attempt failure during container shutdown loses useful container 
> diagnostics
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-3462
>                 URL: https://issues.apache.org/jira/browse/TEZ-3462
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.1
>            Reporter: Jason Lowe
>            Assignee: Eric Badger
>         Attachments: TEZ-3462.001.patch
>
>
> When a nodemanager kills a task attempt due to excessive memory usage it will 
> send a SIGTERM followed by a SIGKILL.  It also sends a useful diagnostic 
> message with the container completion event to the RM which will eventually 
> make it to the AM on a subsequent heartbeat.
> However, if the JVM shutdown processing causes an error in the task (e.g., a 
> filesystem being closed by a shutdown hook), then the task attempt can report 
> a failure before the useful NM diagnostic makes it to the AM.  The AM then 
> records some other error as the task failure reason, and by the time the 
> container completion status makes it to the AM it does not associate that 
> error with the task attempt and the useful information is lost.
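
For illustration only, here is a minimal sketch of the ordering problem described above and of one way the final failure reason could be chosen. It is not the attached patch and it is not Tez code: the class AttemptDiagnostics, its method names, and the message strings used in the usage note are all hypothetical. The assumption is simply that the task-reported error and the container completion diagnostic arrive as two separate events, and that the container diagnostic should win whenever it is present, while the task-side error is still kept in a log.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one task attempt's failure bookkeeping (not a Tez class). */
final class AttemptDiagnostics {
    // Every task-side error is retained here so that nothing is dropped from the logs.
    private final List<String> loggedErrors = new ArrayList<>();
    // First error reported by the task itself, e.g. during JVM shutdown.
    private String taskReportedError;
    // Diagnostic delivered with the container completion event; may arrive later.
    private String containerDiagnostics;

    /** Called when the task attempt reports a failure. */
    void onTaskError(String message) {
        loggedErrors.add(message);
        if (taskReportedError == null) {
            taskReportedError = message;
        }
    }

    /** Called when the container completion status reaches this attempt. */
    void onContainerCompleted(String diagnostics) {
        containerDiagnostics = diagnostics;
    }

    /** Prefers the container diagnostic; falls back to the task-reported error. */
    String finalFailureReason() {
        return containerDiagnostics != null ? containerDiagnostics : taskReportedError;
    }

    /** Returns a copy of all task-side errors seen so far. */
    List<String> loggedErrors() {
        return new ArrayList<>(loggedErrors);
    }
}
```

With this sketch, calling onTaskError("filesystem closed") followed by onContainerCompleted("killed for exceeding memory limits") leaves the second message as the final failure reason, while the first one remains available through loggedErrors(). Both strings are made-up placeholders rather than real NM or Tez output.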



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
