[jira] [Commented] (HIVE-16136) LLAP: Before SIGKILL and collect diagnostic information before daemon goes down

Prasanth Jayachandran (JIRA) Tue, 07 Mar 2017 12:04:04 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-16136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900051#comment-15900051
 ]


Prasanth Jayachandran commented on HIVE-16136:
----------------------------------------------

bq. The shell scripts are probably where we can trap signals and dump 
/proc/<pid>/smaps & /proc/<pid>/stat ? Bash has a "trap" feature for this.

Yeah, we could collect these as well. But I think this can only be triggered 
from the JVM's OnOutOfMemoryError hook or on another fatal JVM error (although 
I was not able to make this work with the OnError hook plus a stack overflow 
exception). This won't work for SIGTERM or SIGKILL. 

bq. This is pretty easy to increase, but is cluster wide config.

If it is a cluster-wide config, then a short interval will not be enough for a 
full heap dump. We could add separate shutdown hooks to collect jstack, JMX, 
/proc/* etc. and let HeapDumpOnOOM handle the heap dump. We can probably also 
have a web endpoint for manual heap dumps if that's useful. 
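
A rough sketch of what such a lightweight shutdown hook could look like, not the 
actual patch. The class name DiagnosticsShutdownHook, the output file names and 
the default "llap-diagnostics" directory are made up for illustration. It writes 
a jstack-like thread dump through the standard ThreadMXBean and copies 
/proc/self/smaps and /proc/self/stat when they are readable (Linux only). The 
JVM runs shutdown hooks on a normal exit or SIGTERM, never on SIGKILL, so this 
still has to finish inside YARN's kill delay.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical class name; not part of Hive/LLAP.
public class DiagnosticsShutdownHook {

    // Writes a jstack-like dump of every live thread to <outDir>/threads.txt.
    static void dumpThreads(Path outDir) throws IOException {
        ThreadInfo[] threads =
            ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(
                outDir.resolve("threads.txt"), StandardCharsets.UTF_8))) {
            for (ThreadInfo t : threads) {
                out.printf("\"%s\" id=%d state=%s%n",
                    t.getThreadName(), t.getThreadId(), t.getThreadState());
                for (StackTraceElement frame : t.getStackTrace()) {
                    out.println("\tat " + frame);
                }
                out.println();
            }
        }
    }

    // Copies /proc/self/<name> to <outDir>/proc-<name>; skipped if unreadable.
    static void copyProcFile(String name, Path outDir) throws IOException {
        Path src = Paths.get("/proc/self", name);
        if (!Files.isReadable(src)) {
            return; // not Linux, or no permission
        }
        // procfs entries report size 0, so stream until EOF instead of
        // relying on the file size.
        try (InputStream in = Files.newInputStream(src)) {
            Files.copy(in, outDir.resolve("proc-" + name),
                StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Best effort: a failure in one step must not block the others.
    static void collect(Path outDir) {
        try {
            Files.createDirectories(outDir);
            dumpThreads(outDir);
        } catch (IOException | RuntimeException e) {
            System.err.println("thread dump failed: " + e);
        }
        for (String name : new String[] {"smaps", "stat"}) {
            try {
                copyProcFile(name, outDir);
            } catch (IOException e) {
                System.err.println("copy of /proc/self/" + name + " failed: " + e);
            }
        }
    }

    public static void main(String[] args) {
        // The output directory is an assumption; per the issue description it
        // should live outside the container directory.
        Path outDir = Paths.get(args.length > 0 ? args[0] : "llap-diagnostics");
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> collect(outDir), "diagnostics-hook"));
        System.out.println("hook installed, writing to " + outDir.toAbsolutePath());
    }
}
```

The heap dump is left out on purpose, since it is the slow part and would stay 
with the JVM's own heap-dump-on-OOM handling as suggested above.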



> LLAP: Before SIGKILL and collect diagnostic information before daemon goes 
> down
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-16136
>                 URL: https://issues.apache.org/jira/browse/HIVE-16136
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>    Affects Versions: 2.2.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>
> Sometimes daemons get killed by YARN's pmem monitor, which issues a kill 
> followed by a kill -9 after 250ms. This is far too short a window to collect 
> anything useful. 
> There is no clean way to trap SIGKILL in Java.  
> One option is to increase the time between kill and kill -9 in YARN and, 
> during that time, have a shutdown hook handler collect all diagnostic 
> information such as heapdump, jstack, jmx output etc. in a 
> non-container directory.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
