[
https://issues.apache.org/jira/browse/STORM-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033747#comment-14033747
]
ASF GitHub Bot commented on STORM-183:
--------------------------------------
Github user revans2 commented on the pull request:
https://github.com/apache/incubator-storm/pull/143#issuecomment-46302266
I do like combining the two shutdown hooks together.
I understand what is happening with the leaked processes now. I was
confused about exactly what the code was supposed to be doing.
I thought that the supervisor would send a SIGTERM to the process, wait a
while, and then send a SIGKILL to it. I didn't realize that instead it was
just sending a SIGTERM and letting the worker send the SIGKILL to itself.
The issue with not having the supervisor force-kill the child is that a bug
in the worker, or in a child process the worker forks, could result in the
process being leaked.
I don't want to do the sleep right after the SIGTERM, because if there are
multiple workers the sleeps will add up. I think we want to modify
sync-processes in the supervisor to do two passes over the workers that need
to be killed. The first pass would ask each worker to exit (SIGTERM). The
second pass would force-kill each worker and clean up the directories
associated with it. There could be a 1-second sleep in between, but only if
any workers were being killed (I don't want to sleep if no workers are
shutting down).
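A minimal sketch of the two-pass idea, not the actual supervisor code. The
function name shutdown-workers! and the callbacks request-exit!, force-kill!
and cleanup! are hypothetical placeholders for whatever the supervisor uses
to send SIGTERM, send SIGKILL, and remove a worker's directories; the
1000 ms default is just the 1-second sleep mentioned above.

```clojure
(defn shutdown-workers!
  "Two-pass shutdown sketch. All callback names are hypothetical;
  each one takes a single worker id."
  [worker-ids {:keys [request-exit! force-kill! cleanup! grace-ms]
               :or   {grace-ms 1000}}]
  ;; Do nothing (and do not sleep) when no workers need to be killed.
  (when (seq worker-ids)
    ;; Pass 1: ask every worker to exit (e.g. SIGTERM).
    (doseq [id worker-ids]
      (request-exit! id))
    ;; A single shared grace period, so the sleeps do not add up per worker.
    (Thread/sleep (long grace-ms))
    ;; Pass 2: force-kill each worker (e.g. SIGKILL) and clean up after it.
    (doseq [id worker-ids]
      (force-kill! id)
      (cleanup! id))))
```

With this shape the total wait is one grace period regardless of how many
workers are being killed, and the force-kill in the second pass covers a
worker that fails to kill itself.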
Does that sound reasonable?
> Supervisor/worker shutdown hook should be called in distributed mode.
> ---------------------------------------------------------------------
>
> Key: STORM-183
> URL: https://issues.apache.org/jira/browse/STORM-183
> Project: Apache Storm (Incubating)
> Issue Type: Bug
> Reporter: caofangkun
> Priority: Minor
> Attachments: STORM-183-1.patch
>
>
> If the process is killed forcefully by the OS, or if it crashes due to
> resource issues (e.g., out of memory), shutdown hooks won't be invoked.
> -TERM (15)
> The process is requested to stop running; it should try to exit cleanly
> -KILL (9)
> The process will be killed by the kernel; this signal cannot be ignored.
> So wouldn't it be better to use 'kill -15'?
> See:
> https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/util.clj#L392
> https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/supervisor.clj#L175
> will never be called for supervisor:
> https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/supervisor.clj#L396
> will never be called for worker:
> https://github.com/apache/incubator-storm/blob/master/storm-core/src/clj/backtype/storm/daemon/worker.clj#L421
> Should we add something like this?
> (.addShutdownHook (Runtime/getRuntime) (Thread. (fn [] (.shutdown mk-sv))))
--
This message was sent by Atlassian JIRA
(v6.2#6252)