Github user revans2 commented on the pull request:
https://github.com/apache/incubator-storm/pull/143#issuecomment-46302266
I do like combining the two shutdown hooks together.
I understand what is happening now with the leaked processes now. I was
confused about exactly what the code was supposed to be doing.
I thought that the supervisor would send a sig term to the process, and
then wait a while and send a sig kill to it. I didn't realize that instead it
was just sending a sigterm, and letting the worker send the sigkill to itself.
The issue with not having the supervisor force-kill the child is that a bug
in the worker, or in a child process the worker forks, could result in the
process being leaked.
I don't want to do the sleep right after the sigterm, because if there are
multiple workers the sleeps will add up. I think we want to modify
sync-processes in the supervisor to do 2 passes over the workers that need to
be killed. The first pass would ask the worker to exit (sigterm). The second
pass would force-kill the worker and cleanup the directories associated with
it. There could be a 1 second sleep in between if any workers were being
killed (I don't want to sleep if no workers are shutting down).
Does that sound reasonable?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---