[ https://issues.apache.org/jira/browse/STORM-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944258#comment-13944258 ]

Jon logan commented on STORM-261:
---------------------------------

All Storm services are designed to be fail-fast and should be continuously 
managed by something like supervisord. The reason workers don't self-destruct 
is that they would then depend on the supervisor being alive. Once scheduled, 
workers can survive a supervisor death and restart. 
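For illustration, a minimal supervisord stanza for keeping the Storm supervisor
daemon running could look like the following. The program name, install path,
and user are assumptions, not a documented setup:

[program:storm-supervisor]
; launch the Storm supervisor daemon and restart it whenever it exits,
; which is what the fail-fast design expects from the process manager
command=/opt/storm/bin/storm supervisor
autostart=true
autorestart=true
startsecs=10
user=storm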

This would require either heartbeats from the supervisors, with workers shutting 
down when those heartbeats go missing, or workers watching ZooKeeper for 
reassignments (see the sketch below). 
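A rough sketch of the second option, a worker periodically asking ZooKeeper
whether it is still scheduled, could look like this. The znode layout, connect
string, and the idea of a per-worker assignment node are assumptions for
illustration only; a real implementation would read Storm's serialized
assignment data and check whether this worker's slot is still present.

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class SuicideCheck {

    public static void main(String[] args) throws Exception {
        String topologyId = args[0];   // e.g. "wordcount-1-1395700000"
        String nodePlusPort = args[1]; // e.g. "host1:6701"

        // Hypothetical per-worker assignment znode; Storm's real layout differs.
        String workerSlotPath = "/assignments/" + topologyId + "/" + nodePlusPort;

        // Connect to ZooKeeper; the no-op watcher just satisfies the constructor.
        ZooKeeper zk = new ZooKeeper("zk1:2181", 20000, (WatchedEvent e) -> { });

        while (true) {
            // If the assignment node is gone, this worker has been rescheduled
            // elsewhere and should terminate itself.
            Stat stat = zk.exists(workerSlotPath, false);
            if (stat == null) {
                // halt() rather than exit() so leaked non-daemon threads or
                // shutdown hooks cannot keep the JVM alive.
                Runtime.getRuntime().halt(1);
            }
            Thread.sleep(10_000);
        }
    }
}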

Furthermore, a worker cannot easily kill itself. Currently workers are killed 
externally via a kill -9 from the supervisor, which removes any risk of the JVM 
failing to terminate due to thread leaks or the like. 


I believe that when the supervisor is restarted, it should try to kill any 
workers whose tasks were rescheduled off that machine while it was offline. 

> Workers should commit suicide if not scheduled any more.
> --------------------------------------------------------
>
>                 Key: STORM-261
>                 URL: https://issues.apache.org/jira/browse/STORM-261
>             Project: Apache Storm (Incubating)
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating
>            Reporter: Robert Joseph Evans
>            Priority: Minor
>
> I know this is a bit far-fetched.
> If for some reason a supervisor dies and does not come back up again (a dead 
> HDD, for example) but the workers remain up, and the scheduler decides to move 
> the worker to a new host (a rebalance, for instance), the old workers will 
> never go away.  Ideally the worker should know that it is no longer running in 
> the correct place and should die instead of waiting for the supervisor to kill 
> it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
