Derek Dagit created STORM-589:
---------------------------------

             Summary: Suboptimal default worker hb timeouts for nimbus & supervisor
                 Key: STORM-589
                 URL: https://issues.apache.org/jira/browse/STORM-589
             Project: Apache Storm
          Issue Type: Bug
    Affects Versions: 0.9.2-incubating
            Reporter: Derek Dagit
            Priority: Minor


Both worker heartbeat timeouts for nimbus and supervisor are set to 30 seconds 
by default:

https://github.com/apache/storm/blob/3bbdc166bda7fb1a39b6906eda40da9bc83d5d4c/conf/defaults.yaml#L58

https://github.com/apache/storm/blob/3bbdc166bda7fb1a39b6906eda40da9bc83d5d4c/conf/defaults.yaml#L118

This means that the timing of a worker's death relative to its heartbeats 
determines whether the supervisor relaunches it or nimbus reassigns it.

If the supervisor is the first to find that the worker's heartbeat has timed 
out, the worker is relaunched. If nimbus is the first to find that it has 
timed out, the worker is rescheduled.

We may want the nimbus time-out to be larger than the supervisor time-out, to 
give the supervisor a chance to relaunch the worker before nimbus re-assigns it.
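
As a rough illustration only: assuming the two settings linked above are 
nimbus.task.timeout.secs and supervisor.worker.timeout.secs, a cluster's 
storm.yaml could override them along these lines. The value of 60 is an 
arbitrary example, not a tested recommendation.

```yaml
# storm.yaml override (illustrative values; key names assumed to match
# the two defaults.yaml lines linked above)

# Supervisor: how long a worker may go without a heartbeat before the
# supervisor relaunches it. Left at the current default.
supervisor.worker.timeout.secs: 30

# Nimbus: how long before nimbus considers the worker dead and reassigns
# it. Set larger than the supervisor timeout so the supervisor gets a
# chance to relaunch the worker first.
nimbus.task.timeout.secs: 60
```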

As always, users administering clusters are encouraged to set these as needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
