Derek Dagit created STORM-589:
---------------------------------
Summary: Suboptimal default worker hb timeouts for nimbus &
supervisor
Key: STORM-589
URL: https://issues.apache.org/jira/browse/STORM-589
Project: Apache Storm
Issue Type: Bug
Affects Versions: 0.9.2-incubating
Reporter: Derek Dagit
Priority: Minor
Both worker heartbeat timeouts for nimbus and supervisor are set to 30 seconds
by default:
https://github.com/apache/storm/blob/3bbdc166bda7fb1a39b6906eda40da9bc83d5d4c/conf/defaults.yaml#L58
https://github.com/apache/storm/blob/3bbdc166bda7fb1a39b6906eda40da9bc83d5d4c/conf/defaults.yaml#L118
This means that the timing of a worker's death relative to its most recent
heartbeats determines whether the supervisor relaunches it or nimbus
reassigns it.
If the supervisor finds the worker's heartbeat timed out first, the worker is
relaunched. If nimbus finds it timed out first, the worker is reassigned.
We may want the nimbus timeout to be larger than the supervisor timeout, to
give the supervisor a chance to relaunch the worker before nimbus reassigns it.
As always, users administering clusters are encouraged to set these as needed.
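A minimal Python sketch of the race described above, under simplifying
assumptions: each daemon is modeled as reacting the instant the heartbeat it
reads becomes older than its timeout (real daemons poll on their own
intervals), and the worker is assumed to write one heartbeat read locally by
the supervisor and a separate one, via ZooKeeper, read by nimbus. The
`Timeouts` class and `first_reaction` function are hypothetical illustrations,
not Storm code; the config keys named in the comments are assumed to be the
ones at the linked defaults.yaml lines.

```python
from dataclasses import dataclass


# Hypothetical, simplified model: a daemon is assumed to act the instant the
# heartbeat it reads becomes older than its timeout (real daemons poll).
@dataclass
class Timeouts:
    supervisor_secs: float  # assumed key: supervisor.worker.timeout.secs
    nimbus_secs: float      # assumed key: nimbus.task.timeout.secs


def first_reaction(last_local_hb: float, last_zk_hb: float, t: Timeouts) -> str:
    """Return which daemon notices a dead worker first.

    last_local_hb: time (secs) of the last heartbeat the supervisor can see.
    last_zk_hb: time (secs) of the last heartbeat nimbus can see.
    """
    supervisor_deadline = last_local_hb + t.supervisor_secs
    nimbus_deadline = last_zk_hb + t.nimbus_secs
    if supervisor_deadline < nimbus_deadline:
        return "supervisor relaunches"
    if nimbus_deadline < supervisor_deadline:
        return "nimbus reassigns"
    return "tie"


defaults = Timeouts(supervisor_secs=30, nimbus_secs=30)
# Nimbus's view is 3s staler than the supervisor's: nimbus acts first.
print(first_reaction(100.0, 97.0, defaults))  # nimbus reassigns
# Supervisor's view is 3s staler: the supervisor acts first.
print(first_reaction(97.0, 100.0, defaults))  # supervisor relaunches

# Hypothetical larger nimbus timeout (60 is an arbitrary illustrative value).
proposed = Timeouts(supervisor_secs=30, nimbus_secs=60)
print(first_reaction(100.0, 97.0, proposed))  # supervisor relaunches
```

With equal defaults, the outcome depends only on which heartbeat happened to
be written last before the worker died; with the larger nimbus timeout, the
case that previously ended in reassignment now gives the supervisor the first
chance to relaunch.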
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)