[ https://issues.apache.org/jira/browse/STORM-589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241377#comment-14241377 ]
Derek Dagit commented on STORM-589:
-----------------------------------
There may be a reason I have not thought of why the current defaults are
good rather than suboptimal. Please do comment if that is the case.
> Suboptimal default worker hb timeouts for nimbus & supervisor
> -------------------------------------------------------------
>
> Key: STORM-589
> URL: https://issues.apache.org/jira/browse/STORM-589
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 0.9.2-incubating
> Reporter: Derek Dagit
> Priority: Minor
>
> Both worker heartbeat timeouts for nimbus and supervisor are set to 30
> seconds by default:
> https://github.com/apache/storm/blob/3bbdc166bda7fb1a39b6906eda40da9bc83d5d4c/conf/defaults.yaml#L58
> https://github.com/apache/storm/blob/3bbdc166bda7fb1a39b6906eda40da9bc83d5d4c/conf/defaults.yaml#L118
> This means that the timing of a worker's death relative to its heartbeats
> determines whether the supervisor relaunches it or nimbus reassigns it.
> If the supervisor is the first to find that the heartbeat has timed out,
> the worker is relaunched. If nimbus is the first to find that the heartbeat
> has timed out, the worker is rescheduled.
> We may want the nimbus time-out to be larger than the supervisor time-out, to
> give the supervisor a chance to relaunch the worker before nimbus re-assigns
> it.
> As always, users administering clusters are encouraged to set these as
> needed.
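>
> As a rough sketch of the idea, a cluster's storm.yaml could override the
> two defaults so that the nimbus timeout exceeds the supervisor timeout.
> The key names below are what I believe the two linked defaults.yaml lines
> refer to, and the value of 60 is only an illustrative assumption, not a
> tested recommendation:
>
> ```yaml
> # Assumed key for the nimbus-side worker heartbeat timeout.
> # 60 is an illustrative value chosen only to be larger than the
> # supervisor timeout below.
> nimbus.task.timeout.secs: 60
>
> # Assumed key for the supervisor-side worker heartbeat timeout,
> # left at the current default of 30 seconds.
> supervisor.worker.timeout.secs: 30
> ```
>
> With settings like these, the supervisor would normally notice a dead
> worker and relaunch it before nimbus considers it timed out.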
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)