[
https://issues.apache.org/jira/browse/STORM-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364673#comment-15364673
]
ASF GitHub Bot commented on STORM-1946:
---------------------------------------
GitHub user slava92 opened a pull request:
https://github.com/apache/storm/pull/1542
STORM-1946: initialize lastHeartbeatTimestamp before starting heartbeat task
When Storm starts a large number of ShellBolt instances that consume a lot of
CPU time to initialize, the processes contend heavily for CPU. This can cause
BoltHeartbeatTimerTask to fire after its 1-second delay before setHeartbeat()
has assigned an initial value to the lastHeartbeatTimestamp variable.
As a result, when BoltHeartbeatTimerTask fires for the first time,
getLastHeartbeat() returns 0. This in turn causes the bolt to die with the
"subprocess heartbeat timeout" message.
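To see why a value of 0 is fatal, here is a simplified model of the timeout comparison. It assumes the check has the form "current time minus last heartbeat exceeds the timeout". The class name, method name, and 30-second timeout are hypothetical and are not taken from ShellBolt.java.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the heartbeat timeout check.
// NOT ShellBolt's actual code; names and the timeout value are assumptions.
class HeartbeatCheck {
    // True when the gap between "now" and the last heartbeat exceeds the timeout.
    static boolean isTimedOut(long nowMillis, long lastHeartbeatMillis, long timeoutMillis) {
        return nowMillis - lastHeartbeatMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        long timeoutMillis = TimeUnit.SECONDS.toMillis(30); // assumed timeout for illustration
        long now = System.currentTimeMillis();

        // Heartbeat initialized: a fresh timestamp is well within the timeout.
        System.out.println("initialized:   " + isTimedOut(now, now, timeoutMillis));
        // Heartbeat never set: the timestamp is still 0, so the gap is the whole
        // time since the Unix epoch (decades), which always exceeds the timeout.
        System.out.println("uninitialized: " + isTimedOut(now, 0L, timeoutMillis));
    }
}
```

Running main prints `initialized:   false` followed by `uninitialized: true`. With a zero timestamp the difference is the full time since the Unix epoch, so any realistic timeout is exceeded on the very first check.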
The fix is to call setHeartbeat() before BoltHeartbeatTimerTask is
created. The patch for this is attached.
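A minimal sketch of the proposed ordering, using a plain ScheduledExecutorService as a stand-in for the bolt's heartbeat timer. The class, its fields, the 1-second schedule, and the 30-second timeout are illustrative assumptions rather than ShellBolt's actual code; the point is only that the timestamp is written before the periodic task is scheduled.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the reordering; names are illustrative, not ShellBolt's.
class HeartbeatOrderingSketch {
    private final AtomicLong lastHeartbeat = new AtomicLong(); // defaults to 0
    private final AtomicBoolean timedOut = new AtomicBoolean(false);
    private final CountDownLatch firstCheckDone = new CountDownLatch(1);
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMillis;

    HeartbeatOrderingSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void setHeartbeat() {
        lastHeartbeat.set(System.currentTimeMillis());
    }

    // Fixed ordering: record an initial heartbeat BEFORE scheduling the check,
    // so even a late-running first check never observes the default value 0.
    void start() {
        setHeartbeat();
        timer.scheduleAtFixedRate(this::check, 1, 1, TimeUnit.SECONDS);
    }

    private void check() {
        long now = System.currentTimeMillis();
        if (now - lastHeartbeat.get() > timeoutMillis) {
            // In the real bolt, this is where it dies with "subprocess heartbeat timeout".
            timedOut.set(true);
        }
        firstCheckDone.countDown();
    }

    // Waits for the first check, stops the timer, and reports whether it timed out.
    boolean firstCheckTimedOut() {
        try {
            firstCheckDone.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for first check", e);
        } finally {
            timer.shutdownNow();
        }
        return timedOut.get();
    }

    public static void main(String[] args) {
        HeartbeatOrderingSketch sketch =
                new HeartbeatOrderingSketch(TimeUnit.SECONDS.toMillis(30));
        sketch.start();
        System.out.println("first check timed out: " + sketch.firstCheckTimedOut());
    }
}
```

Because setHeartbeat() runs before scheduleAtFixedRate(), the first check sees a recent timestamp however late the timer thread gets CPU time (unless that delay itself exceeds the timeout), so main should print `first check timed out: false`.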
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/slava92/storm master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/1542.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1542
----
commit d38c40026ab5dac266986896424e6c25a440e512
Author: Slava Andreyev
Date: 2016-07-06T17:14:09Z
initialize lastHeartbeatTimestamp before starting heartbeat task
----
> ShellBolt.java - On a busy system, BoltHeartbeatTimerTask fires before
> setHeartbeat() is executed
> ----------------------------------------------------------------------------------------------
>
> Key: STORM-1946
> URL: https://issues.apache.org/jira/browse/STORM-1946
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Reporter: Slava Andreyev
> Labels: patch
> Attachments: ShellBolt.java.patch
>
>
> When Storm starts a large number of ShellBolt instances that consume a lot of CPU time
> to initialize, the processes contend heavily for CPU. This can cause
> [BoltHeartbeatTimerTask|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L142]
> to fire after its 1-second delay _before_
> [setHeartbeat()|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L145]
> assigns an initial value to the
> [lastHeartbeatTimestamp|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L91]
> variable.
> As a result, when {{BoltHeartbeatTimerTask}} fires for the first time,
> [getLastHeartbeat()|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L316]
> returns *0*. This in turn causes the bolt to die with the
> ["subprocess heartbeat timeout"|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L322]
> message.
> The fix is to call {{setHeartbeat()}} _before_ {{BoltHeartbeatTimerTask}} is
> created. The patch for this is attached.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)