Slava Andreyev created STORM-1946:
-------------------------------------
Summary: ShellBolt.java - On busy system BoltHeartbeatTimerTask
fires before setHeartbeat() is executed
Key: STORM-1946
URL: https://issues.apache.org/jira/browse/STORM-1946
Project: Apache Storm
Issue Type: Bug
Components: storm-core
Reporter: Slava Andreyev
When Storm starts a large number of {{ShellBolt}} instances that consume a lot of
CPU time to initialize, the processes contend heavily for CPU. That leads to
[BoltHeartbeatTimerTask|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L142]
firing after its 1-second delay _before_
[setHeartbeat()|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L145]
assigns an initial value to the
[lastHeartbeatTimestamp|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L91]
variable.
As a result, when {{BoltHeartbeatTimerTask}} fires for the first time,
[getLastHeartbeat()|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L316]
returns *0*. The time elapsed since the last heartbeat then appears to exceed
the timeout, so the bolt dies with a
["subprocess heartbeat timeout"|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L322]
message.
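To illustrate why a timestamp of 0 trips the check, here is a minimal sketch. It is not the actual {{ShellBolt}} code: the class name, the {{isTimedOut}} helper, and the 30-second timeout are hypothetical and chosen only for illustration. It assumes the timer task compares "now minus last heartbeat" against a timeout.

```java
// Illustrative sketch only; names and the timeout value are assumptions,
// not taken from ShellBolt.java.
final class HeartbeatCheckSketch {

    // Returns true when the time since the last heartbeat exceeds the timeout.
    static boolean isTimedOut(long nowMillis, long lastHeartbeatMillis, long timeoutMillis) {
        return nowMillis - lastHeartbeatMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long timeoutMillis = 30_000L; // assumed value for illustration

        // Timestamp never initialized: the field still holds its default of 0,
        // so the "elapsed" time is the whole span since the Unix epoch.
        System.out.println(isTimedOut(now, 0L, timeoutMillis));  // prints "true"

        // Timestamp initialized just before the check: no time has elapsed.
        System.out.println(isTimedOut(now, now, timeoutMillis)); // prints "false"
    }
}
```

With an uninitialized timestamp the check fails on the very first run, regardless of how healthy the subprocess is.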
The fix is to call {{setHeartbeat()}} _before_ {{BoltHeartbeatTimerTask}} is
created and scheduled. A patch for this is attached.
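The ordering can be sketched as follows. This is a simplified stand-in, not the attached patch: {{HeartbeatOrderingSketch}}, {{checkHeartbeat()}}, {{hasDied()}} and {{start()}} are hypothetical names, and a {{ScheduledExecutorService}} is used here purely for illustration of the scheduling step.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; not the ShellBolt implementation.
final class HeartbeatOrderingSketch {
    // Defaults to 0 until setHeartbeat() is called.
    private final AtomicLong lastHeartbeatTimestamp = new AtomicLong();
    private final long timeoutMillis;
    private volatile boolean died = false;

    HeartbeatOrderingSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void setHeartbeat() {
        lastHeartbeatTimestamp.set(System.currentTimeMillis());
    }

    long getLastHeartbeat() {
        return lastHeartbeatTimestamp.get();
    }

    // Stand-in for the body of the heartbeat timer task.
    void checkHeartbeat() {
        if (System.currentTimeMillis() - getLastHeartbeat() > timeoutMillis) {
            died = true; // the real bolt would die with "subprocess heartbeat timeout"
        }
    }

    boolean hasDied() {
        return died;
    }

    // Initialize the timestamp first, then schedule the periodic check,
    // so the first run can never observe the default value of 0.
    ScheduledExecutorService start() {
        setHeartbeat();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::checkHeartbeat, 1, 1, TimeUnit.SECONDS);
        return timer;
    }
}
```

Because {{setHeartbeat()}} runs on the calling thread before the task is handed to the scheduler, the outcome no longer depends on how long the caller is starved of CPU after scheduling.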