Slava Andreyev created STORM-1946:
-------------------------------------

             Summary: ShellBolt.java - On a busy system BoltHeartbeatTimerTask fires before setHeartbeat() is executed
                 Key: STORM-1946
                 URL: https://issues.apache.org/jira/browse/STORM-1946
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
            Reporter: Slava Andreyev


When Storm starts a large number of ShellBolts that consume a lot of CPU time 
to initialize, the processes contend heavily for CPU. On such a busy system, 
[BoltHeartbeatTimerTask|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L142]
 can fire after its 1-second delay _before_ 
[setHeartbeat()|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L145]
 has assigned an initial value to the 
[lastHeartbeatTimestamp|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L91]
 variable.
As a result, when {{BoltHeartbeatTimerTask}} runs for the first time, 
[getLastHeartbeat()|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L316]
 returns *0*. The time elapsed since the "last heartbeat" then appears enormous, 
so the bolt dies with the 
["subprocess heartbeat timeout"|https://github.com/apache/storm/blob/master/storm-core/src/jvm/org/apache/storm/task/ShellBolt.java#L322]
 message.
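To see why a timestamp of *0* is fatal, here is a minimal sketch of an elapsed-time check of the kind the timer task performs. {{HeartbeatCheck}} and {{isTimedOut}} are hypothetical names, and the comparison is an assumption about the general shape of the check, not the actual ShellBolt code.

```java
// Hypothetical helper modelling the kind of elapsed-time check the heartbeat
// timer task performs; it is NOT the actual ShellBolt code.
final class HeartbeatCheck {
    private HeartbeatCheck() {
        // static helpers only
    }

    // True when more than timeoutMs has passed since lastHeartbeatMs.
    static boolean isTimedOut(long lastHeartbeatMs, long nowMs, long timeoutMs) {
        return nowMs - lastHeartbeatMs > timeoutMs;
    }
}
```

With an assumed 30-second timeout, {{isTimedOut(0L, System.currentTimeMillis(), 30_000L)}} is true, because the "elapsed" time is the entire offset since the Unix epoch (well over a trillion milliseconds), while {{isTimedOut(now, now, 30_000L)}} is false. No realistic timeout can absorb an uninitialized timestamp.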
The fix is to call {{setHeartbeat()}} _before_ {{BoltHeartbeatTimerTask}} is 
created. The patch for this is attached.
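A simplified, hypothetical model of the corrected start-up ordering is sketched below. It is not the attached patch or the Storm source. {{HeartbeatSketch}}, {{start()}}, {{heartbeatExpired()}} and the {{timedOut}} flag are illustrative names, and the single-threaded {{ScheduledExecutorService}} with a 1-second period is an assumption chosen to mirror the delay described above. In the real bolt the subprocess keeps refreshing the timestamp afterwards. This sketch only models start-up.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the heartbeat start-up in a shell bolt;
// NOT the Storm source, only an illustration of the ordering fix.
class HeartbeatSketch {
    // Starts at 0, the value the first timer run observed in the bug.
    private final AtomicLong lastHeartbeatTimestamp = new AtomicLong(0L);
    // Set by the timer task instead of killing the process (illustration only).
    private final AtomicBoolean timedOut = new AtomicBoolean(false);
    private final ScheduledExecutorService heartbeatTimer =
            Executors.newSingleThreadScheduledExecutor();
    private final long timeoutMs;

    HeartbeatSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    void setHeartbeat() {
        lastHeartbeatTimestamp.set(System.currentTimeMillis());
    }

    long getLastHeartbeat() {
        return lastHeartbeatTimestamp.get();
    }

    // True when more than timeoutMs has passed since the last heartbeat.
    boolean heartbeatExpired(long nowMs) {
        return nowMs - getLastHeartbeat() > timeoutMs;
    }

    void start() {
        // The fix: record the initial heartbeat BEFORE the timer task is
        // scheduled. The write happens-before the task is submitted, so the
        // first run never observes 0, however late the scheduler runs it.
        setHeartbeat();
        heartbeatTimer.scheduleAtFixedRate(() -> {
            if (heartbeatExpired(System.currentTimeMillis())) {
                // A real shell bolt would die here with "subprocess heartbeat timeout".
                timedOut.set(true);
            }
        }, 1, 1, TimeUnit.SECONDS);
    }

    boolean isTimedOut() {
        return timedOut.get();
    }

    void shutdown() {
        heartbeatTimer.shutdownNow();
    }
}
```

Swapping the two statements in {{start()}} reintroduces the race: if the scheduler thread runs the task before {{setHeartbeat()}} executes on a heavily loaded host, {{heartbeatExpired()}} compares against 0 and reports a timeout.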





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
