Ma Zhechao created STORM-2150:

             Summary: ShellBolt raise subprocess heartbeat timeout Exception
                 Key: STORM-2150
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core, storm-multilang
    Affects Versions: 1.0.1, 1.0.2
            Reporter: Ma Zhechao
            Priority: Critical

I have a simple topology running on Storm 1.0.1. The topology consists of a 
KafkaSpout and several Python multilang ShellBolts. I frequently see the 
following exception:

java.lang.RuntimeException: subprocess heartbeat timeout at 
at java.util.concurrent.Executors$ at 
java.util.concurrent.FutureTask.runAndReset( at 

More information:
1. The topology runs in ACK mode.
2. The topology has 40 workers.
3. The topology emits about 10 million tuples every 10 minutes.
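
For a sense of scale, the figures above work out to roughly 16,700 tuples per 
second across the topology. The short Python sketch below does the arithmetic. 
The even split across the 40 workers is an assumption made only for 
illustration; the report does not say how tuples are distributed.

```python
# Figures taken from the report; the even split across workers is an assumption.
TUPLES_PER_WINDOW = 10_000_000   # "about 10 million tuples"
WINDOW_SECONDS = 10 * 60         # "every 10 minutes"
WORKERS = 40                     # "40 workers"


def tuples_per_second(total_tuples, seconds):
    """Average emit rate over the reporting window."""
    return total_tuples / seconds


topology_rate = tuples_per_second(TUPLES_PER_WINDOW, WINDOW_SECONDS)
per_worker_rate = topology_rate / WORKERS  # assumes a perfectly even split

print(f"topology:   {topology_rate:.0f} tuples/s")
print(f"per worker: {per_worker_rate:.0f} tuples/s")
```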

Every time the subprocess heartbeat timed out, the workers restarted and the 
Python processes exited with exitCode:-1, which hurt the processing capacity 
and stability of the topology.
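
The failure mode described above can be pictured as a watchdog: the parent 
process remembers when it last heard from the subprocess and gives up once the 
silence exceeds a timeout. The sketch below is a simplified model for 
illustration only, not Storm's ShellBolt code. The class name, method names, 
and the 30-second timeout are all hypothetical.

```python
class HeartbeatWatchdog:
    """Simplified model of a parent process watching a subprocess.

    Hypothetical illustration; names and values are not taken from Storm.
    """

    def __init__(self, timeout_secs, now):
        self.timeout_secs = timeout_secs
        self.last_seen = now  # time of the last message from the subprocess

    def record_activity(self, now):
        # Called whenever the subprocess sends anything back.
        self.last_seen = now

    def check(self, now):
        # Called periodically by a timer in the parent process.
        if now - self.last_seen > self.timeout_secs:
            raise RuntimeError("subprocess heartbeat timeout")


watchdog = HeartbeatWatchdog(timeout_secs=30, now=0)  # 30 s is an assumed value
watchdog.record_activity(now=10)
watchdog.check(now=35)  # 25 s of silence: still within the timeout

try:
    watchdog.check(now=45)  # 35 s of silence: exceeds the timeout
except RuntimeError as exc:
    result = str(exc)
    print(result)
```

Times are passed in explicitly rather than read from a clock so the behaviour 
is easy to follow step by step.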

I checked some related issues in the Storm Jira. STORM-1946 reported a bug 
related to this problem and said that the bug had been fixed in Storm 1.0.2. 
However, I got the same exception even after upgrading Storm to 1.0.2.

I also checked other related issues. Here is the history of this problem. 
DashengJu first reported it in Non-ACK mode in STORM-738. STORM-742 discussed 
how to handle it in ACK mode, and it seemed that the bug had been fixed in 
0.10.0. I don't know whether this patch is included in the storm-1.x branch. 
In short, this problem still exists in the latest stable
