Oleh Hordiichuk created STORM-1021:
--------------------------------------

             Summary: HeartbeatExecutorService issue in Shell Spout\Bolt
                 Key: STORM-1021
                 URL: https://issues.apache.org/jira/browse/STORM-1021
             Project: Apache Storm
          Issue Type: Bug
    Affects Versions: 0.10.0, 0.9.5
         Environment: Alpine Linux 3.2, openjdk-7
            Reporter: Oleh Hordiichuk
             Fix For: 0.10.0


The ShellSpout class (and apparently ShellBolt as well) doesn't restart when 
a heartbeat timeout occurs. To reproduce this bug, do the following:
1. Set the supervisor.worker.timeout.secs property to a small value, e.g. 1;
2. Create a shell spout (as a standalone application, not a Java class) that 
hangs for more than 1 second and doesn't respond to heartbeat messages, e.g. 
by calling Thread.Sleep(5000);
3. After the timeout, Storm will try to kill the shell spout process by 
calling the die function:
https://github.com/apache/storm/blob/v0.10.0-beta/storm-core/src/jvm/backtype/storm/spout/ShellSpout.java#L237
4. The "die" function calls heartBeatExecutorService.shutdownNow(), which 
raises an InterruptedException that is not caught by the calling thread. 
As a result, the topology stops working properly, although it is still shown 
by ./storm list.
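
The interrupt described in step 4 can be illustrated outside Storm. The 
sketch below is not Storm code: it assumes that die is invoked from a task 
running on the heartbeat executor itself, and the class and method names are 
made up for illustration. It shows only that a task calling shutdownNow() on 
its own executor ends up with its own thread interrupted, so a later blocking 
call on that thread may fail with InterruptedException.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, not part of Storm.
class ShutdownNowInterruptDemo {

    // Returns true if the task thread is flagged as interrupted right after
    // it calls shutdownNow() on the executor that is running it.
    static boolean interruptedAfterShutdownNow() {
        final ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
        final AtomicBoolean interrupted = new AtomicBoolean(false);
        final CountDownLatch done = new CountDownLatch(1);

        executor.schedule(new Runnable() {
            @Override
            public void run() {
                // Stands in for die(): shut down the executor from inside its own task.
                executor.shutdownNow();
                // isInterrupted() reads the flag without clearing it.
                interrupted.set(Thread.currentThread().isInterrupted());
                done.countDown();
            }
        }, 10, TimeUnit.MILLISECONDS);

        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return interrupted.get();
    }

    public static void main(String[] args) {
        System.out.println("interrupted after shutdownNow(): "
                + interruptedAfterShutdownNow());
    }
}
```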

I'm not a Java developer, so I'm not sure whether the code below is valid; 
however, it seems to fix the problem:

    private void die(Throwable exception) {
        heartBeatExecutorService.shutdownNow();
        try {
            heartBeatExecutorService.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            LOG.error("await catch ", e);
        }

        _collector.reportError(exception);
        _process.destroy();
        System.exit(11);
    }
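
One possible explanation for why this change helps, offered as an assumption 
rather than a confirmed diagnosis: if die runs on the executor's own thread, 
awaitTermination throws InterruptedException immediately and, in doing so, 
clears the thread's interrupt flag, so the calls that follow run on a thread 
that is no longer interrupted. The standalone sketch below (hypothetical 
names, no Storm classes) checks that behaviour of the JDK executor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, not part of Storm.
class AwaitTerminationDemo {

    // Returns { caughtInterruptedException, interruptFlagStillSetAfterwards }.
    static boolean[] shutdownThenAwait() {
        final ScheduledExecutorService executor = Executors.newScheduledThreadPool(1);
        final AtomicBoolean caught = new AtomicBoolean(false);
        final AtomicBoolean flagAfter = new AtomicBoolean(true);
        final CountDownLatch done = new CountDownLatch(1);

        executor.schedule(new Runnable() {
            @Override
            public void run() {
                // Same sequence as the proposed die(): shutdownNow, then await.
                executor.shutdownNow();
                try {
                    executor.awaitTermination(5, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    caught.set(true);
                }
                // Record whether the interrupt flag survived the catch block.
                flagAfter.set(Thread.currentThread().isInterrupted());
                done.countDown();
            }
        }, 10, TimeUnit.MILLISECONDS);

        try {
            done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { caught.get(), flagAfter.get() };
    }

    public static void main(String[] args) {
        boolean[] result = shutdownThenAwait();
        System.out.println("caught InterruptedException: " + result[0]);
        System.out.println("interrupt flag still set:    " + result[1]);
    }
}
```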



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
