[ https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163526#comment-14163526 ]

ASF GitHub Bot commented on STORM-513:
--------------------------------------

GitHub user HeartSaVioR opened a pull request:

    https://github.com/apache/storm/pull/286

    STORM-513 check heartbeat from multilang subprocess

    Related issue link : https://issues.apache.org/jira/browse/STORM-513
    
    It seems that ShellSpout and ShellBolt don't check the subprocess at all; they
    set their heartbeats based only on their own state.
    The subprocess could hang without ShellSpout / ShellBolt noticing; it just
    stops processing tuples.
    It's better to check for heartbeats from the subprocess, and to suicide if the
    subprocess stops responding.
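The check-and-suicide rule described above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the code from the pull request: `HeartbeatTracker`, its methods, and the 30-second timeout are hypothetical, and the clock is injected so the timing logic can be exercised without sleeping.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

class HeartbeatTrackerSketch {

    /** Hypothetical helper: remembers when the subprocess was last heard from. */
    static class HeartbeatTracker {
        private final LongSupplier clockMillis;   // injectable clock so the logic is testable
        private final AtomicLong lastHeartbeatMillis;

        HeartbeatTracker(LongSupplier clockMillis) {
            this.clockMillis = clockMillis;
            this.lastHeartbeatMillis = new AtomicLong(clockMillis.getAsLong());
        }

        /** Called whenever a message such as "sync" arrives from the subprocess. */
        void recordHeartbeat() {
            lastHeartbeatMillis.set(clockMillis.getAsLong());
        }

        /** True once the subprocess has been silent for longer than timeoutMillis. */
        boolean isStale(long timeoutMillis) {
            return clockMillis.getAsLong() - lastHeartbeatMillis.get() > timeoutMillis;
        }
    }

    public static void main(String[] args) {
        long[] now = {0L};                                   // fake clock in milliseconds
        HeartbeatTracker tracker = new HeartbeatTracker(() -> now[0]);
        long timeoutMillis = 30_000;                         // hypothetical timeout

        now[0] = 20_000;
        tracker.recordHeartbeat();                           // subprocess answered at t = 20 s
        now[0] = 45_000;
        System.out.println(tracker.isStale(timeoutMillis));  // 25 s of silence
        now[0] = 51_000;
        if (tracker.isStale(timeoutMillis)) {                // 31 s of silence
            // A real worker would kill itself here; the sketch only reports it.
            System.out.println("subprocess stopped responding; worker would suicide");
        }
    }
}
```

With the timestamps in `main`, the first check sees 25 s of silence and prints `false`; the second sees 31 s and reports that the worker would suicide.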
    
    * Spout
      * ShellSpout sends "next" to the subprocess continuously
      * the subprocess sends "sync" to ShellSpout when "next" is received
      * so we can treat "sync", or any other message, as a heartbeat
    * Bolt
      * ShellBolt sends tuples to the subprocess only when tuples are available
      * so it also needs to send a "heartbeat" tuple; otherwise an idle bolt's subprocess has nothing to answer
      * the subprocess sends "sync" to ShellBolt when the "heartbeat" tuple is received
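The spout and bolt message flows listed above can be modelled roughly like this. It is a hedged sketch: the `Message` enum stands in for the JSON messages the multilang protocol actually exchanges, the timer that would trigger the bolt's heartbeat tuple is assumed, and counting only "sync" as a bolt heartbeat is this sketch's reading of the description, not confirmed behaviour of the patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MultilangHeartbeatSketch {

    /** Hypothetical stand-ins for the JSON messages of the multilang protocol. */
    enum Message { NEXT, HEARTBEAT_TUPLE, EMIT, SYNC }

    /** Java side of one shell component, reduced to heartbeat bookkeeping. */
    static class ShellSide {
        final boolean isSpout;
        final Deque<Message> outbox = new ArrayDeque<>();  // messages to write to the subprocess
        int heartbeats = 0;

        ShellSide(boolean isSpout) {
            this.isSpout = isSpout;
        }

        /** Spout: each nextTuple() sends "next". Bolt: an (assumed) timer sends a heartbeat tuple. */
        void poke() {
            outbox.add(isSpout ? Message.NEXT : Message.HEARTBEAT_TUPLE);
        }

        /** Spout: any message counts as a heartbeat. Bolt (as assumed here): only "sync" does. */
        void onMessageFromSubprocess(Message m) {
            if (isSpout || m == Message.SYNC) {
                heartbeats++;
            }
        }
    }

    /** Hypothetical healthy subprocess: replies "sync" to "next" and to the heartbeat tuple. */
    static Message healthySubprocessReply(Message request) {
        if (request == Message.NEXT || request == Message.HEARTBEAT_TUPLE) {
            return Message.SYNC;
        }
        return null;
    }

    /** Sends one poke and delivers the reply; a hung subprocess never replies. */
    static int heartbeatsAfterOnePoke(boolean isSpout, boolean hung) {
        ShellSide side = new ShellSide(isSpout);
        side.poke();
        Message request = side.outbox.poll();
        Message reply = hung ? null : healthySubprocessReply(request);
        if (reply != null) {
            side.onMessageFromSubprocess(reply);
        }
        return side.heartbeats;
    }

    public static void main(String[] args) {
        System.out.println("spout, healthy: " + heartbeatsAfterOnePoke(true, false));
        System.out.println("bolt, healthy:  " + heartbeatsAfterOnePoke(false, false));
        System.out.println("bolt, hung:     " + heartbeatsAfterOnePoke(false, true));
    }
}
```

A hung subprocess leaves the counter at zero, which is the condition a heartbeat timeout is meant to catch.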

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HeartSaVioR/storm STORM-513

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #286
    
----
commit ca5874cdf11af8d835335d228b643f28aeb3f9c3
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2014-10-08T13:48:01Z

    STORM-513 check heartbeat from multilang subprocess
    
    * Spout
    ** ShellSpout sends "next" to subprocess continuously
    ** subprocess sends "sync" to ShellSpout when "next" is received
    ** so we can treat "sync", or any messages to heartbeat
    * Bolt
    ** ShellBolt sends tuples to subprocess if it's available
    ** so we need to send "heartbeat" tuple
    ** subprocess sends "sync" to ShellBolt when "heartbeat" tuple is
    received

commit 1a0d4bdd735ba0ade42f6777a4c47affec931557
Author: Jungtaek Lim <kabh...@gmail.com>
Date:   2014-10-08T14:06:24Z

    Fix mixed tab / space, remove FIXME

----


> ShellBolt keeps sending heartbeats even when child process is hung
> ------------------------------------------------------------------
>
>                 Key: STORM-513
>                 URL: https://issues.apache.org/jira/browse/STORM-513
>             Project: Apache Storm
>          Issue Type: Bug
>         Environment: Linux: 2.6.32-431.11.2.el6.x86_64 (RHEL 6.5)
>            Reporter: Dan Blanchard
>            Priority: Blocker
>
> If I'm understanding everything correctly with how ShellBolts work, the Java 
> ShellBolt executor is the part of the topology that sends heartbeats back to 
> Nimbus to let it know that a particular multilang bolt is still alive.  The 
> problem with this is that if the multilang subprocess/bolt severely hangs 
> (i.e., it will not even respond to {{SIGALRM}} and the like), the Java 
> ShellBolt does not seem to notice or care. Simply having the tuple get 
> replayed when it times out will not suffice either, because the subprocess 
> will still be stuck.
> The most obvious way to handle this seems to be to add heartbeating to the
> multilang protocol itself, so that the ShellBolt expects a message of some
> kind every {{timeout}} seconds.
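The proposal to expect a message of some kind every {{timeout}} seconds needs something that checks periodically. The sketch below shows one way to drive that check; it is hypothetical, not Storm's API: `Watchdog`, the 10 ms check interval and the `onTimeout` callback are assumptions, and the callback stands in for whatever the worker would do to kill itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

class ShellWatchdogSketch {

    /** Hypothetical watchdog: expects some message at least every timeoutMillis. */
    static class Watchdog {
        private final AtomicLong lastMessageNanos = new AtomicLong(System.nanoTime());
        private final AtomicBoolean fired = new AtomicBoolean(false);
        private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        Watchdog(long timeoutMillis, long checkEveryMillis, Runnable onTimeout) {
            long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            timer.scheduleAtFixedRate(() -> {
                boolean silent = System.nanoTime() - lastMessageNanos.get() > timeoutNanos;
                if (silent && fired.compareAndSet(false, true)) {
                    onTimeout.run();  // a real worker would kill itself here
                }
            }, checkEveryMillis, checkEveryMillis, TimeUnit.MILLISECONDS);
        }

        /** Called whenever any message arrives from the subprocess. */
        void onMessage() {
            lastMessageNanos.set(System.nanoTime());
        }

        void stop() {
            timer.shutdownNow();
        }
    }

    /** Returns true if the watchdog fired within waitMillis while the subprocess stayed silent. */
    static boolean firesWhenSilent(long timeoutMillis, long waitMillis) {
        CountDownLatch timedOut = new CountDownLatch(1);
        Watchdog watchdog = new Watchdog(timeoutMillis, 10, timedOut::countDown);
        try {
            return timedOut.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            return false;
        } finally {
            watchdog.stop();  // the scheduler thread is non-daemon, so always shut it down
        }
    }

    public static void main(String[] args) {
        System.out.println("fired: " + firesWhenSilent(50, 2_000));
    }
}
```

Using `System.nanoTime()` rather than wall-clock time keeps the check unaffected by clock adjustments, and the `compareAndSet` guard makes the callback fire at most once.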



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
