[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203833#comment-14203833
 ] 

ASF GitHub Bot commented on STORM-513:
--------------------------------------

Github user HeartSaVioR commented on the pull request:

    https://github.com/apache/storm/pull/286#issuecomment-62295379
  
    @itaifrenkel @clockfly 
    I agree with @itaifrenkel, because there are more multilang 
implementations than the ones that live in the Storm project.
    I have seen a PHP implementation of multilang (sorry, I can't remember 
where it is), and there could be more.
    If we let the subprocess take care of many things, every implementation 
has to apply the change and be updated.
    (Actually, we already force them to apply this change, because a bolt has 
to handle the heartbeat tuple. ;( )
    So we should consider the trade-off, and maybe we should document the 
multilang specification.
    
    Also, the multilang protocol changes introduced in this PR should be 
documented when we announce the 0.9.3 release.
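
    To illustrate what "a bolt has to handle the heartbeat tuple" could mean 
for a third-party multilang implementation, here is a minimal sketch in Python. 
It is not taken from this PR. It assumes that the heartbeat arrives as an 
ordinary tuple message on a stream named "__heartbeat" with task id -1, and 
that the subprocess answers it with a "sync" command; those names, and the 
helpers `is_heartbeat` and `handle_message`, are assumptions made for 
illustration and should be checked against the released protocol.

```python
import json

# Assumed identifiers for heartbeat tuples; verify against the real protocol.
HEARTBEAT_STREAM = "__heartbeat"
HEARTBEAT_TASK = -1


def is_heartbeat(msg):
    """Return True if a decoded tuple message looks like a heartbeat."""
    return (msg.get("task") == HEARTBEAT_TASK
            and msg.get("stream") == HEARTBEAT_STREAM)


def handle_message(msg, process):
    """Return the JSON replies a bolt would write back for one tuple message.

    `process` is the user's bolt logic (hypothetical callback). Heartbeats
    bypass it and are answered with a sync; normal tuples are processed
    and then acked by id.
    """
    if is_heartbeat(msg):
        return [json.dumps({"command": "sync"})]
    process(msg["tuple"])
    return [json.dumps({"command": "ack", "id": msg["id"]})]
```

    Every multilang library outside the Storm repository would need an 
equivalent branch in its read loop, which is the maintenance cost discussed 
above.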


> ShellBolt keeps sending heartbeats even when child process is hung
> ------------------------------------------------------------------
>
>                 Key: STORM-513
>                 URL: https://issues.apache.org/jira/browse/STORM-513
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating
>         Environment: Linux: 2.6.32-431.11.2.el6.x86_64 (RHEL 6.5)
>            Reporter: Dan Blanchard
>            Priority: Blocker
>             Fix For: 0.9.3-rc2
>
>
> If I'm understanding everything correctly with how ShellBolts work, the Java 
> ShellBolt executor is the part of the topology that sends heartbeats back to 
> Nimbus to let it know that a particular multilang bolt is still alive.  The 
> problem with this is that if the multilang subprocess/bolt severely hangs 
> (i.e., it will not even respond to {{SIGALRM}} and the like), the Java 
> ShellBolt does not seem to notice or care. Simply having the tuple get 
> replayed when it times out will not suffice either, because the subprocess 
> will still be stuck.
> The most obvious way to handle this seems to be to add heartbeating to the 
> multilang protocol itself, so that the ShellBolt expects a message of some 
> kind every {{timeout}} seconds.
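
The proposal in the description (expect some message from the subprocess 
every `timeout` seconds) can be sketched as a small watchdog. This is only an 
illustration in Python, not the ShellBolt implementation; the class name 
`HeartbeatWatchdog`, its methods, and the injectable `clock` parameter are 
hypothetical.

```python
import time


class HeartbeatWatchdog:
    """Tracks when the subprocess last sent a message (hypothetical helper)."""

    def __init__(self, timeout_secs, clock=time.monotonic):
        self.timeout_secs = timeout_secs
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = clock()    # treat start-up as the first sign of life

    def beat(self):
        """Record that a message just arrived from the subprocess."""
        self._last_seen = self._clock()

    def is_hung(self):
        """True once no message has arrived for more than timeout_secs."""
        return self._clock() - self._last_seen > self.timeout_secs
```

In such a design, the parent would call `beat()` whenever it reads a message 
from the subprocess and check `is_hung()` periodically. What to do when it 
returns true (for example, killing the subprocess so that it gets restarted) 
is a separate decision that this sketch does not cover.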



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
