[ https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212476#comment-14212476 ]

ASF GitHub Bot commented on STORM-513:
--------------------------------------

Github user harshach commented on the pull request:

    https://github.com/apache/storm/pull/286#issuecomment-63094779
  
    @HeartSaVioR I am running some tests on this PR. I built Storm with your
changes applied and I am getting the failures below. Can you please check
whether you see any of these issues? Thanks.
    
    java.lang.Exception: Shell Process Exception: Exception in bolt: undefined method `+' for nil:NilClass - tester_bolt.rb:29:in `process'\n/private/var/folders/yb/67h7c1sx2d95r5c_x5cjdwmh0000gp/T/ddda5ca6-8167-4ed1-bfef-a1a2001f65a2/supervisor/stormdist/test-1-1415984043/resources/storm.rb:186:in `run'\ntester_bolt.rb:37:in `<main>'
        at backtype.storm.task.ShellBolt.handleError(ShellBolt.java:188) [classes/:na]
        at backtype.storm.task.ShellBolt.access$1100(ShellBolt.java:69) [classes/:na]
        at backtype.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:331) [classes/:na]
        at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
    90960 [Thread-1055] ERROR backtype.storm.task.ShellBolt - Halting process: ShellBolt died.
    java.lang.RuntimeException: backtype.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
    Serializer Exception:
    
    
        at backtype.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:101) ~[classes/:na]
        at backtype.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:318) ~[classes/:na]
        at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]



> ShellBolt keeps sending heartbeats even when child process is hung
> ------------------------------------------------------------------
>
>                 Key: STORM-513
>                 URL: https://issues.apache.org/jira/browse/STORM-513
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating
>         Environment: Linux: 2.6.32-431.11.2.el6.x86_64 (RHEL 6.5)
>            Reporter: Dan Blanchard
>            Priority: Blocker
>             Fix For: 0.9.3-rc2
>
>
> If I'm understanding everything correctly with how ShellBolts work, the Java 
> ShellBolt executor is the part of the topology that sends heartbeats back to 
> Nimbus to let it know that a particular multilang bolt is still alive.  The 
> problem with this is that if the multilang subprocess/bolt severely hangs 
> (i.e., it will not even respond to {{SIGALRM}} and the like), the Java 
> ShellBolt does not seem to notice or care. Simply having the tuple get 
> replayed when it times out will not suffice either, because the subprocess 
> will still be stuck.
> The most obvious way to handle this seems to be to add heartbeating to the
> multilang protocol itself, so that the ShellBolt expects a message of some
> kind every {{timeout}} seconds.
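The heartbeat idea in the description can be illustrated with a small watchdog on the Java side. The check has to live in the JVM rather than in the subprocess, because a subprocess that ignores SIGALRM cannot be trusted to report its own hang. This is a minimal sketch under stated assumptions, not Storm's actual ShellBolt code. The class name `HeartbeatWatchdog` and its methods are hypothetical. It assumes the reader thread calls `onMessage` for every message read from the subprocess, and that time is passed in explicitly so the logic is easy to test.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch (not Storm's ShellBolt implementation): remembers when
 * the multilang subprocess last sent any message and reports it as hung once
 * nothing has arrived for longer than the configured timeout.
 */
public class HeartbeatWatchdog {
    private final long timeoutMillis;
    // Time of the most recent message read from the subprocess, in milliseconds.
    private final AtomicLong lastMessageMillis;

    public HeartbeatWatchdog(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastMessageMillis = new AtomicLong(nowMillis);
    }

    /** Called by the reader thread whenever the subprocess sends a message (heartbeat or otherwise). */
    public void onMessage(long nowMillis) {
        lastMessageMillis.set(nowMillis);
    }

    /** Called periodically by a timer; true means the subprocess should be treated as hung. */
    public boolean isHung(long nowMillis) {
        return nowMillis - lastMessageMillis.get() > timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatWatchdog watchdog = new HeartbeatWatchdog(30_000, 0);
        watchdog.onMessage(10_000);
        System.out.println(watchdog.isHung(35_000)); // false: 25 s since the last message
        System.out.println(watchdog.isHung(45_000)); // true: 35 s since the last message
    }
}
```

In a real bolt, a periodic task (for example one scheduled on a `java.util.concurrent.ScheduledExecutorService`) would call `isHung(System.currentTimeMillis())` and, on `true`, stop heartbeating to Nimbus and kill the child with `Process.destroy()`. That lets the supervisor restart the worker instead of waiting forever on a stuck subprocess.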



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
