David Judd created STORM-2439:
---------------------------------

             Summary: HealthCheck feature does not work
                 Key: STORM-2439
                 URL: https://issues.apache.org/jira/browse/STORM-2439
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
    Affects Versions: 1.0.3
            Reporter: David Judd


There are a few issues with this feature:

1. The default timeout value produces `java.lang.ClassCastException: 
java.lang.Integer cannot be cast to java.lang.Long at 
org.apache.storm.command.HealthCheck.processScript(HealthCheck.java:79)` 
because the value, 5000, is automatically deserialized by Jackson as an 
Integer, but the code casts it to a Long. (I successfully worked around 
this by setting a timeout greater than the maximum int value, which Jackson 
deserializes as a Long.)

2. The documentation says that a script should print "ERROR" if the node is 
unhealthy, but in practice the script must *also* exit with a non-zero exit 
code for the node to be treated as unhealthy. This appears to be the opposite 
of what is intended, given a comment that says "We treat non-zero exit codes 
as indicators that the scripts failed to execute properly, not that the system 
is unhealthy". I believe the exit-code check on this line is inverted: 
https://github.com/apache/storm/blob/70102643e74d577728adf5f8719920d1bf60e98a/storm-core/src/jvm/org/apache/storm/command/HealthCheck.java#L97

3. Even with workarounds for the above two bugs, a failing health check does 
not cause workers to shut down in my testing with Storm 1.0.3. I have not 
determined the cause, but because the previous two issues suggest to me that 
this code is rarely if ever tested, I do not plan to investigate further at the 
moment.
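A minimal, self-contained Java sketch of issue 1: a number parsed from config as a boxed Integer cannot be cast directly to Long, whereas widening through `Number.longValue()` accepts both Integer and Long. The class name, the method names, and the config key string are hypothetical (the key is assumed, not taken from this report); the sketch only illustrates the cast, not Storm's actual config-reading code.

```java
import java.util.HashMap;
import java.util.Map;

public class TimeoutConfigSketch {
    // Assumed config key for illustration; the real HealthCheck code may read a different key.
    static final String TIMEOUT_KEY = "storm.health.check.timeout.ms";

    // Reproduces the failure: casting a boxed Integer to Long throws ClassCastException.
    static long brokenTimeout(Map<String, Object> conf) {
        return (Long) conf.get(TIMEOUT_KEY);
    }

    // Hypothetical fix: accept any boxed numeric type and widen it to long.
    static long timeoutMs(Map<String, Object> conf, long defaultMs) {
        Object value = conf.get(TIMEOUT_KEY);
        if (value == null) {
            return defaultMs;
        }
        if (value instanceof Number) {
            return ((Number) value).longValue();
        }
        throw new IllegalArgumentException(
                TIMEOUT_KEY + " must be numeric, got " + value.getClass().getName());
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        // A small literal such as 5000 is typically parsed as an Integer.
        conf.put(TIMEOUT_KEY, 5000);
        try {
            brokenTimeout(conf);
            System.out.println("cast succeeded");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException");
        }
        System.out.println(timeoutMs(conf, 1000L));

        // A value above Integer.MAX_VALUE is parsed as a Long, which is why the workaround succeeds.
        conf.put(TIMEOUT_KEY, 3_000_000_000L);
        System.out.println(timeoutMs(conf, 1000L));
    }
}
```

Widening through `Number` keeps the default of 5000 usable without forcing users to pick a timeout larger than the maximum int value.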
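For issue 2, the following Java sketch encodes the behaviour that the quoted source comment and the documentation appear to intend: a non-zero exit code means the script failed to run properly, and only a successful run that prints "ERROR" marks the node unhealthy. The `Outcome` enum, the `classify` method, and the rule that a matching output line must start with "ERROR" are assumptions for illustration, not Storm's HealthCheck API.

```java
import java.util.Arrays;

public class HealthCheckResultSketch {
    // Hypothetical outcome type; not the constants used by Storm's HealthCheck.
    enum Outcome { HEALTHY, UNHEALTHY, SCRIPT_FAILED }

    // Intended rule as read from the documentation and the source comment:
    //  - non-zero exit code: the script itself failed to execute (not a health verdict)
    //  - zero exit code and an output line starting with "ERROR": the node is unhealthy
    //  - otherwise: the node is healthy
    static Outcome classify(int exitCode, String stdout) {
        if (exitCode != 0) {
            return Outcome.SCRIPT_FAILED;
        }
        // Split on any line break and look for a line that reports an error.
        boolean reportedError = Arrays.stream(stdout.split("\\R"))
                .anyMatch(line -> line.startsWith("ERROR"));
        return reportedError ? Outcome.UNHEALTHY : Outcome.HEALTHY;
    }

    public static void main(String[] args) {
        System.out.println(classify(0, "ERROR: disk full"));
        System.out.println(classify(0, "all good"));
        System.out.println(classify(1, "ERROR: disk full"));
    }
}
```

Under this reading, the behaviour described in issue 2 (unhealthy only when the script prints "ERROR" and also exits non-zero) corresponds to the exit-code test being inverted.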

If this feature is, as it appears, untested and non-functional, I would suggest 
that it be removed from the code and documentation.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)