STORM-3240 any non-zero exit code causes health check failure
Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/0b32a295
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/0b32a295
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/0b32a295

Branch: refs/heads/master
Commit: 0b32a2950c61814ec6f9a9d73a82242559bb003f
Parents: 9e84142
Author: Aaron Gresch <agre...@yahoo-inc.com>
Authored: Tue Oct 2 15:35:59 2018 -0500
Committer: Aaron Gresch <agre...@yahoo-inc.com>
Committed: Tue Oct 2 15:35:59 2018 -0500

----------------------------------------------------------------------
 docs/Setting-up-a-Storm-cluster.md                             | 2 +-
 .../main/java/org/apache/storm/healthcheck/HealthChecker.java  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/0b32a295/docs/Setting-up-a-Storm-cluster.md
----------------------------------------------------------------------
diff --git a/docs/Setting-up-a-Storm-cluster.md b/docs/Setting-up-a-Storm-cluster.md
index c4a637c..d770a58 100644
--- a/docs/Setting-up-a-Storm-cluster.md
+++ b/docs/Setting-up-a-Storm-cluster.md
@@ -92,7 +92,7 @@ drpc.servers: ["111.222.333.44"]
 
 ### Monitoring Health of Supervisors
 
-Storm provides a mechanism by which administrators can configure the supervisor to run administrator supplied scripts periodically to determine if a node is healthy or not. Administrators can have the supervisor determine if the node is in a healthy state by performing any checks of their choice in scripts located in storm.health.check.dir. If a script detects the node to be in an unhealthy state, it must print a line to standard output beginning with the string ERROR and return a non-zero exit code. In pre-Storm 2.x releases, a bug considered a script exit value of 0 to be a failure. This has now been fixed. The supervisor will periodically run the scripts in the health check dir and check the output. If the script’s output contains the string ERROR, as described above, the supervisor will shut down any workers and exit.
+Storm provides a mechanism by which administrators can configure the supervisor to run administrator supplied scripts periodically to determine if a node is healthy or not. Administrators can have the supervisor determine if the node is in a healthy state by performing any checks of their choice in scripts located in storm.health.check.dir. If a script detects the node to be in an unhealthy state, it must return a non-zero exit code. In pre-Storm 2.x releases, a bug considered a script exit value of 0 to be a failure. This has now been fixed. The supervisor will periodically run the scripts in the health check dir and check the output. If the script’s output contains the string ERROR, as described above, the supervisor will shut down any workers and exit. If the supervisor is running with supervision "/bin/storm node-health-check" can be called to determine if the supervisor should be launched or if the node is unhealthy.
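The mechanism the updated paragraph describes (run every script in storm.health.check.dir and treat any non-zero exit code as unhealthy) can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not Storm's HealthChecker: the class HealthScriptRunner and its methods runScript and isHealthy are hypothetical, and the sketch leaves out the periodic scheduling, any timeout handling, and the worker shutdown that the supervisor performs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch, not Storm's HealthChecker: runs each executable file in a
 * health check directory once and reports the node healthy only if all of them exit with 0.
 */
public class HealthScriptRunner {

    /** Runs one script to completion and returns its exit code. */
    public static int runScript(Path script) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(script.toString())
                // Pass stderr through to this JVM's stderr so an unread pipe cannot stall the script.
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        // Drain stdout before waiting: a script that fills the pipe buffer would otherwise block forever.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                // Output content is ignored in this sketch; only the exit code is used.
            }
        }
        return process.waitFor();
    }

    /** Healthy only if every executable regular file in the directory exits with 0. */
    public static boolean isHealthy(Path healthCheckDir) throws IOException, InterruptedException {
        List<Path> scripts;
        try (Stream<Path> entries = Files.list(healthCheckDir)) {
            scripts = entries.filter(Files::isRegularFile)
                    .filter(Files::isExecutable)
                    .sorted()
                    .collect(Collectors.toList());
        }
        for (Path script : scripts) {
            if (runScript(script) != 0) {
                return false; // any non-zero exit marks the node unhealthy; remaining scripts are skipped
            }
        }
        return true;
    }
}
```

The early return keeps the sketch short; an implementation that wants a full report would instead run every script and record each exit code before deciding.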
http://git-wip-us.apache.org/repos/asf/storm/blob/0b32a295/storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java
----------------------------------------------------------------------
diff --git a/storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java b/storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java
index 38bcf64..b5f3655 100644
--- a/storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java
+++ b/storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java
@@ -107,10 +107,10 @@ public class HealthChecker {
                 while ((str = reader.readLine()) != null) {
                     if (str.startsWith("ERROR")) {
                         LOG.warn("The healthcheck process {} exited with code {}", script, process.exitValue());
-                        return FAILED_WITH_EXIT_CODE;
+                        return FAILED;
                     }
                 }
-                return SUCCESS;
+                return FAILED_WITH_EXIT_CODE;
             }
             return SUCCESS;
         } catch (InterruptedException | ClosedByInterruptException e) {
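The behavioural change in the HealthChecker hunk can be restated as a small pure function. The names HealthCheckOutcome, Result and classify are hypothetical; SUCCESS, FAILED and FAILED_WITH_EXIT_CODE mirror the constants named in the diff, but their actual types in HealthChecker are not shown, so an enum stands in for them. The sketch also assumes, as the LOG.warn message suggests but the hunk does not show, that the enclosing block is entered only when the script's exit value is non-zero.

```java
import java.util.List;

public class HealthCheckOutcome {

    /** Hypothetical stand-ins for the result constants named in the diff. */
    public enum Result { SUCCESS, FAILED, FAILED_WITH_EXIT_CODE }

    /**
     * Post-fix classification of one script run, assuming stdout is only
     * inspected when the exit value is non-zero.
     */
    public static Result classify(int exitValue, List<String> stdoutLines) {
        if (exitValue == 0) {
            // Zero exit: healthy, output is not examined.
            return Result.SUCCESS;
        }
        for (String line : stdoutLines) {
            if (line.startsWith("ERROR")) {
                // Non-zero exit plus an ERROR line (this path returned FAILED_WITH_EXIT_CODE before the change).
                return Result.FAILED;
            }
        }
        // Non-zero exit without an ERROR line (this path returned SUCCESS before the change).
        return Result.FAILED_WITH_EXIT_CODE;
    }
}
```

Under this reading, a script that exits non-zero without printing an ERROR line is now reported as a failure rather than as SUCCESS, which is what the commit subject describes.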