[
https://issues.apache.org/jira/browse/STORM-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015233#comment-15015233
]
ASF GitHub Bot commented on STORM-1155:
---------------------------------------
Github user hustfxj commented on a diff in the pull request:
https://github.com/apache/storm/pull/849#discussion_r45436295
--- Diff: storm-core/src/clj/backtype/storm/command/healthcheck.clj ---
@@ -0,0 +1,88 @@
+;; Licensed to the Apache Software Foundation (ASF) under one
+;; or more contributor license agreements. See the NOTICE file
+;; distributed with this work for additional information
+;; regarding copyright ownership. The ASF licenses this file
+;; to you under the Apache License, Version 2.0 (the
+;; "License"); you may not use this file except in compliance
+;; with the License. You may obtain a copy of the License at
+;;
+;; http://www.apache.org/licenses/LICENSE-2.0
+;;
+;; Unless required by applicable law or agreed to in writing, software
+;; distributed under the License is distributed on an "AS IS" BASIS,
+;; WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+;; See the License for the specific language governing permissions and
+;; limitations under the License.
+(ns backtype.storm.command.healthcheck
+ (:require [backtype.storm
+ [config :refer :all]
+ [log :refer :all]]
+ [clojure.java [io :as io]]
+ [clojure [string :refer [split]]])
+ (:gen-class))
+
+(defn interrupter
+ "Interrupt a given thread after ms milliseconds."
+ [thread ms]
+ (let [interrupter (Thread.
+ (fn []
+ (try
+ (Thread/sleep ms)
+ (.interrupt thread)
+ (catch InterruptedException e))))]
+ (.start interrupter)
+ interrupter))
+
+(defn check-output [lines]
+ (if (some #(.startsWith % "ERROR") lines)
+ :failed
+ :success))
+
+(defn process-script [conf script]
+ (let [script-proc (. (Runtime/getRuntime) (exec script))
+ curthread (Thread/currentThread)
+ interrupter-thread (interrupter curthread
+ (conf STORM-HEALTH-CHECK-TIMEOUT-MS))]
+ (try
+ (.waitFor script-proc)
+ (.interrupt interrupter-thread)
--- End diff --
@revans2 If script-proc blocks, the interrupter fires, an InterruptedException
is thrown, and we println "Script" script "timed out.". But the script process
itself is not actually stopped. For example, the hung scripts pile up like this:
admin 12755 1 0 12:49 pts/0 00:00:00 /bin/sh /home/admin/test/healthCheck.sh
admin 12978 1 0 12:50 pts/0 00:00:00 /bin/sh /home/admin/test/healthCheck.sh
admin 13228 1 0 12:51 pts/0 00:00:00 /bin/sh /home/admin/test/healthCheck.sh
admin 13504 1 0 12:52 pts/0 00:00:00 /bin/sh /home/admin/test/healthCheck.sh
admin 13644 13465 0 12:52 pts/0 00:00:00 /bin/sh /home/admin/test/healthCheck.sh
Maybe we can stop the process instead of only interrupting the waiting thread?
(defn interrupter
  "Destroy the given process after ms milliseconds."
  [script-proc ms]
  (let [interrupter (Thread.
                      (fn []
                        (try
                          (Thread/sleep ms)
                          ;; kill the hung script rather than interrupting the waiter
                          (.destroy script-proc)
                          (catch InterruptedException e))))]
    (.start interrupter)
    interrupter))
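To show how the suggestion above might fit together with the waiting side, here is a minimal sketch, not the code in the pull request. The names process-killer and run-script and the timed-out? flag are hypothetical, and it assumes a Unix-like host where the script is given as a single command string. The killer thread destroys the process after the timeout; the caller waits for the process and cancels the killer if the script finishes in time.

```clojure
;; Hypothetical sketch: names below are illustrative, not from the PR.

(defn process-killer
  "Start a thread that destroys proc after ms milliseconds.
  Interrupting the returned thread before the timeout cancels the kill."
  [^Process proc ms timed-out?]
  (let [killer (Thread.
                 (fn []
                   (try
                     (Thread/sleep (long ms))
                     ;; record the timeout before killing, so the waiter
                     ;; sees the flag once waitFor returns
                     (reset! timed-out? true)
                     (.destroy proc)
                     (catch InterruptedException _))))]
    (.start killer)
    killer))

(defn run-script
  "Run script (a command string) and return its exit code,
  or :timed-out if it had to be destroyed after timeout-ms."
  [script timeout-ms]
  (let [proc       (.exec (Runtime/getRuntime) ^String script)
        timed-out? (atom false)
        killer     (process-killer proc timeout-ms timed-out?)
        exit       (.waitFor proc)]
    ;; the script is done (or was destroyed); stop the killer if still sleeping
    (.interrupt killer)
    (if @timed-out? :timed-out exit)))
```

For example, (run-script "sleep 5" 200) should come back as :timed-out after roughly 200 ms instead of leaving the process behind. One caveat: .destroy only terminates the direct child, so if the script is a /bin/sh wrapper that has spawned its own children, those may still survive and would need separate handling.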
> Supervisor recurring health checks
> ----------------------------------
>
> Key: STORM-1155
> URL: https://issues.apache.org/jira/browse/STORM-1155
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-core
> Reporter: Thomas Graves
> Assignee: Thomas Graves
> Fix For: 0.11.0
>
>
> Add the ability for the supervisor to call out to health check scripts to
> allow some validation of the health of the node the supervisor is running on.
> It could regularly run scripts in a directory provided by the cluster admin.
> If any scripts fail, it should kill the workers and stop itself.
> This could work very much like the Hadoop scripts and if ERROR is returned on
> stdout it means the node has some issue and we should shut down.
> If a non-zero exit code is returned it indicates that the scripts failed to
> execute properly so you don't want to mark the node as unhealthy.