[
https://issues.apache.org/jira/browse/STORM-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359847#comment-14359847
]
ASF GitHub Bot commented on STORM-532:
--------------------------------------
Github user caofangkun commented on a diff in the pull request:
https://github.com/apache/storm/pull/296#discussion_r26362192
--- Diff: storm-core/src/clj/backtype/storm/util.clj ---
@@ -392,6 +392,15 @@
(.addArgument command a))
(.execute (DefaultExecutor.) command)))
+(defn exists-process?
+ [process-id]
+ (let [line (if on-windows? (str "cmd /c \"tasklist /FI \"PID eq " process-id "\" | findstr " process-id "\"" )
+ (str "ps -p " process-id))]
+ (try-cause
+ (exec-command! line)
--- End diff --
I would prefer to check /proc/ directly; see
https://github.com/caofangkun/apache-storm/commit/2b2d65095402217d0217272b65ac9e1925125152#diff-60bc01aeb7fe37b1dc8ba418ebd627c3R379
However, I am concerned that, as you mentioned before, "/proc also does not
exist on all unix variants".
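One way to reconcile the two approaches might be to use /proc only where it is present and fall back to `ps -p` elsewhere. The following is a minimal sketch, not the code from the pull request: the namespace and all function names are hypothetical, it ignores Windows entirely, and it assumes that a mounted procfs exposes `/proc/self` and that `ps -p <pid>` exits with 0 only when the process exists.

```clojure
(ns example.process-check
  (:import [java.io File]))

(defn proc-fs-available?
  "True when a procfs appears to be mounted.
  Assumption: a mounted procfs exposes /proc/self."
  []
  (.exists (File. "/proc/self")))

(defn exists-in-proc?
  "True when /proc/<process-id> is a directory."
  [process-id]
  (.isDirectory (File. "/proc" (str process-id))))

(defn exists-via-ps?
  "Runs `ps -p <process-id>` and treats exit code 0 as 'process exists'.
  Throws java.io.IOException if `ps` cannot be started."
  [process-id]
  (let [^java.util.List cmd ["ps" "-p" (str process-id)]
        proc (.start (doto (ProcessBuilder. cmd)
                       (.redirectErrorStream true)))]
    ;; Drain the output so the child cannot block on a full pipe.
    (slurp (.getInputStream proc))
    (zero? (.waitFor proc))))

(defn process-alive?
  "Hypothetical variant of exists-process?: prefers /proc, falls back to ps."
  [process-id]
  (if (proc-fs-available?)
    (exists-in-proc? process-id)
    (exists-via-ps? process-id)))
```

With this shape, a call such as `(process-alive? 12345)` avoids forking a process on systems with /proc, and still works through `ps` on systems without it.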
> Supervisor should restart worker immediately, if the worker process does not
> exist any more
> --------------------------------------------------------------------------------------------
>
> Key: STORM-532
> URL: https://issues.apache.org/jira/browse/STORM-532
> Project: Apache Storm
> Issue Type: Improvement
> Affects Versions: 0.10.0
> Reporter: caofangkun
> Assignee: caofangkun
> Priority: Minor
>
> Currently, if the worker process no longer exists, the supervisor has to
> wait a few seconds for the worker heartbeat to time out before it restarts
> the worker.
> If the supervisor knows the worker's process ID and checks whether the
> process exists in the sync-processes thread, it may need less time to
> restart the worker.
> 1: Record the worker process ID in the worker's local heartbeat.
> 2: In the supervisor's sync-processes, get the process ID from the worker's
> local heartbeat and check whether the process exists.
> 3: If it does not exist, restart the worker immediately.
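
The three steps in the description could be sketched roughly as below. This is an illustration only, not Storm's supervisor code: the heartbeat keys `:process-id` and `:time-secs`, the state keywords, and both function names are assumptions, and the liveness check is passed in as a predicate so that any implementation (ps, tasklist, /proc) can be plugged in.

```clojure
(defn worker-state
  "Classifies a worker from its local heartbeat map (hypothetical shape).
  `alive?` is a predicate on a process id; `timeout-secs` is the heartbeat timeout."
  [heartbeat now-secs timeout-secs alive?]
  (cond
    (nil? heartbeat) :not-started
    ;; Step 2: the process id recorded in the heartbeat is checked first,
    ;; so a dead worker is detected without waiting for the timeout.
    (not (alive? (:process-id heartbeat))) :process-gone
    (> (- now-secs (:time-secs heartbeat)) timeout-secs) :timed-out
    :else :valid))

(defn workers-to-restart
  "Step 3: returns the ids of workers whose process is gone or whose
  heartbeat has timed out. `heartbeats` maps worker id to heartbeat map."
  [heartbeats now-secs timeout-secs alive?]
  (for [[worker-id hb] heartbeats
        :when (#{:process-gone :timed-out}
               (worker-state hb now-secs timeout-secs alive?))]
    worker-id))
```

The heartbeat timeout stays in place as a fallback, so a worker that is still running but hung is handled the same way as before.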