[
https://issues.apache.org/jira/browse/STORM-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220069#comment-15220069
]
ASF GitHub Bot commented on STORM-1667:
---------------------------------------
Github user satishd commented on a diff in the pull request:
https://github.com/apache/storm/pull/1280#discussion_r58077961
--- Diff: storm-core/src/clj/org/apache/storm/daemon/supervisor.clj ---
@@ -296,11 +293,15 @@
(worker-launcher-and-wait conf user ["signal" pid "9"] :log-prefix
(str "kill -9 " pid))
(force-kill-process pid))
(if as-user
- (rmr-as-user conf id (worker-pid-path conf id pid))
(try
+ (rmr-as-user conf id (worker-pid-path conf id pid))
(rmpath (worker-pid-path conf id pid))
(rmpath (worker-tmp-root conf id pid))
- (catch Exception e)))) ;; on windows, the supervisor may still
holds the lock on the worker directory
+ (catch IOException e
+ (log-warn-error e "Failed to cleanup pid dir: " pid " for
worker " id". Will retry later"))
+ (catch RuntimeException e
+ (log-warn-error e "Failed to cleanup pid dir: " pid " for
worker " id". Will retry later")))))
+ ;; on windows, the supervisor may still holds the lock on the
worker directory
--- End diff --
@zhuoliu IMHO, `Exception` should be caught and logged and we should not
really constrain with catching only `IOException` and `RuntimeException`. When
any of those methods is changed to throw some other `Exception` then this code
should also be changed to handle that, otherwise this code would throw that
Exception and get into unexpected scenarios. That may be the reason why
`Exception` is caught in earlier code.
> Log the IO Exception when deleting worker pid dir
> -------------------------------------------------
>
> Key: STORM-1667
> URL: https://issues.apache.org/jira/browse/STORM-1667
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.0
> Reporter: Zhuo Liu
> Assignee: Zhuo Liu
> Fix For: 1.0.0
>
>
> There is a race condition since shutdown-worker is called by both
> sync-processes and synchronize-supervisor in supervisor. One pid directory
> might have been already deleted when rmr-as-user tries to delete it.
> Therefore, we should not let the FileNotFoundException get thrown and crash
> the supervisor.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)