[
https://issues.apache.org/jira/browse/STORM-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220099#comment-15220099
]
ASF GitHub Bot commented on STORM-1667:
---------------------------------------
Github user zhuoliu commented on a diff in the pull request:
https://github.com/apache/storm/pull/1280#discussion_r58081089
--- Diff: storm-core/src/clj/org/apache/storm/daemon/supervisor.clj ---
@@ -296,11 +293,15 @@
(worker-launcher-and-wait conf user ["signal" pid "9"] :log-prefix
(str "kill -9 " pid))
(force-kill-process pid))
(if as-user
- (rmr-as-user conf id (worker-pid-path conf id pid))
(try
+ (rmr-as-user conf id (worker-pid-path conf id pid))
(rmpath (worker-pid-path conf id pid))
(rmpath (worker-tmp-root conf id pid))
- (catch Exception e)))) ;; on windows, the supervisor may still
holds the lock on the worker directory
+ (catch IOException e
+ (log-warn-error e "Failed to cleanup pid dir: " pid " for
worker " id". Will retry later"))
+ (catch RuntimeException e
+ (log-warn-error e "Failed to cleanup pid dir: " pid " for
worker " id". Will retry later")))))
+ ;; on windows, the supervisor may still holds the lock on the
worker directory
--- End diff --
@satishd, got your point. It would be the responsibility of the one who
added new code to catch other possible exception, or they may prefer to not
catch and let it fail fast upon specific scenario.
But for now, since there is any non-deterministic exception might be thrown
in Windows environment, we may still ignore all types of exception, which might
also be why the previous code did this way ("catch Exception").
> Log the IO Exception when deleting worker pid dir
> -------------------------------------------------
>
> Key: STORM-1667
> URL: https://issues.apache.org/jira/browse/STORM-1667
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.0
> Reporter: Zhuo Liu
> Assignee: Zhuo Liu
> Fix For: 1.0.0
>
>
> There is a race condition since shutdown-worker is called by both
> sync-processes and synchronize-supervisor in supervisor. One pid directory
> might have been already deleted when rmr-as-user tries to delete it.
> Therefore, we should not let the FileNotFoundException get thrown and crash
> the supervisor.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)