[ 
https://issues.apache.org/jira/browse/STORM-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220069#comment-15220069
 ] 

ASF GitHub Bot commented on STORM-1667:
---------------------------------------

Github user satishd commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1280#discussion_r58077961
  
    --- Diff: storm-core/src/clj/org/apache/storm/daemon/supervisor.clj ---
    @@ -296,11 +293,15 @@
             (worker-launcher-and-wait conf user ["signal" pid "9"] :log-prefix 
(str "kill -9 " pid))
             (force-kill-process pid))
           (if as-user
    -        (rmr-as-user conf id (worker-pid-path conf id pid))
             (try
    +          (rmr-as-user conf id (worker-pid-path conf id pid))
               (rmpath (worker-pid-path conf id pid))
               (rmpath (worker-tmp-root conf id pid))
    -          (catch Exception e)))) ;; on windows, the supervisor may still 
holds the lock on the worker directory
    +          (catch IOException e
    +            (log-warn-error e "Failed to cleanup pid dir: " pid " for 
worker " id". Will retry later"))
    +          (catch RuntimeException e
    +            (log-warn-error e "Failed to cleanup pid dir: " pid " for 
worker " id". Will retry later")))))
    +          ;; on windows, the supervisor may still holds the lock on the 
worker directory
    --- End diff --
    
    @zhuoliu IMHO,  `Exception` should be caught and logged and we should not 
really constrain with catching only `IOException` and `RuntimeException`. When 
any of those methods is changed to throw some other `Exception` then this code 
should also be changed to handle that, otherwise this code would throw that 
Exception and get into unexpected scenarios. That may be the reason why 
`Exception` is caught in earlier code.
    
     


> Log the IO Exception when deleting worker pid dir
> -------------------------------------------------
>
>                 Key: STORM-1667
>                 URL: https://issues.apache.org/jira/browse/STORM-1667
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.0
>            Reporter: Zhuo Liu
>            Assignee: Zhuo Liu
>             Fix For: 1.0.0
>
>
> There is a race condition since shutdown-worker is called by both 
> sync-processes and synchronize-supervisor in supervisor. One pid directory 
> might have been already deleted when rmr-as-user tries to delete it. 
> Therefore, we should not let the FileNotFoundException get thrown and crash 
> the supervisor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to