[ https://issues.apache.org/jira/browse/STORM-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357252#comment-15357252 ]

ASF GitHub Bot commented on STORM-1934:
---------------------------------------

Github user harshach commented on a diff in the pull request:

    https://github.com/apache/storm/pull/1528#discussion_r69152524
  
    --- Diff: storm-core/src/clj/org/apache/storm/daemon/supervisor.clj ---
    @@ -425,6 +433,13 @@
              ". State: " state
              ", Heartbeat: " (pr-str heartbeat))
             (shutdown-worker supervisor id)))
    +
    +    (doseq [storm-id all-downloaded-storm-ids]
    --- End diff --
    
    This looks like you removed the check where we used to keep the jars even if
    the assignment goes away. We had another issue where, if the topology is
    rebalanced and the assignment goes away and immediately comes back to the
    same supervisor, the supervisor tries to re-download the jar. If it happens
    to be a large jar file, there is a chance that the worker gets killed before
    it even starts, because the jar download takes a long time.
    IMO, I don't see a reason to remove the storm jars on a rebalance or when
    the assignment goes away. They should be removed when the topology gets killed.
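The retention policy proposed in this comment can be sketched as follows. This is a hypothetical Java illustration, not Storm's supervisor code: `JarRetentionPolicy`, `idsToRemove`, and the idea of passing in the set of topologies still alive in the cluster are assumptions made for the example. The point it shows is that cleanup is keyed on whether the topology still exists, not on whether this supervisor currently holds an assignment for it.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the jar-retention rule discussed above; the class and
// method names are illustrative and do not exist in Storm.
final class JarRetentionPolicy {

    // Returns the topology ids whose downloaded code may be deleted.
    //   downloadedIds    - topology ids with code present on local disk
    //   activeTopologies - topology ids still present in the cluster (not killed)
    static Set<String> idsToRemove(Set<String> downloadedIds, Set<String> activeTopologies) {
        Set<String> removable = new HashSet<>();
        for (String id : downloadedIds) {
            // The local assignment is deliberately NOT consulted: losing the
            // assignment (e.g. during a rebalance) is not a reason to delete the jar.
            if (!activeTopologies.contains(id)) {
                removable.add(id);
            }
        }
        return removable;
    }

    public static void main(String[] args) {
        Set<String> downloaded = Set.of("topo-a", "topo-b");
        // topo-a was rebalanced away from this supervisor but is still running;
        // topo-b has been killed.
        Set<String> active = Set.of("topo-a");
        System.out.println(idsToRemove(downloaded, active)); // prints [topo-b]
    }
}
```

Under this rule, a topology that is rebalanced away from a supervisor and then straight back still has its jar on disk, so the new worker does not have to wait on a second download.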


> Race condition between sync-supervisor and sync-processes raises several strange issues
> ---------------------------------------------------------------------------------------
>
>                 Key: STORM-1934
>                 URL: https://issues.apache.org/jira/browse/STORM-1934
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.0, 2.0.0, 1.0.1
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Critical
>
> There are some strange issues, including STORM-1933 and others (which I will
> file issues for soon), that are related to a race condition in the supervisor.
> As I mentioned in STORM-1933, sync-supervisor relies on the ZK assignment,
> and sync-processes relies on the local assignment and the local workers
> directory, but in fact sync-supervisor also accesses local state and takes
> some actions that affect sync-processes. Satish also left a comment on
> STORM-1933 describing another issue related to the race condition, along with
> an idea to fix it, and I am on the same page.
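The kind of interleaving described above can be illustrated with a generic example. The following is a hypothetical Java sketch, not Storm's code, not the fix merged for STORM-1934, and not necessarily the approach Satish proposed (which is not spelled out here). `LocalStateGuard`, `syncSupervisor`, `syncProcesses`, and the port-to-topology map are illustrative assumptions. It shows one standard way to rule out this class of race: every routine that reads or writes the shared local state takes the same lock, so one routine can never observe the other's half-finished update.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only: names loosely mirror sync-supervisor and
// sync-processes, but this is not Storm's implementation.
final class LocalStateGuard {
    // One lock shared by every routine that touches the local state.
    private final ReentrantLock lock = new ReentrantLock();
    // Stand-in for the supervisor's local assignment: port -> topology id.
    private final Map<Integer, String> localAssignment = new HashMap<>();

    // Plays the role of sync-supervisor: replace the local assignment with the
    // one read from ZooKeeper. Without the lock, a reader could observe the map
    // between clear() and putAll(), i.e. with no assignment at all.
    void syncSupervisor(Map<Integer, String> zkAssignment) {
        lock.lock();
        try {
            localAssignment.clear();
            localAssignment.putAll(zkAssignment);
        } finally {
            lock.unlock();
        }
    }

    // Plays the role of sync-processes: take a consistent snapshot of the local
    // assignment before deciding which workers to start or kill.
    Map<Integer, String> syncProcesses() {
        lock.lock();
        try {
            return new HashMap<>(localAssignment);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LocalStateGuard guard = new LocalStateGuard();
        Map<Integer, String> zk = Map.of(6700, "topo-a", 6701, "topo-a");
        guard.syncSupervisor(zk);

        // Writer thread keeps re-applying the same ZooKeeper assignment.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                guard.syncSupervisor(zk);
            }
        });
        writer.start();

        // Reader checks that it never sees the transient empty state.
        boolean sawEmpty = false;
        while (writer.isAlive()) {
            if (guard.syncProcesses().isEmpty()) {
                sawEmpty = true;
            }
        }
        writer.join();
        System.out.println("reader saw an empty assignment: " + sawEmpty); // prints false
    }
}
```

A single coarse lock is the simplest option; it trades some concurrency between the two loops for the guarantee that each one sees the local state either entirely before or entirely after the other's update.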



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)