[ 
https://issues.apache.org/jira/browse/STORM-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630868#comment-14630868
 ] 

ASF GitHub Bot commented on STORM-946:
--------------------------------------

Github user caofangkun commented on a diff in the pull request:

    https://github.com/apache/storm/pull/639#discussion_r34865678
  
    --- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
    @@ -288,7 +288,18 @@
                                           (filter-key (complement (-> worker 
:task-ids set))))
                   needed-connections (-> needed-assignment vals set)
                   needed-tasks (-> needed-assignment keys)
    -              
    +
    +              closed-connections (into [] (for [[node+port socket] 
@(:cached-node+port->socket worker)]
    +                                            (if (.isClosed socket) 
node+port)))
    --- End diff --
    
    -1 
    
    We need take attention to unneeded connections but not closed connections ,
    sometimes closed connections need to be reconnected by [retry 
policy](https://github.com/apache/storm/blob/master/storm-core/src/jvm/backtype/storm/messaging/netty/Client.java#L138)
 to make sure reconnect losted connections.
    
[refresh-connections](https://github.com/apache/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/worker.clj#L313)
 will try remove unneeded connections
    
    by the way:
     
    ```  
    java.lang.IllegalArgumentException: No matching field found: isClosed for 
class  backtype.storm.messaging.netty.Client 
    ```
    May be you should use 
[client.status()](https://github.com/apache/storm/blob/master/storm-core/src/jvm/backtype/storm/messaging/netty/Client.java#L191)
 somewhere when you need .



> We should remove Closed Client form cached-node+port->socket in worker
> ----------------------------------------------------------------------
>
>                 Key: STORM-946
>                 URL: https://issues.apache.org/jira/browse/STORM-946
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.11.0
>            Reporter: xiajun
>
> The client may be Closed status after reconnect failed, and we will remove 
> closed client from Context to escape memory leak.
> But there is also reference for the closed Client in cached-node+port->socket 
> in worker, for this reason we should also remove closed Client from 
> cached-node+port->socket.  
> Meanwhile there is another reason for us to do so. Think about this 
> situation: worker A connect to worker B1 B2, but for some reason worker B1 B2 
> died at the same, then nimbus reschedule worker B1 B1. And new B1 B2 may 
> partly rescheduled at the some host:port as old B1 B2, that is (old B1: 
> host1+port1, old B2: host2+port2, new B1: host2+port2, new B2: host3+port3). 
> Worker A realized worker B1 B2 died and start reconnect to worker B1 B2, but 
> before new worker B1 and old B2 have the same host+port, and by the current 
> logic, we will remove old B1 Client and and create new Client for new worker 
> B2, and do nothing to old B2 and new B1 because they have the same host+port. 
> This will result the topology stop processing tuples. Once we remove closed 
> Client from cached-node+port->socket before refresh-connections, this  will 
> not happen again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to