[ 
https://issues.apache.org/jira/browse/STORM-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630903#comment-14630903
 ] 

ASF GitHub Bot commented on STORM-946:
--------------------------------------

Github user caofangkun commented on the pull request:

    https://github.com/apache/storm/pull/639#issuecomment-122203636
  
    @tedxia 
    I could only find two situations (am I right?) that change 
    ```client.status``` to ```closed``` by calling ```(.close socket)```, and 
    neither needs a reconnect at all:
    1: when [refresh-connections](https://github.com/apache/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/worker.clj#L314) 
    removes unneeded connections; it also removes the connection from 
    ```cached-node+port->socket```
    2: on worker [shutdown](https://github.com/apache/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/worker.clj#L462)
    
    Is there any situation that calls ```(.close socket)``` without removing 
    the client from ```cached-node+port->socket```?



> We should remove Closed Client from cached-node+port->socket in worker
> ----------------------------------------------------------------------
>
>                 Key: STORM-946
>                 URL: https://issues.apache.org/jira/browse/STORM-946
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.11.0
>            Reporter: xiajun
>
> A Client may end up in the Closed state after a failed reconnect, and we 
> remove closed Clients from Context to avoid a memory leak.
> But the worker still holds a reference to the closed Client in 
> cached-node+port->socket, so we should remove closed Clients from 
> cached-node+port->socket as well.
> There is another reason to do so. Consider this situation: worker A 
> connects to workers B1 and B2, but for some reason B1 and B2 die at the 
> same time, and nimbus then reschedules B1 and B2. The new B1 and B2 may be 
> partly rescheduled on the same host:port as the old B1 and B2, that is (old 
> B1: host1+port1, old B2: host2+port2, new B1: host2+port2, new B2: 
> host3+port3). Worker A realizes that B1 and B2 died and starts reconnecting 
> to them. With the current logic, we remove the old B1 Client and create a 
> new Client for new B2, but do nothing for old B2 and new B1 because they 
> have the same host+port, so the closed Client for host2+port2 stays in the 
> cache. As a result, the topology stops processing tuples. Once we remove 
> closed Clients from cached-node+port->socket before refresh-connections, 
> this will not happen again.
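
The scenario in the description can be modelled with a small sketch. This is not Storm's code: `Client`, `refresh_connections`, and the `drop_closed` flag are hypothetical names used only for illustration, and a plain dict stands in for cached-node+port->socket. It assumes the refresh step only compares endpoints and never looks at whether a cached client is closed.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """Hypothetical stand-in for a messaging client; not Storm's class."""
    endpoint: str
    closed: bool = False


def refresh_connections(cache, needed, drop_closed):
    """Bring `cache` (endpoint -> Client) in line with the `needed` endpoints.

    With drop_closed=True, closed clients are evicted first, so an endpoint
    that is still needed gets a fresh client instead of keeping a dead one.
    """
    if drop_closed:
        for endpoint in [e for e, c in cache.items() if c.closed]:
            del cache[endpoint]
    # Close and drop clients for endpoints that are no longer needed.
    for endpoint in set(cache) - needed:
        client = cache.pop(endpoint)
        client.closed = True
    # Create clients only for endpoints that are missing from the cache.
    for endpoint in needed - set(cache):
        cache[endpoint] = Client(endpoint)
    return cache


def scenario(drop_closed):
    # Old assignment: B1 at host1:port1, B2 at host2:port2.
    cache = {e: Client(e) for e in ("host1:port1", "host2:port2")}
    # Both workers die; reconnects fail and the clients end up closed.
    for client in cache.values():
        client.closed = True
    # New assignment: B1 at host2:port2, B2 at host3:port3.
    needed = {"host2:port2", "host3:port3"}
    return refresh_connections(cache, needed, drop_closed)


stale = scenario(drop_closed=False)
fixed = scenario(drop_closed=True)
print({e: c.closed for e, c in sorted(stale.items())})
print({e: c.closed for e, c in sorted(fixed.items())})
```

Without the eviction step, the entry for host2:port2 survives the refresh while still closed, which corresponds to the stalled topology described above. With it, both needed endpoints get fresh clients.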



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
