[ https://issues.apache.org/jira/browse/STORM-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630728#comment-14630728 ]

ASF GitHub Bot commented on STORM-946:
--------------------------------------

GitHub user tedxia opened a pull request:

    https://github.com/apache/storm/pull/639

    STORM-946: We should remove Closed Client from cached-node+port->socket in worker

    Patch for [STORM-946](https://issues.apache.org/jira/browse/STORM-946)
    
    The client may end up in Closed status after reconnecting fails, and we remove closed clients from the Context to avoid a memory leak.
    However, the worker's cached-node+port->socket map still holds a reference to the closed Client, so we should remove closed Clients from cached-node+port->socket as well.
    There is a second reason to do so. Consider this situation: worker A is connected to workers B1 and B2, and for some reason B1 and B2 die at the same time, so nimbus reschedules B1 and B2. The new B1 and B2 may be partly rescheduled onto the same host:port pairs as the old ones, for example (old B1: host1+port1, old B2: host2+port2, new B1: host2+port2, new B2: host3+port3). Worker A notices that B1 and B2 died and starts reconnecting to them, but new B1 and old B2 share the same host+port. With the current logic, we remove the old B1 Client, create a new Client for new B2, and do nothing for old B2 and new B1 because they have the same host+port. Since the Client cached for host2+port2 is the old, closed one, the topology stops processing tuples. If we remove closed Clients from cached-node+port->socket before refresh-connections runs, this can no longer happen.
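    A minimal Java sketch of the reconciliation step described above, assuming refresh-connections works by diffing the cached endpoints against the endpoints the current assignment needs. `RefreshConnectionsSketch`, `Client`, `isClosed()` and `refreshConnections(...)` are hypothetical names for illustration only, not Storm's actual classes or API. The `pruneClosed` flag toggles the proposed fix so both behaviors can be compared on the scenario above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class RefreshConnectionsSketch {

    /** Hypothetical stand-in for a messaging client; not Storm's real Client class. */
    static final class Client {
        final String nodePort;
        private boolean closed;

        Client(String nodePort) { this.nodePort = nodePort; }

        void close() { closed = true; }

        boolean isClosed() { return closed; }
    }

    /** Hypothetical analogue of the worker's cached-node+port->socket map. */
    final Map<String, Client> cachedNodePortSocket = new HashMap<>();

    /**
     * Reconciles cached clients with the node+port endpoints currently needed.
     * When pruneClosed is true, closed clients are dropped first, so an endpoint
     * reused by a rescheduled worker gets a fresh client instead of a dead one.
     */
    void refreshConnections(Set<String> neededNodePorts, boolean pruneClosed) {
        if (pruneClosed) {
            // The proposed fix: forget clients that gave up reconnecting.
            cachedNodePortSocket.values().removeIf(Client::isClosed);
        }
        // Close and forget endpoints that are no longer in the assignment.
        Set<String> toRemove = new HashSet<>(cachedNodePortSocket.keySet());
        toRemove.removeAll(neededNodePorts);
        for (String nodePort : toRemove) {
            cachedNodePortSocket.remove(nodePort).close();
        }
        // Open clients only for endpoints with no cached entry; existing
        // entries are left untouched, whether or not they are closed.
        for (String nodePort : neededNodePorts) {
            cachedNodePortSocket.computeIfAbsent(nodePort, Client::new);
        }
    }

    public static void main(String[] args) {
        for (boolean prune : new boolean[] {false, true}) {
            RefreshConnectionsSketch worker = new RefreshConnectionsSketch();
            // Worker A connects to old B1 (host1:port1) and old B2 (host2:port2).
            worker.refreshConnections(Set.of("host1:port1", "host2:port2"), prune);
            // B1 and B2 die; reconnects fail and both clients end up closed.
            worker.cachedNodePortSocket.values().forEach(Client::close);
            // New B1 lands on host2:port2, new B2 on host3:port3.
            worker.refreshConnections(Set.of("host2:port2", "host3:port3"), prune);
            boolean stale = worker.cachedNodePortSocket.get("host2:port2").isClosed();
            System.out.println("prune=" + prune + " host2:port2 closed=" + stale);
        }
    }
}
```

    Running `main` prints `prune=false host2:port2 closed=true` followed by `prune=true host2:port2 closed=false`: without pruning, new B1's endpoint keeps the dead Client, while with pruning it gets a fresh one and the closed Client is no longer referenced from the cache.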

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tedxia/storm ted-remove-closed-socket-from-cached-node+port-socket

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #639
    
----
commit 28c75bd3e6d9925c71acb8878c1d3786abfa0ba2
Author: xiajun <[email protected]>
Date:   2015-07-17T03:59:28Z

    STORM-946: We should remove Closed Client from cached-node+port->socket in worker

----


> We should remove Closed Client from cached-node+port->socket in worker
> ----------------------------------------------------------------------
>
>                 Key: STORM-946
>                 URL: https://issues.apache.org/jira/browse/STORM-946
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.11.0
>            Reporter: xiajun
>
> The client may end up in Closed status after reconnecting fails, and we
> remove closed clients from the Context to avoid a memory leak.
> However, the worker's cached-node+port->socket map still holds a reference
> to the closed Client, so we should remove closed Clients from
> cached-node+port->socket as well.
> There is a second reason to do so. Consider this situation: worker A is
> connected to workers B1 and B2, and for some reason B1 and B2 die at the
> same time, so nimbus reschedules B1 and B2. The new B1 and B2 may be partly
> rescheduled onto the same host:port pairs as the old ones, for example
> (old B1: host1+port1, old B2: host2+port2, new B1: host2+port2, new B2:
> host3+port3). Worker A notices that B1 and B2 died and starts reconnecting
> to them, but new B1 and old B2 share the same host+port. With the current
> logic, we remove the old B1 Client, create a new Client for new B2, and do
> nothing for old B2 and new B1 because they have the same host+port. Since
> the Client cached for host2+port2 is the old, closed one, the topology
> stops processing tuples. If we remove closed Clients from
> cached-node+port->socket before refresh-connections runs, this can no
> longer happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
