[
https://issues.apache.org/jira/browse/STORM-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630868#comment-14630868
]
ASF GitHub Bot commented on STORM-946:
--------------------------------------
Github user caofangkun commented on a diff in the pull request:
https://github.com/apache/storm/pull/639#discussion_r34865678
--- Diff: storm-core/src/clj/backtype/storm/daemon/worker.clj ---
@@ -288,7 +288,18 @@
(filter-key (complement (-> worker
:task-ids set))))
needed-connections (-> needed-assignment vals set)
needed-tasks (-> needed-assignment keys)
-
+
+ closed-connections (into [] (for [[node+port socket]
@(:cached-node+port->socket worker)]
+ (if (.isClosed socket)
node+port)))
--- End diff --
-1
We need take attention to unneeded connections but not closed connections ,
sometimes closed connections need to be reconnected by [retry
policy](https://github.com/apache/storm/blob/master/storm-core/src/jvm/backtype/storm/messaging/netty/Client.java#L138)
to make sure reconnect losted connections.
[refresh-connections](https://github.com/apache/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/worker.clj#L313)
will try remove unneeded connections
by the way:
```
java.lang.IllegalArgumentException: No matching field found: isClosed for
class backtype.storm.messaging.netty.Client
```
May be you should use
[client.status()](https://github.com/apache/storm/blob/master/storm-core/src/jvm/backtype/storm/messaging/netty/Client.java#L191)
somewhere when you need .
> We should remove Closed Client form cached-node+port->socket in worker
> ----------------------------------------------------------------------
>
> Key: STORM-946
> URL: https://issues.apache.org/jira/browse/STORM-946
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 0.10.0, 0.11.0
> Reporter: xiajun
>
> The client may be Closed status after reconnect failed, and we will remove
> closed client from Context to escape memory leak.
> But there is also reference for the closed Client in cached-node+port->socket
> in worker, for this reason we should also remove closed Client from
> cached-node+port->socket.
> Meanwhile there is another reason for us to do so. Think about this
> situation: worker A connect to worker B1 B2, but for some reason worker B1 B2
> died at the same, then nimbus reschedule worker B1 B1. And new B1 B2 may
> partly rescheduled at the some host:port as old B1 B2, that is (old B1:
> host1+port1, old B2: host2+port2, new B1: host2+port2, new B2: host3+port3).
> Worker A realized worker B1 B2 died and start reconnect to worker B1 B2, but
> before new worker B1 and old B2 have the same host+port, and by the current
> logic, we will remove old B1 Client and and create new Client for new worker
> B2, and do nothing to old B2 and new B1 because they have the same host+port.
> This will result the topology stop processing tuples. Once we remove closed
> Client from cached-node+port->socket before refresh-connections, this will
> not happen again.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)