[
https://issues.apache.org/jira/browse/STORM-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567350#comment-14567350
]
ASF GitHub Bot commented on STORM-763:
--------------------------------------
Github user revans2 commented on the pull request:
https://github.com/apache/storm/pull/568#issuecomment-107517980
Do you have any performance numbers around this change? This is on the
critical path, and I want to verify that there has been no regression around
this.
I am also a bit concerned about losing the pending metric. We really need
a good way to know if this connection is getting backed up, meaning the
network connection is slow but still up, or is simply saturated. We really
need an equivalent of pending so that we know how much data Netty has queued
up to send.
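To make the request concrete, here is a minimal sketch of what an
equivalent of pending could look like. It is only an illustration under
assumptions: the class PendingTracker and its method names are hypothetical
and are not part of Storm or Netty. The idea is to count messages when they
are handed to the client and to subtract them again once the write completes.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not part of Storm or Netty): counts messages that
 * have been handed to the transport but whose write has not completed yet.
 */
class PendingTracker {
    private final AtomicLong pending = new AtomicLong();

    /** Call when a batch of messages is handed to the client for sending. */
    void onEnqueue(int messages) {
        pending.addAndGet(messages);
    }

    /** Call from the write-completion callback, on success or on failure. */
    void onCompleted(int messages) {
        pending.addAndGet(-messages);
    }

    /** Current number of messages queued but not yet written. */
    long pending() {
        return pending.get();
    }
}
```

A value that keeps growing would suggest a slow or saturated connection,
while a value that stays near zero would suggest the connection is keeping up.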
> nimbus reassigned worker A to another machine, but other worker's netty
> client can't connect to the new worker A
> -----------------------------------------------------------------------------------------------------------------
>
> Key: STORM-763
> URL: https://issues.apache.org/jira/browse/STORM-763
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 0.9.4
> Environment: Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
> java version "1.7.0_03"
> storm 0.9.4
> cluster 50+ machines
> Reporter: 3in
>
> My topology has 50+ workers, and it can't emit 50000 thousand tuples in ten
> minutes.
> sometimes one worker is reassigned to another machine by nimbus because of
> task heartbeat timeout:
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor
> my_topology-22-1428243953:[440 440] not alive
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor
> my_topology-22-1428243953:[90 90] not alive
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor
> my_topology-22-1428243953:[510 510] not alive
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor
> my_topology-22-1428243953:[160 160] not alive
> I can see in the Storm UI that the reassigned worker has already started,
> but the other workers write error logs all the time:
> 2015-04-08T16:56:43.091+0800 b.s.m.n.Client [ERROR] dropping 1 message(s)
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] connection to
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] dropping 1 message(s)
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:45.715+0800 b.s.m.n.Client [ERROR] connection to
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:45.716+0800 b.s.m.n.Client [ERROR] dropping 1 message(s)
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.277+0800 b.s.m.n.Client [ERROR] connection to
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:46.278+0800 b.s.m.n.Client [ERROR] dropping 1 message(s)
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] connection to
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] dropping 1 message(s)
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] connection to
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] dropping 1 message(s)
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.835+0800 b.s.m.n.Client [ERROR] connection to
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> The worker on the destination host has already started, and I can telnet
> to 192.168.163.19 5700.
> So why can't the Netty client connect to that ip:port?