[ 
https://issues.apache.org/jira/browse/STORM-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570530#comment-14570530
 ] 

ASF GitHub Bot commented on STORM-763:
--------------------------------------

Github user eshioji commented on the pull request:

    https://github.com/apache/storm/pull/568#issuecomment-108263716
  
    @revans2 Thanks for the performance testing. I could reproduce very similar 
results on our machine. I'm trying a few things and hope to get performance 
back to where it was.
    
    In case you haven't noticed, this change includes a fix for 
[STORM-839](https://github.com/apache/storm/pull/566), which seems pretty 
critical (I actually encountered a deadlock on my live cluster). Do you think 
the patch should be applied to 0.9.x if the performance concern is alleviated? 
Or are you thinking more of 0.11.x? (I noticed you tested with 0.11.)
    
    Also, I have a question that maybe @miguno could help with: I've removed 
the graceful shutdown, which tries to flush all pending messages before the 
Client is closed, mostly to make it easier for me to fix the deadlock. However, 
now I'm worried I might have removed something significant. Do you think I 
should bring it back?



> nimbus reassigned worker A to another machine, but other worker's netty 
> client can't connect to the new worker A 
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: STORM-763
>                 URL: https://issues.apache.org/jira/browse/STORM-763
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.9.4
>         Environment: Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
> java version "1.7.0_03"
> storm 0.9.4
> cluster 50+ machines
>            Reporter: 3in
>
> Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
> java version "1.7.0_03"
> storm 0.9.4
> cluster 50+ machines
> My topology has 50+ workers; it can't emit 50000 thousand tuples in ten 
> minutes.
> Sometimes one worker is reassigned to another machine by nimbus because of 
> a task heartbeat timeout:
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
> my_topology-22-1428243953:[440 440] not alive
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
> my_topology-22-1428243953:[90 90] not alive
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
> my_topology-22-1428243953:[510 510] not alive
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
> my_topology-22-1428243953:[160 160] not alive
> I can see in the Storm UI that the reassigned worker has already started, but 
> the other workers write error logs all the time:
> 2015-04-08T16:56:43.091+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] connection to 
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:45.715+0800 b.s.m.n.Client [ERROR] connection to 
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:45.716+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.277+0800 b.s.m.n.Client [ERROR] connection to 
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:46.278+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] connection to 
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] connection to 
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
> destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.835+0800 b.s.m.n.Client [ERROR] connection to 
> Netty-Client-host_19/192.168.163.19:5700 is unavailable
> The worker on the destination host has already started, and I can telnet to 
> 192.168.163.19 5700.
> So why can't the netty client connect to that ip:port?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)