[ https://issues.apache.org/jira/browse/STORM-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570718#comment-14570718 ]

ASF GitHub Bot commented on STORM-763:
--------------------------------------

Github user miguno commented on the pull request:

    https://github.com/apache/storm/pull/568#issuecomment-108338846
  
    > @eshioji wrote:
    > Also, I have a question that maybe @miguno could help with: I've removed
    > the graceful shutdown, which tries to flush all pending messages before
    > the Client is closed, mostly to make it easier for me to fix the deadlock.
    > However, now I'm worried I might have removed something significant. Do
    > you think I should bring it back?
    
    IIRC, the graceful shutdown was primarily (but not exclusively) for
    non-acking topologies, to minimize any potential data loss. I think it
    would be preferable to continue allowing graceful shutdowns, if possible.
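
To make the trade-off concrete, here is a minimal Java sketch of a bounded graceful close: on shutdown the client stops accepting new messages, keeps retrying delivery of what is already queued until a deadline, and only then drops the remainder. The class `GracefulClient`, its `Transport` interface and the `close(timeoutMillis)` method are hypothetical names for illustration; this is not Storm's actual Netty `Client` code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: a client that buffers outgoing messages and, on close,
 * tries to flush them within a bounded time before dropping the rest.
 * Not Storm's real Client implementation.
 */
public class GracefulClient {
    /** Hypothetical transport: returns true if the message was written. */
    public interface Transport {
        boolean write(String msg);
    }

    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> delivered = new ArrayList<>();
    private final Transport transport;
    private boolean closed = false;

    public GracefulClient(Transport transport) {
        this.transport = transport;
    }

    public synchronized void send(String msg) {
        if (closed) {
            throw new IllegalStateException("client is closed");
        }
        pending.addLast(msg);
    }

    /** Flushes pending messages until the queue is empty or the deadline passes; returns how many were dropped. */
    public synchronized int close(long timeoutMillis) {
        closed = true; // reject new messages so the queue can only shrink
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!pending.isEmpty() && System.currentTimeMillis() < deadline) {
            String msg = pending.peekFirst();
            if (transport.write(msg)) {
                pending.pollFirst();
                delivered.add(msg);
            } else {
                try {
                    Thread.sleep(10); // back off briefly before retrying the write
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        int dropped = pending.size(); // whatever could not be flushed in time
        pending.clear();
        return dropped;
    }

    public synchronized List<String> delivered() {
        return new ArrayList<>(delivered);
    }
}
```

The deadline is the key design choice in this sketch: `close()` always returns even if the remote worker never becomes writable, so shutdown cannot hang indefinitely, at the cost of dropping (and reporting the count of) whatever is still queued when time runs out.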


> nimbus reassigned worker A to another machine, but other worker's netty client can't connect to the new worker A
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: STORM-763
>                 URL: https://issues.apache.org/jira/browse/STORM-763
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.9.4
>         Environment: Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
> java version "1.7.0_03"
> storm 0.9.4
> cluster 50+ machines
>            Reporter: 3in
>
> My topology has 50+ workers; it can't emit 50000 thousand (50 million) tuples in ten minutes.
> Sometimes Nimbus reassigns one worker to another machine because of a task heartbeat timeout:
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[440 440] not alive
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[90 90] not alive
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[510 510] not alive
> 2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor my_topology-22-1428243953:[160 160] not alive
> I can see in the Storm UI that the reassigned worker has already started, but the other workers log errors continuously:
> 2015-04-08T16:56:43.091+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:45.715+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:45.716+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.277+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:46.278+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
> 2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) destined for Netty-Client-host_19/192.168.163.19:5700
> 2015-04-08T16:56:46.835+0800 b.s.m.n.Client [ERROR] connection to Netty-Client-host_19/192.168.163.19:5700 is unavailable
> The worker on the destination host has already started, and I can telnet to 192.168.163.19 5700.
> So why can't the Netty client connect to that ip:port?
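
The telnet check described above can be reproduced from Java, which is handy when running it on the machine of a worker that keeps logging "connection to ... is unavailable". This is a minimal sketch using only `java.net.Socket`; the class name `PortCheck`, its default host/port arguments and the 2-second timeout are assumptions for illustration. If this connect succeeds from that machine while the worker's Netty client still reports the connection as unavailable, that would point toward the client's own connection state rather than network reachability.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Minimal Java equivalent of "telnet host port": does a TCP connect succeed within a timeout? */
public class PortCheck {
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host could not be resolved
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the host:port from the log above; override via arguments.
        String host = args.length > 0 ? args[0] : "192.168.163.19";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5700;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```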

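The "not alive" lines in the Nimbus log come from a heartbeat timeout: an executor whose last heartbeat is older than the configured limit (`nimbus.task.timeout.secs` in Storm's configuration) is treated as dead and reassigned. The following is a simplified, hypothetical sketch of such a timeout-based liveness check, not Nimbus's real implementation; `HeartbeatMonitor` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a timeout-based liveness check; not Nimbus's actual code. */
public class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    public HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a heartbeat from an executor (e.g. "[440 440]") at the given time. */
    public void heartbeat(String executorId, long nowMillis) {
        lastHeartbeat.put(executorId, nowMillis);
    }

    /** Returns executors whose last heartbeat is older than the timeout, i.e. the ones to report as "not alive". */
    public List<String> deadExecutors(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }
}
```

For example, with a 30-second timeout, an executor last heard from at t = 0 s is reported at t = 40 s, while one last heard from at t = 20 s is not.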


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
