[ 
https://issues.apache.org/jira/browse/STORM-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333566#comment-15333566
 ] 

Nico Meyer commented on STORM-1560:
-----------------------------------

Is it certain that this problem affects version 1.0.0? Because I observed the 
same problem  in 0.9.6 when workers restarted in short succession. That often 
happens if one worker crashes since nimbus tends to reassign tasks right around 
the time the crashed worker is restarted by the supervisor (most likely because 
by default both heartbeat timeouts are 30 second).
I had a patch that solved it, but it is no longer necessary for 1.0.0.

> Topology stops processing after Netty catches/swallows Throwable
> ----------------------------------------------------------------
>
>                 Key: STORM-1560
>                 URL: https://issues.apache.org/jira/browse/STORM-1560
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>    Affects Versions: 1.0.0
>            Reporter: P. Taylor Goetz
>
> In some scenarios, netty connection problems can leave a topology in an 
> unrecoverable state. The likely culprit is the Netty {{HashedWheelTimer}} 
> class that contains the following code:
> {code}
>         public void expire() {
>             if(this.compareAndSetState(0, 2)) {
>                 try {
>                     this.task.run(this);
>                 } catch (Throwable var2) {
>                     if(HashedWheelTimer.logger.isWarnEnabled()) {
>                         HashedWheelTimer.logger.warn("An exception was thrown 
> by " + TimerTask.class.getSimpleName() + '.', var2);
>                     }
>                 }
>             }
>         }
> {code}
> The exception being swallowed can be seen below:
> {code}
> 2016-02-18 08:46:59.116 o.a.s.m.n.Client [INFO] closing Netty Client 
> Netty-Client-/192.168.202.6:6701
> 2016-02-18 08:46:59.173 o.a.s.m.n.Client [INFO] waiting up to 600000 ms to 
> send 0 pending messages to Netty-Client-/192.168.202.6:6701
> 2016-02-18 08:46:59.271 STDIO [ERROR] Feb 18, 2016 8:46:59 AM 
> org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer
> WARNING: An exception was thrown by TimerTask.
> java.lang.RuntimeException: Giving up to scheduleConnect to 
> Netty-Client-/192.168.202.6:6701 after 44 failed attempts. 3 messages were 
> lost
>       at org.apache.storm.messaging.netty.Client$Connect.run(Client.java:573)
>       at 
> org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:546)
>       at 
> org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:446)
>       at 
> org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:395)
>       at 
> org.apache.storm.shade.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> The netty client then never recovers, and the follows messages repeat forever:
> {code}
> 2016-02-18 09:42:56.251 o.a.s.m.n.Client [ERROR] discarding 1 messages 
> because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:25.248 o.a.s.m.n.Client [ERROR] discarding 1 messages 
> because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:55.248 o.a.s.m.n.Client [ERROR] discarding 1 messages 
> because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:55.752 o.a.s.m.n.Client [ERROR] discarding 2 messages 
> because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:56.252 o.a.s.m.n.Client [ERROR] discarding 1 messages 
> because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:44:25.249 o.a.s.m.n.Client [ERROR] discarding 1 messages 
> because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to