[ https://issues.apache.org/jira/browse/STORM-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215907#comment-15215907 ]
John Fang commented on STORM-1560:
----------------------------------
Connect.run() only calls Client.close() when closing is true. The log shows
that Connect.run() called Client.close() on this connection, so close() must
already have been called earlier. That earlier call should have printed
"closing Netty Client {}" to the log. [~ptgoetz] You should be able to find
the log line about close() a little earlier in the log.
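
The reasoning above can be illustrated with a minimal sketch. This is a simplified, hypothetical model and not the real org.apache.storm.messaging.netty.Client: the class and method names (ClosingFlagSketch, SketchClient, runConnect) are made up for illustration, and it assumes that only the first close() logs the message and sets the closing flag.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the closing-flag logic discussed above.
// It is not the real org.apache.storm.messaging.netty.Client.
class ClosingFlagSketch {

    static class SketchClient {
        final List<String> log = new ArrayList<>();
        private final String name;
        private volatile boolean closing = false;

        SketchClient(String name) {
            this.name = name;
        }

        // Assumption: only the first close() logs and sets the flag.
        void close() {
            if (!closing) {
                log.add("closing Netty Client " + name);
                closing = true;
            }
        }

        // Stand-in for Connect.run(): reconnect while allowed, otherwise give up.
        void runConnect() {
            if (!closing) {
                log.add("reconnecting " + name);
            } else {
                close(); // no-op here, because closing is already true
                throw new RuntimeException("Giving up to scheduleConnect to " + name);
            }
        }
    }

    public static void main(String[] args) {
        SketchClient client = new SketchClient("Netty-Client-/192.168.202.6:6701");
        client.runConnect();     // closing is false: just reconnects
        client.close();          // the earlier close() that logs the message
        try {
            client.runConnect(); // closing is true: gives up and throws
        } catch (RuntimeException e) {
            client.log.add("caught: " + e.getMessage());
        }
        client.log.forEach(System.out::println);
    }
}
```

In this model the give-up branch can only be reached after some other caller has already run close(), which is why a "closing Netty Client" line should appear earlier in the log than the RuntimeException.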
> Topology stops processing after Netty catches/swallows Throwable
> ----------------------------------------------------------------
>
> Key: STORM-1560
> URL: https://issues.apache.org/jira/browse/STORM-1560
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.0
> Reporter: P. Taylor Goetz
>
> In some scenarios, netty connection problems can leave a topology in an
> unrecoverable state. The likely culprit is the Netty {{HashedWheelTimer}}
> class that contains the following code:
> {code}
> public void expire() {
>     if(this.compareAndSetState(0, 2)) {
>         try {
>             this.task.run(this);
>         } catch (Throwable var2) {
>             if(HashedWheelTimer.logger.isWarnEnabled()) {
>                 HashedWheelTimer.logger.warn("An exception was thrown by " + TimerTask.class.getSimpleName() + '.', var2);
>             }
>         }
>     }
> }
> {code}
> The exception being swallowed can be seen below:
> {code}
> 2016-02-18 08:46:59.116 o.a.s.m.n.Client [INFO] closing Netty Client Netty-Client-/192.168.202.6:6701
> 2016-02-18 08:46:59.173 o.a.s.m.n.Client [INFO] waiting up to 600000 ms to send 0 pending messages to Netty-Client-/192.168.202.6:6701
> 2016-02-18 08:46:59.271 STDIO [ERROR] Feb 18, 2016 8:46:59 AM org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer
> WARNING: An exception was thrown by TimerTask.
> java.lang.RuntimeException: Giving up to scheduleConnect to Netty-Client-/192.168.202.6:6701 after 44 failed attempts. 3 messages were lost
> at org.apache.storm.messaging.netty.Client$Connect.run(Client.java:573)
> at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:546)
> at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:446)
> at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:395)
> at org.apache.storm.shade.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> The Netty client then never recovers, and the following messages repeat forever:
> {code}
> 2016-02-18 09:42:56.251 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:25.248 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:55.248 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:55.752 o.a.s.m.n.Client [ERROR] discarding 2 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:56.252 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:44:25.249 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> {code}
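
The swallowing behaviour of the quoted expire() method can be reproduced without Netty. The sketch below is a stand-in under stated assumptions: SwallowedThrowableSketch and its expire(Runnable) helper are hypothetical names that only mirror the try/catch shape of the quoted code, and a plain list replaces the real logger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mirrors the try/catch shape of the quoted expire().
// It does not use Netty's HashedWheelTimer.
class SwallowedThrowableSketch {
    static final List<String> warnings = new ArrayList<>();

    // Runs the task; any Throwable is recorded as a warning and then dropped.
    static void expire(Runnable task) {
        try {
            task.run();
        } catch (Throwable t) {
            warnings.add("An exception was thrown by TimerTask. " + t.getMessage());
        }
    }

    public static void main(String[] args) {
        boolean propagated = false;
        try {
            expire(() -> {
                throw new RuntimeException("Giving up to scheduleConnect");
            });
        } catch (RuntimeException e) {
            propagated = true; // never reached: expire() catches everything
        }
        System.out.println("propagated to caller: " + propagated);
        System.out.println("warnings logged: " + warnings);
    }
}
```

Because the caller never sees the exception, nothing outside the timer gets a chance to react to the failed task, which is consistent with the client staying in the "being closed" state shown in the log above.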