[ https://issues.apache.org/jira/browse/STORM-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170084#comment-15170084 ]
Jungtaek Lim edited comment on STORM-1560 at 2/26/16 11:24 PM:
---------------------------------------------------------------
Strange... Let's focus on the fact that the Client seems to be closed, but the
worker still calls send() on that Client.
These are the paths I know of that close the Client:
1. mk-refresh-connection
mk-refresh-connection first replaces cached-task->node+port and only then closes
remove-connections, so send() shouldn't be called on that Client.
2. Context.term()
It means the worker is in the process of shutting down, so send() should
eventually stop being called.
3. Connect.run()
Connect.run() only calls Client.close() when closing is true, which means that
either 1 or 2 must have happened before this one.
Please comment here with any other paths I'm missing.
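The ordering argument in path 1 (publish the new task-to-client mapping first, then close the clients it no longer references) can be sketched as a small Java model. This is a minimal sketch under assumptions, not Storm's worker code: {{RefreshSketch}}, {{FakeClient}}, {{refresh()}} and {{clientFor()}} are hypothetical names, and a single refresh thread is assumed.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

public class RefreshSketch {
    // Hypothetical stand-in for a Netty client: it only records whether close() ran.
    static final class FakeClient {
        final String endpoint;
        volatile boolean closed = false;

        FakeClient(String endpoint) { this.endpoint = endpoint; }

        void close() { closed = true; }
    }

    // task id -> client, read by the send path and always replaced as a whole map.
    private final AtomicReference<Map<Integer, FakeClient>> taskToClient =
            new AtomicReference<>(new HashMap<Integer, FakeClient>());
    // endpoint -> client, touched only by the single (assumed) refresh thread.
    private final Map<String, FakeClient> clients = new HashMap<>();

    void refresh(Map<Integer, String> assignment) {
        Set<String> needed = new HashSet<>(assignment.values());
        // Open clients for endpoints that are new in this assignment.
        for (String endpoint : needed) {
            if (!clients.containsKey(endpoint)) {
                clients.put(endpoint, new FakeClient(endpoint));
            }
        }
        // Step 1: publish the new task -> client mapping ...
        Map<Integer, FakeClient> next = new HashMap<>();
        for (Map.Entry<Integer, String> e : assignment.entrySet()) {
            next.put(e.getKey(), clients.get(e.getValue()));
        }
        taskToClient.set(next);
        // Step 2: ... and only then close clients the new mapping no longer uses.
        Iterator<Map.Entry<String, FakeClient>> it = clients.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, FakeClient> entry = it.next();
            if (!needed.contains(entry.getKey())) {
                entry.getValue().close();
                it.remove();
            }
        }
    }

    // Send-path lookup: after refresh() returns, it only sees clients that are still open.
    FakeClient clientFor(int task) {
        return taskToClient.get().get(task);
    }

    public static void main(String[] args) {
        RefreshSketch s = new RefreshSketch();
        Map<Integer, String> first = new HashMap<>();
        first.put(1, "hostA:6701");
        first.put(2, "hostB:6701");
        s.refresh(first);
        FakeClient oldClient = s.clientFor(2);

        Map<Integer, String> second = new HashMap<>();
        second.put(1, "hostA:6701");
        second.put(2, "hostC:6701");
        s.refresh(second);

        System.out.println(oldClient.closed);        // true
        System.out.println(s.clientFor(2).endpoint); // hostC:6701
        System.out.println(s.clientFor(2).closed);   // false
    }
}
{code}

In this model the ordering only protects lookups made after the swap: a sender that read a client from the old map just before {{taskToClient.set(next)}} can still hold a reference to that client after it has been closed.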
> Topology stops processing after Netty catches/swallows Throwable
> ----------------------------------------------------------------
>
> Key: STORM-1560
> URL: https://issues.apache.org/jira/browse/STORM-1560
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 1.0.0
> Reporter: P. Taylor Goetz
> Priority: Blocker
>
> In some scenarios, Netty connection problems can leave a topology in an
> unrecoverable state. The likely culprit is the Netty {{HashedWheelTimer}}
> class, which contains the following code:
> {code}
> public void expire() {
>     if (this.compareAndSetState(0, 2)) {
>         try {
>             this.task.run(this);
>         } catch (Throwable var2) {
>             if (HashedWheelTimer.logger.isWarnEnabled()) {
>                 HashedWheelTimer.logger.warn("An exception was thrown by " + TimerTask.class.getSimpleName() + '.', var2);
>             }
>         }
>     }
> }
> {code}
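The effect of the quoted {{expire()}} can be reproduced with a simplified Java model. It is an illustrative sketch, not Netty's class: {{SwallowSketch}} and {{ModelTimeout}} are hypothetical, and a plain {{Runnable}} stands in for Netty's {{TimerTask}}. It shows that a {{RuntimeException}} thrown by the task is only recorded as a warning, never reaches the caller of {{expire()}}, and that the timeout does not run the task again.

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

public class SwallowSketch {
    static final int ST_INIT = 0;
    static final int ST_EXPIRED = 2;

    // Hypothetical, simplified model of the quoted HashedWheelTimeout.expire(); not Netty's class.
    static final class ModelTimeout {
        private final AtomicInteger state = new AtomicInteger(ST_INIT);
        private final Runnable task;
        final StringBuilder warnings = new StringBuilder();
        int runs = 0;

        ModelTimeout(Runnable task) { this.task = task; }

        void expire() {
            // Only the first expire() wins the CAS; later calls do nothing.
            if (state.compareAndSet(ST_INIT, ST_EXPIRED)) {
                runs++;
                try {
                    task.run();
                } catch (Throwable t) {
                    // Same shape as the quoted code: record a warning, never rethrow.
                    warnings.append("An exception was thrown by TimerTask: ")
                            .append(t.getMessage());
                }
            }
        }
    }

    public static void main(String[] args) {
        ModelTimeout timeout = new ModelTimeout(() -> {
            throw new RuntimeException("Giving up to scheduleConnect after 44 failed attempts");
        });
        timeout.expire(); // returns normally: the RuntimeException does not propagate
        timeout.expire(); // no-op: state is already EXPIRED, so the task is not run again
        System.out.println(timeout.runs); // 1
        System.out.println(timeout.warnings);
    }
}
{code}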
> The exception being swallowed can be seen below:
> {code}
> 2016-02-18 08:46:59.116 o.a.s.m.n.Client [INFO] closing Netty Client Netty-Client-/192.168.202.6:6701
> 2016-02-18 08:46:59.173 o.a.s.m.n.Client [INFO] waiting up to 600000 ms to send 0 pending messages to Netty-Client-/192.168.202.6:6701
> 2016-02-18 08:46:59.271 STDIO [ERROR] Feb 18, 2016 8:46:59 AM org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer
> WARNING: An exception was thrown by TimerTask.
> java.lang.RuntimeException: Giving up to scheduleConnect to Netty-Client-/192.168.202.6:6701 after 44 failed attempts. 3 messages were lost
> 	at org.apache.storm.messaging.netty.Client$Connect.run(Client.java:573)
> 	at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:546)
> 	at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:446)
> 	at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:395)
> 	at org.apache.storm.shade.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> 	at java.lang.Thread.run(Thread.java:745)
> {code}
> The Netty client then never recovers, and the following messages repeat forever:
> {code}
> 2016-02-18 09:42:56.251 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:25.248 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:55.248 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:55.752 o.a.s.m.n.Client [ERROR] discarding 2 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:56.252 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:44:25.249 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> {code}
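The repeating log lines are consistent with a client whose "being closed" state is never cleared while the worker keeps calling send() on that same instance. A minimal Java model of that situation, assuming a one-way {{closing}} flag; {{DiscardSketch}} and {{ModelClient}} are hypothetical and do not reproduce Storm's {{Client}}:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class DiscardSketch {
    // Hypothetical model of a client in the "being closed" state; not Storm's Client.
    static final class ModelClient {
        private final String name;
        // One-way flag: set by close(), never cleared anywhere in this model.
        private final AtomicBoolean closing = new AtomicBoolean(false);
        final List<String> errorLog = new ArrayList<>();
        int delivered = 0;

        ModelClient(String name) { this.name = name; }

        void close() { closing.set(true); }

        void send(List<String> batch) {
            if (closing.get()) {
                // The batch is dropped; only an error line is recorded.
                errorLog.add("discarding " + batch.size()
                        + " messages because the Netty client to " + name + " is being closed");
                return;
            }
            delivered += batch.size();
        }
    }

    public static void main(String[] args) {
        ModelClient client = new ModelClient("Netty-Client-/192.168.202.6:6701");
        client.send(Arrays.asList("m1")); // delivered while the client is open
        client.close();
        // The caller keeps sending to the same closed instance, so every batch is discarded.
        client.send(Arrays.asList("m2"));
        client.send(Arrays.asList("m3", "m4"));
        for (String line : client.errorLog) {
            System.out.println(line);
        }
        System.out.println("delivered=" + client.delivered); // delivered=1
    }
}
{code}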
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)