P. Taylor Goetz created STORM-1560:
--------------------------------------

             Summary: Topology stops processing after Netty catches/swallows Throwable
                 Key: STORM-1560
                 URL: https://issues.apache.org/jira/browse/STORM-1560
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
    Affects Versions: 1.0.0
            Reporter: P. Taylor Goetz
            Priority: Blocker


In some scenarios, Netty connection problems can leave a topology in an 
unrecoverable state. The likely culprit is the Netty {{HashedWheelTimer}} class, 
which contains the following code:

{code}
        public void expire() {
            if(this.compareAndSetState(0, 2)) {
                try {
                    this.task.run(this);
                } catch (Throwable var2) {
                    if(HashedWheelTimer.logger.isWarnEnabled()) {
                        HashedWheelTimer.logger.warn("An exception was thrown by " + TimerTask.class.getSimpleName() + '.', var2);
                    }
                }
            }
        }
{code}
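
To illustrate why the caller never learns about the failure, here is a minimal, self-contained sketch of the same catch-all pattern. It does not use Netty: {{Task}}, {{expire}}, {{warnings}}, and {{exceptionReachesCaller}} are hypothetical names made up for this example, and the exception message is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class SwallowedTimerException {

    /** Hypothetical stand-in for a timer task; this is NOT Netty's TimerTask. */
    interface Task {
        void run() throws Exception;
    }

    /** Messages the toy timer recorded instead of rethrowing. */
    static final List<String> warnings = new ArrayList<>();

    /** Same shape as expire() above: catch Throwable, log it, carry on. */
    static void expire(Task task) {
        try {
            task.run();
        } catch (Throwable t) {
            warnings.add("An exception was thrown by Task: " + t.getMessage());
        }
    }

    /** Returns true only if the task's exception reached the caller. */
    static boolean exceptionReachesCaller() {
        try {
            expire(() -> {
                throw new RuntimeException("giving up on reconnect");
            });
            return false; // expire() returned normally: the failure was swallowed
        } catch (RuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("propagated: " + exceptionReachesCaller());
        System.out.println("warnings logged: " + warnings.size());
    }
}
```

Running {{main}} reports that the exception did not propagate and that one warning was recorded. If a task relies on a thrown exception to signal a fatal condition, nothing outside the timer thread will see it under this pattern.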

The exception being swallowed can be seen below:

{code}
2016-02-18 08:46:59.116 o.a.s.m.n.Client [INFO] closing Netty Client Netty-Client-/192.168.202.6:6701
2016-02-18 08:46:59.173 o.a.s.m.n.Client [INFO] waiting up to 600000 ms to send 0 pending messages to Netty-Client-/192.168.202.6:6701
2016-02-18 08:46:59.271 STDIO [ERROR] Feb 18, 2016 8:46:59 AM org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer
WARNING: An exception was thrown by TimerTask.
java.lang.RuntimeException: Giving up to scheduleConnect to Netty-Client-/192.168.202.6:6701 after 44 failed attempts. 3 messages were lost
        at org.apache.storm.messaging.netty.Client$Connect.run(Client.java:573)
        at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:546)
        at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:446)
        at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:395)
        at org.apache.storm.shade.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at java.lang.Thread.run(Thread.java:745)
{code}

The Netty client then never recovers, and the following messages repeat indefinitely:

{code}
2016-02-18 09:42:56.251 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
2016-02-18 09:43:25.248 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
2016-02-18 09:43:55.248 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
2016-02-18 09:43:55.752 o.a.s.m.n.Client [ERROR] discarding 2 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
2016-02-18 09:43:56.252 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
2016-02-18 09:44:25.249 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
