Tim Bain created AMQ-5543:
-----------------------------

             Summary: Broker logs show repeated 
"java.lang.IllegalSt​ateException: Timer already cancelled." lines
                 Key: AMQ-5543
                 URL: https://issues.apache.org/jira/browse/AMQ-5543
             Project: ActiveMQ
          Issue Type: Bug
          Components: Transport
    Affects Versions: 5.10.0
            Reporter: Tim Bain


One of our brokers running 5.10.0 spewed over 1 million of the following 
exceptions to the log over a 2-hour period:

Transport Connection to: tcp://127.0.0.1:xxxxx failed: java.io.IOException: 
Unexpected error occurred: java.lang.IllegalStateException: Timer already 
cancelled.

Clients were observed to hang on startup (which we believe means they were 
unable to connect to the broker) until it was rebooted, after which we haven't 
seen the exception again.

Once the exceptions started, there were no stack traces or other log lines that 
would indicate anything else about the cause, just those messages repeating.  
The problems started immediately (a few milliseconds) after an EOFException in 
the broker logs; we see those EOFExceptions pretty often and they've never 
before resulted in "Timer already cancelled" exceptions, so that might indicate 
what got us into a bad state but then again it might be entirely unrelated.

I searched JIRA and the mailing list archives for similar issues, and although 
there are a lot of incidences of "Timer already cancelled" exceptions, none of 
them exactly match our situation.
* 
http://activemq.2283324.n4.nabble.com/Producer-connections-keep-breaking-td4671152.html
 describes repeated copies of the line in the logs and is the closest parallel 
I've found, but it sounded like messages were still getting passed, albeit more 
slowly than normal, whereas the developer on my team who hit this said he 
didn't think any messages were getting sent.  But it's the closest match of the 
group.
* AMQ-5508 has a detailed investigation into the root cause of the problem that 
Pero Atanasov saw, but his scenario occurred only on broker shutdown, whereas 
our broker was not shutting down at the time, and it wasn't clear that the log 
line repeated for him.
* 
http://activemq.2283324.n4.nabble.com/Timer-already-cancelled-and-KahaDB-Recovering-checkpoint-thread-after-death-td4676684.html
 has repeated messages like I'm seeing, but appears to be specific to KahaDB 
which I'm not using (we use non-persistent messages only).
* AMQ-4805 has the same inner exception but not the full log line, and it shows 
a full stack trace whereas I only see the one line without the stack trace.
* 
http://activemq.2283324.n4.nabble.com/can-t-send-message-Timer-already-cancelled-td4680175.html
 appears to see the exception on the producer when sending rather than on the 
broker.

The only thing that we observed that might be related was that earlier in the 
broker's uptime, the developer who was running it ran up against his maxProc 
limit and wasn't able to create new native threads.  That didn't result in any 
issues in the broker logs at the time (we only hit these issues several hours 
later), so I'm skeptical that it's related given that there are lots of other 
things that cause "Timer already cancelled" exceptions as evidenced by the 
partial list above, but it could be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to