Tim Bain created AMQ-5543:
-----------------------------
Summary: Broker logs show repeated
"java.lang.IllegalStateException: Timer already cancelled." lines
Key: AMQ-5543
URL: https://issues.apache.org/jira/browse/AMQ-5543
Project: ActiveMQ
Issue Type: Bug
Components: Transport
Affects Versions: 5.10.0
Reporter: Tim Bain
One of our brokers running 5.10.0 spewed over 1 million of the following
exceptions to the log over a 2-hour period:
Transport Connection to: tcp://127.0.0.1:xxxxx failed: java.io.IOException:
Unexpected error occurred: java.lang.IllegalStateException: Timer already
cancelled.
Clients were observed to hang on startup (which we believe means they were
unable to connect to the broker) until it was rebooted, after which we haven't
seen the exception again.
Once the exceptions started, there were no stack traces or other log lines that
would indicate anything else about the cause, just those messages repeating.
The problems started immediately (a few milliseconds) after an EOFException in
the broker logs; we see those EOFExceptions pretty often and they've never
before resulted in "Timer already cancelled" exceptions, so that might indicate
what got us into a bad state but then again it might be entirely unrelated.
I searched JIRA and the mailing list archives for similar issues, and although
there are a lot of incidences of "Timer already cancelled" exceptions, none of
them exactly match our situation.
*
http://activemq.2283324.n4.nabble.com/Producer-connections-keep-breaking-td4671152.html
describes repeated copies of the line in the logs and is the closest parallel
I've found, but it sounded like messages were still getting passed, albeit more
slowly than normal, whereas the developer on my team who hit this said he
didn't think any messages were getting sent. But it's the closest match of the
group.
* AMQ-5508 has a detailed investigation into the root cause of the problem that
Pero Atanasov saw, but his scenario occurred only on broker shutdown, whereas
our broker was not shutting down at the time, and it wasn't clear that the log
line repeated for him.
*
http://activemq.2283324.n4.nabble.com/Timer-already-cancelled-and-KahaDB-Recovering-checkpoint-thread-after-death-td4676684.html
has repeated messages like I'm seeing, but appears to be specific to KahaDB
which I'm not using (we use non-persistent messages only).
* AMQ-4805 has the same inner exception but not the full log line, and it shows
a full stack trace whereas I only see the one line without the stack trace.
*
http://activemq.2283324.n4.nabble.com/can-t-send-message-Timer-already-cancelled-td4680175.html
appears to see the exception on the producer when sending rather than on the
broker.
The only thing that we observed that might be related was that earlier in the
broker's uptime, the developer who was running it ran up against his maxProc
limit and wasn't able to create new native threads. That didn't result in any
issues in the broker logs at the time (we only hit these issues several hours
later), so I'm skeptical that it's related given that there are lots of other
things that cause "Timer already cancelled" exceptions as evidenced by the
partial list above, but it could be.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)