[
https://issues.apache.org/jira/browse/AMQ-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335690#comment-14335690
]
Tim Bain commented on AMQ-5543:
-------------------------------
[~jchristner], thanks for confirming the sequence of events that led to the
issue.
I think the broker should be more resilient in the face of running out of
threads (which is usually a transient problem that will eventually clear itself
up), but I think the priority for making that happen is probably pretty low, so
I don't expect this to be implemented anytime soon.
> Broker logs show repeated "java.lang.IllegalStateException: Timer already
> cancelled." lines
> --------------------------------------------------------------------------------------------
>
> Key: AMQ-5543
> URL: https://issues.apache.org/jira/browse/AMQ-5543
> Project: ActiveMQ
> Issue Type: Bug
> Components: Transport
> Affects Versions: 5.10.0
> Reporter: Tim Bain
>
> One of our brokers running 5.10.0 spewed over 1 million of the following
> exceptions to the log over a 2-hour period:
> Transport Connection to: tcp://127.0.0.1:xxxxx failed: java.io.IOException:
> Unexpected error occurred: java.lang.IllegalStateException: Timer already
> cancelled.
> Clients were observed to hang on startup (which we believe means they were
> unable to connect to the broker) until it was rebooted, after which we
> haven't seen the exception again.
> Once the exceptions started, there were no stack traces or other log lines
> that would indicate anything else about the cause, just those messages
> repeating. The problems started immediately (a few milliseconds) after an
> EOFException in the broker logs; we see those EOFExceptions pretty often and
> they've never before resulted in "Timer already cancelled" exceptions, so
> that might indicate what got us into a bad state but then again it might be
> entirely unrelated.
> I searched JIRA and the mailing list archives for similar issues, and
> although there are a lot of incidences of "Timer already cancelled"
> exceptions, none of them exactly match our situation.
> *
> http://activemq.2283324.n4.nabble.com/Producer-connections-keep-breaking-td4671152.html
> describes repeated copies of the line in the logs and is the closest
> parallel I've found, but it sounded like messages were still getting passed,
> albeit more slowly than normal, whereas the developer on my team who hit this
> said he didn't think any messages were getting sent. But it's the closest
> match of the group.
> * AMQ-5508 has a detailed investigation into the root cause of the problem
> that Pero Atanasov saw, but his scenario occurred only on broker shutdown,
> whereas our broker was not shutting down at the time, and it wasn't clear
> that the log line repeated for him.
> *
> http://activemq.2283324.n4.nabble.com/Timer-already-cancelled-and-KahaDB-Recovering-checkpoint-thread-after-death-td4676684.html
> has repeated messages like I'm seeing, but appears to be specific to KahaDB
> which I'm not using (we use non-persistent messages only).
> * AMQ-4805 has the same inner exception but not the full log line, and it
> shows a full stack trace whereas I only see the one line without the stack
> trace.
> *
> http://activemq.2283324.n4.nabble.com/can-t-send-message-Timer-already-cancelled-td4680175.html
> appears to see the exception on the producer when sending rather than on the
> broker.
> The only thing that we observed that might be related was that earlier in the
> broker's uptime, the developer who was running it ran up against his maxProc
> limit and wasn't able to create new native threads. That didn't result in
> any issues in the broker logs at the time (we only hit these issues several
> hours later), so I'm skeptical that it's related given that there are lots of
> other things that cause "Timer already cancelled" exceptions as evidenced by
> the partial list above, but it could be.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)