Pero Atanasov created AMQ-5508:
----------------------------------
Summary: Broker shutdown race condition leads to
"IllegalStateException: Timer already cancelled"
Key: AMQ-5508
URL: https://issues.apache.org/jira/browse/AMQ-5508
Project: ActiveMQ
Issue Type: Bug
Components: Broker
Affects Versions: 5.10.0
Reporter: Pero Atanasov
Priority: Minor
This issue is seen on the surface with the broker failing to remove a consumer:
2014-12-19 10:45:42,586 | WARN | Failed to remove consumer:
ID:hack-34927-1419007505006-2:3:0:0 |
org.apache.activemq.broker.TransportConnection | ActiveMQ
BrokerService[broker0] Task-6
java.lang.IllegalStateException: Timer already cancelled.
The full stack trace [STACK_TRACE] will be specified at the end of this report.
In this case, the race condition is exposed via some of the AdvisoryBroker code
while trying to send an advisory message as part of the consumer/producer
addition/removal code. Below are excerpts of code to explain this race
condition:
activemq-broker/src/main/java/org/apache/activemq/advisory/AdvisoryBroker.java
Lines 605 - 637 [in fireAdvisory(...)]
if (getBrokerService().isStarted()) {
// code to create and populate an advisory message
next.send(producerExchange, advisoryMessage);
// send will trigger destination creation/addition for this advisory
message topic
}
However, it can easily happen that a client (consumer or producer) connects and
is added _before_ getBrokerService().isStarted() is true, so in that case, the
boolean expression in the block above will evaluate to false and hence skip the
entire advisory message send and create/add destination process. At this point,
the consumer/producer destination does not have its advisory message
destination created. If no other consumer/producer connects on the same
destination until the broker is shutting down, then no advisory destination
pairing will be created for that consumer/producer. When the broker starts the
shutdown process, the event that will trigger an advisory message will be the
removal of a consumer/producer as part of the TransportConnection stopping
process. The sequence of calls with line numbers can be seen within the
STACK_TRACE.
Back to the above specified code block, when the broker is shutting down, the
broker is well started. Then, the boolean expression will evaluate to true and
hence proceed with the advisory message creation/sending and destination
cration/addition process. However, in the meantime, as part of the shutdown
process, the Scheduler timer is being cancelled. Consider the following blocks
of code:
activemq-broker/src/main/java/org/apache/activemq/broker/BrokerService.java
Lines 756 - 761 [in stop()]
// for each service
stopper.stop(service);
activemq-client/src/main/java/org/apache/activemq/util/ServiceSupport.java
Lines 67 - 84 [in stop()]
if (stopped.compareAndSet(false, true)) {
... some code ...
doStop(stopper);
... some code ...
}
activemq-client/src/main/java/org/apache/activemq/thread/Scheduler.java
Lines 69 - 71 [in doStop(...)]
if (this.timer != null) {
this.timer.cancel();
}
At this point, the timer is cancelled. Now, back to the thread removing the
consumer. From the STACK_TRACE, it can be seen that after many calls related to
the advisory message destination creation, it ends up in AbstractRegion’s
addDestination method.
activemq-broker/src/main/java/org/apache/activemq/broker/region/AbstractRegion.java
Lines 132 - 145 [in addDestination(...)]
if (dest == null) {
... some code ...
dest = createDestination(context, destination);
... some code ...
dest.start();
... some code ...
}
As it can be seen in the STACK_TRACE below, dest.start() call will trigger the
need for a timer to be scheduled on the same timer that has already been
cancelled. Some debug prints to confirm the race condition:
1. Fail to send advisory message for this consumer advisory topic because
getBrokerService().isStarted() is false:
2015-01-06 08:49:26,215 | INFO | patanasov: from AdvisoryBroker fireAdvisory
would have sent advisoryMessage: topic
ActiveMQ.Advisory.Consumer.Queue.2015.01.06-08.23.00.16401 |
org.apache.activemq.advisory.AdvisoryBroker
2. Broker shutting down and cancelling timer:
2015-01-06 08:49:43,602 | INFO | Apache ActiveMQ 5.10.0 (broker0,
ID:host-40197-1420555763149-0:1) is shutting down |
org.apache.activemq.broker.BrokerService | ActiveMQ ShutdownHook
2015-01-06 08:49:43,603 | INFO | patanasov: from ServiceSupport calling
doStop(stopper) | org.apache.activemq.util.ServiceSupport | ActiveMQ
ShutdownHook
2015-01-06 08:49:43,603 | INFO | patanasov: from Scheduler
doStop(ServiceStopper stopper) calling this.timer.cancel(): timer
java.util.Timer@7df33bb0 | org.apache.activemq.thread.Scheduler | ActiveMQ
ShutdownHook
3. Adding destination and scheduling on cancelled timer (7df33bb0):
2015-01-06 08:49:43,767 | INFO | patanasov: broker0 adding destination:
qualified: topic://ActiveMQ.Advisory.Consumer.Queue.2015.01.06-08.23.00.16401,
physical: ActiveMQ.Advisory.Consumer.Queue.2015.01.06-08.23.00.16401 |
org.apache.activemq.broker.region.AbstractRegion | ActiveMQ
BrokerService[broker0] Task-12
2015-01-06 08:49:43,782 | INFO | patanasov: from schedualPeriodically: timer:
java.util.Timer@7df33bb0 | org.apache.activemq.thread.Scheduler | ActiveMQ
BrokerService[broker0] Task-12
2015-01-06 08:49:43,785 | INFO | patanasov: caught an exception in
schedualPeriodically: | org.apache.activemq.thread.Scheduler | ActiveMQ
BrokerService[broker0] Task-12
java.lang.IllegalStateException: Timer already cancelled.
I will provide a patch to fix this issue. However, the bigger concern here is
that advisory message destination creations fail for clients that connect while
getBrokerService().isStarted() is false. If there is only one consumer/producer
on one destination and if it connects while getBrokerService().isStarted() is
false, then it will fail to create/send advisory message for that destination,
at least until the producer/consumer is removed at the end. I will continue
looking to learn more about this state of the broker and open a separate JIRA
issue if needed.
STACK_TRACE:
java.lang.IllegalStateException: Timer already cancelled.
at java.util.Timer.sched(Timer.java:354)
at java.util.Timer.schedule(Timer.java:222)
at
org.apache.activemq.thread.Scheduler.schedualPeriodically(Scheduler.java:53)
at org.apache.activemq.broker.region.Topic.start(Topic.java:563)
at
org.apache.activemq.broker.region.AbstractRegion.addDestination(AbstractRegion.java:141)
at
org.apache.activemq.broker.region.RegionBroker.addDestination(RegionBroker.java:325)
at
org.apache.activemq.broker.BrokerFilter.addDestination(BrokerFilter.java:167)
at
org.apache.activemq.advisory.AdvisoryBroker.addDestination(AdvisoryBroker.java:184)
at
org.apache.activemq.broker.BrokerFilter.addDestination(BrokerFilter.java:167)
at
org.apache.activemq.broker.BrokerFilter.addDestination(BrokerFilter.java:167)
at
org.apache.activemq.broker.MutableBrokerFilter.addDestination(MutableBrokerFilter.java:172)
at
org.apache.activemq.broker.region.RegionBroker.send(RegionBroker.java:439)
at
org.apache.activemq.broker.jmx.ManagedRegionBroker.send(ManagedRegionBroker.java:297)
at
org.apache.activemq.advisory.AdvisoryBroker.fireAdvisory(AdvisoryBroker.java:615)
at
org.apache.activemq.advisory.AdvisoryBroker.fireConsumerAdvisory(AdvisoryBroker.java:564)
at
org.apache.activemq.advisory.AdvisoryBroker.fireConsumerAdvisory(AdvisoryBroker.java:550)
at
org.apache.activemq.advisory.AdvisoryBroker.removeConsumer(AdvisoryBroker.java:271)
at
org.apache.activemq.broker.BrokerFilter.removeConsumer(BrokerFilter.java:132)
at
org.apache.activemq.broker.BrokerFilter.removeConsumer(BrokerFilter.java:132)
at
org.apache.activemq.broker.MutableBrokerFilter.removeConsumer(MutableBrokerFilter.java:137)
at
org.apache.activemq.broker.TransportConnection.processRemoveConsumer(TransportConnection.java:651)
at
org.apache.activemq.broker.TransportConnection.processRemoveSession(TransportConnection.java:690)
at
org.apache.activemq.broker.TransportConnection.processRemoveConnection(TransportConnection.java:802)
at
org.apache.activemq.broker.TransportConnection.doStop(TransportConnection.java:1143)
at
org.apache.activemq.broker.TransportConnection$4.run(TransportConnection.java:1073)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)