[
https://issues.apache.org/jira/browse/AMQ-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pero Atanasov updated AMQ-5508:
-------------------------------
Priority: Major (was: Minor)
> Broker shutdown race condition leads to "IllegalStateException: Timer already
> cancelled"
> ----------------------------------------------------------------------------------------
>
> Key: AMQ-5508
> URL: https://issues.apache.org/jira/browse/AMQ-5508
> Project: ActiveMQ
> Issue Type: Bug
> Components: Broker
> Affects Versions: 5.10.0
> Reporter: Pero Atanasov
> Attachments: AMQ5508.patch
>
>
> This issue is seen on the surface with the broker failing to remove a
> consumer:
> 2014-12-19 10:45:42,586 | WARN | Failed to remove consumer:
> ID:hack-34927-1419007505006-2:3:0:0 |
> org.apache.activemq.broker.TransportConnection | ActiveMQ
> BrokerService[broker0] Task-6
> java.lang.IllegalStateException: Timer already cancelled.
> The full stack trace [STACK_TRACE] will be specified at the end of this
> report.
> In this case, the race condition is exposed via some of the AdvisoryBroker
> code while trying to send an advisory message as part of the
> consumer/producer addition/removal code. Below are excerpts of code to
> explain this race condition:
> activemq-broker/src/main/java/org/apache/activemq/advisory/AdvisoryBroker.java
> Lines 605 - 637 [in fireAdvisory(...)]
> if (getBrokerService().isStarted()) {
> // code to create and populate an advisory message
> next.send(producerExchange, advisoryMessage);
> // send will trigger destination creation/addition for this advisory
> message topic
> }
> However, it can easily happen that a client (consumer or producer) connects
> and is added _before_ getBrokerService().isStarted() is true, so in that
> case, the boolean expression in the block above will evaluate to false and
> hence skip the entire advisory message send and create/add destination
> process. At this point, the consumer/producer destination does not have its
> advisory message destination created. If no other consumer/producer connects
> on the same destination until the broker is shutting down, then no advisory
> destination pairing will be created for that consumer/producer. When the
> broker starts the shutdown process, the event that will trigger an advisory
> message will be the removal of a consumer/producer as part of the
> TransportConnection stopping process. The sequence of calls with line numbers
> can be seen within the STACK_TRACE.
> Back to the above specified code block, when the broker is shutting down, the
> broker is still in the started state. Then, the boolean expression will
> evaluate to true and hence proceed with the advisory message creation/sending
> and destination creation/addition process. However, in the meantime, as part
> of the shutdown process, the Scheduler timer is being cancelled. Consider the
> following blocks of code:
> activemq-broker/src/main/java/org/apache/activemq/broker/BrokerService.java
> Lines 756 - 761 [in stop()]
> // for each service
> stopper.stop(service);
> activemq-client/src/main/java/org/apache/activemq/util/ServiceSupport.java
> Lines 67 - 84 [in stop()]
> if (stopped.compareAndSet(false, true)) {
> ... some code ...
> doStop(stopper);
> ... some code ...
> }
> activemq-client/src/main/java/org/apache/activemq/thread/Scheduler.java
> Lines 69 - 71 [in doStop(...)]
> if (this.timer != null) {
> this.timer.cancel();
> }
> At this point, the timer is cancelled. Now, back to the thread removing the
> consumer. From the STACK_TRACE, it can be seen that after many calls related
> to the advisory message destination creation, it ends up in AbstractRegion’s
> addDestination method.
> activemq-broker/src/main/java/org/apache/activemq/broker/region/AbstractRegion.java
> Lines 132 - 145 [in addDestination(...)]
> if (dest == null) {
> ... some code ...
> dest = createDestination(context, destination);
> ... some code ...
> dest.start();
> ... some code ...
> }
> As it can be seen in the STACK_TRACE below, dest.start() call will trigger
> the need for a timer to be scheduled on the same timer that has already been
> cancelled. Some debug prints to confirm the race condition:
> 1. Fail to send advisory message for this consumer advisory topic because
> getBrokerService().isStarted() is false:
> 2015-01-06 08:49:26,215 | INFO | patanasov: from AdvisoryBroker fireAdvisory
> would have sent advisoryMessage: topic
> ActiveMQ.Advisory.Consumer.Queue.2015.01.06-08.23.00.16401 |
> org.apache.activemq.advisory.AdvisoryBroker
> 2. Broker shutting down and cancelling timer:
> 2015-01-06 08:49:43,602 | INFO | Apache ActiveMQ 5.10.0 (broker0,
> ID:host-40197-1420555763149-0:1) is shutting down |
> org.apache.activemq.broker.BrokerService | ActiveMQ ShutdownHook
> 2015-01-06 08:49:43,603 | INFO | patanasov: from ServiceSupport calling
> doStop(stopper) | org.apache.activemq.util.ServiceSupport | ActiveMQ
> ShutdownHook
> 2015-01-06 08:49:43,603 | INFO | patanasov: from Scheduler
> doStop(ServiceStopper stopper) calling this.timer.cancel(): timer
> java.util.Timer@7df33bb0 | org.apache.activemq.thread.Scheduler | ActiveMQ
> ShutdownHook
> 3. Adding destination and scheduling on cancelled timer (7df33bb0):
> 2015-01-06 08:49:43,767 | INFO | patanasov: broker0 adding destination:
> qualified:
> topic://ActiveMQ.Advisory.Consumer.Queue.2015.01.06-08.23.00.16401, physical:
> ActiveMQ.Advisory.Consumer.Queue.2015.01.06-08.23.00.16401 |
> org.apache.activemq.broker.region.AbstractRegion | ActiveMQ
> BrokerService[broker0] Task-12
> 2015-01-06 08:49:43,782 | INFO | patanasov: from schedualPeriodically:
> timer: java.util.Timer@7df33bb0 | org.apache.activemq.thread.Scheduler |
> ActiveMQ BrokerService[broker0] Task-12
> 2015-01-06 08:49:43,785 | INFO | patanasov: caught an exception in
> schedualPeriodically: | org.apache.activemq.thread.Scheduler | ActiveMQ
> BrokerService[broker0] Task-12
> java.lang.IllegalStateException: Timer already cancelled.
> I will provide a patch to fix this issue. However, the bigger concern here is
> that advisory message destination creations fail for clients that connect
> while getBrokerService().isStarted() is false. If there is only one
> consumer/producer on one destination and if it connects while
> getBrokerService().isStarted() is false, then it will fail to create/send
> advisory message for that destination, at least until the producer/consumer
> is removed at the end. I will continue looking to learn more about this state
> of the broker and open a separate JIRA issue if needed.
> STACK_TRACE:
> java.lang.IllegalStateException: Timer already cancelled.
> at java.util.Timer.sched(Timer.java:354)
> at java.util.Timer.schedule(Timer.java:222)
> at
> org.apache.activemq.thread.Scheduler.schedualPeriodically(Scheduler.java:53)
> at org.apache.activemq.broker.region.Topic.start(Topic.java:563)
> at
> org.apache.activemq.broker.region.AbstractRegion.addDestination(AbstractRegion.java:141)
> at
> org.apache.activemq.broker.region.RegionBroker.addDestination(RegionBroker.java:325)
> at
> org.apache.activemq.broker.BrokerFilter.addDestination(BrokerFilter.java:167)
> at
> org.apache.activemq.advisory.AdvisoryBroker.addDestination(AdvisoryBroker.java:184)
> at
> org.apache.activemq.broker.BrokerFilter.addDestination(BrokerFilter.java:167)
> at
> org.apache.activemq.broker.BrokerFilter.addDestination(BrokerFilter.java:167)
> at
> org.apache.activemq.broker.MutableBrokerFilter.addDestination(MutableBrokerFilter.java:172)
> at
> org.apache.activemq.broker.region.RegionBroker.send(RegionBroker.java:439)
> at
> org.apache.activemq.broker.jmx.ManagedRegionBroker.send(ManagedRegionBroker.java:297)
> at
> org.apache.activemq.advisory.AdvisoryBroker.fireAdvisory(AdvisoryBroker.java:615)
> at
> org.apache.activemq.advisory.AdvisoryBroker.fireConsumerAdvisory(AdvisoryBroker.java:564)
> at
> org.apache.activemq.advisory.AdvisoryBroker.fireConsumerAdvisory(AdvisoryBroker.java:550)
> at
> org.apache.activemq.advisory.AdvisoryBroker.removeConsumer(AdvisoryBroker.java:271)
> at
> org.apache.activemq.broker.BrokerFilter.removeConsumer(BrokerFilter.java:132)
> at
> org.apache.activemq.broker.BrokerFilter.removeConsumer(BrokerFilter.java:132)
> at
> org.apache.activemq.broker.MutableBrokerFilter.removeConsumer(MutableBrokerFilter.java:137)
> at
> org.apache.activemq.broker.TransportConnection.processRemoveConsumer(TransportConnection.java:651)
> at
> org.apache.activemq.broker.TransportConnection.processRemoveSession(TransportConnection.java:690)
> at
> org.apache.activemq.broker.TransportConnection.processRemoveConnection(TransportConnection.java:802)
> at
> org.apache.activemq.broker.TransportConnection.doStop(TransportConnection.java:1143)
> at
> org.apache.activemq.broker.TransportConnection$4.run(TransportConnection.java:1073)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
> at java.lang.Thread.run(Thread.java:619)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)