[ 
https://issues.apache.org/jira/browse/AMQ-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pero Atanasov updated AMQ-5508:
-------------------------------
    Priority: Major  (was: Minor)

> Broker shutdown race condition leads to "IllegalStateException: Timer already 
> cancelled"
> ----------------------------------------------------------------------------------------
>
>                 Key: AMQ-5508
>                 URL: https://issues.apache.org/jira/browse/AMQ-5508
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 5.10.0
>            Reporter: Pero Atanasov
>         Attachments: AMQ5508.patch
>
>
> This issue is seen on the surface with the broker failing to remove a 
> consumer:
> 2014-12-19 10:45:42,586 | WARN  | Failed to remove consumer:
> ID:hack-34927-1419007505006-2:3:0:0 |
> org.apache.activemq.broker.TransportConnection | ActiveMQ
> BrokerService[broker0] Task-6
> java.lang.IllegalStateException: Timer already cancelled.
> The full stack trace [STACK_TRACE] will be specified at the end of this 
> report.
> In this case, the race condition is exposed via some of the AdvisoryBroker 
> code while trying to send an advisory message as part of the 
> consumer/producer addition/removal code. Below are excerpts of code to 
> explain this race condition:
> activemq-broker/src/main/java/org/apache/activemq/advisory/AdvisoryBroker.java
> Lines 605 - 637 [in fireAdvisory(...)]
> if (getBrokerService().isStarted()) {
>      // code to create and populate an advisory message
>      next.send(producerExchange, advisoryMessage);
>      // send will trigger destination creation/addition for this advisory 
> message topic
> }
> However, it can easily happen that a client (consumer or producer) connects 
> and is added _before_ getBrokerService().isStarted() is true, so in that 
> case, the boolean expression in the block above will evaluate to false and 
> hence skip the entire advisory message send and create/add destination 
> process. At this point, the consumer/producer destination does not have its 
> advisory message destination created. If no other consumer/producer connects 
> on the same destination until the broker is shutting down, then no advisory 
> destination pairing will be created for that consumer/producer. When the 
> broker starts the shutdown process, the event that will trigger an advisory 
> message will be the removal of a consumer/producer as part of the 
> TransportConnection stopping process. The sequence of calls with line numbers 
> can be seen within the STACK_TRACE.
> Back to the above specified code block, when the broker is shutting down, the 
> broker is still in the started state. Then, the boolean expression will 
> evaluate to true and hence proceed with the advisory message creation/sending 
> and destination creation/addition process. However, in the meantime, as part 
> of the shutdown process, the Scheduler timer is being cancelled. Consider the 
> following blocks of code:
> activemq-broker/src/main/java/org/apache/activemq/broker/BrokerService.java
> Lines 756 - 761 [in stop()]
> // for each service
>      stopper.stop(service);
> activemq-client/src/main/java/org/apache/activemq/util/ServiceSupport.java
> Lines 67 - 84 [in stop()]
> if (stopped.compareAndSet(false, true)) {
>     ... some code ...
>     doStop(stopper);
>     ... some code ...    
> }
> activemq-client/src/main/java/org/apache/activemq/thread/Scheduler.java
> Lines 69 - 71 [in doStop(...)]
> if (this.timer != null) {
>      this.timer.cancel();
> }
> At this point, the timer is cancelled. Now, back to the thread removing the 
> consumer. From the STACK_TRACE, it can be seen that after many calls related 
> to the advisory message destination creation, it ends up in AbstractRegion’s 
> addDestination method.
> activemq-broker/src/main/java/org/apache/activemq/broker/region/AbstractRegion.java
> Lines 132 - 145 [in addDestination(...)]
> if (dest == null) {
>      ... some code ...
>      dest = createDestination(context, destination);
>      ... some code ...
>      dest.start();  
>      ... some code ...
> }
> As it can be seen in the STACK_TRACE below, dest.start() call will trigger 
> the need for a timer to be scheduled on the same timer that has already been 
> cancelled. Some debug prints to confirm the race condition:
> 1. Fail to send advisory message for this consumer advisory topic because 
> getBrokerService().isStarted() is false:
> 2015-01-06 08:49:26,215 | INFO  | patanasov: from AdvisoryBroker fireAdvisory 
> would have sent advisoryMessage: topic 
> ActiveMQ.Advisory.Consumer.Queue.2015.01.06-08.23.00.16401 | 
> org.apache.activemq.advisory.AdvisoryBroker
> 2. Broker shutting down and cancelling timer:
> 2015-01-06 08:49:43,602 | INFO  | Apache ActiveMQ 5.10.0 (broker0, 
> ID:host-40197-1420555763149-0:1) is shutting down | 
> org.apache.activemq.broker.BrokerService | ActiveMQ ShutdownHook
> 2015-01-06 08:49:43,603 | INFO  | patanasov: from ServiceSupport calling 
> doStop(stopper) | org.apache.activemq.util.ServiceSupport | ActiveMQ 
> ShutdownHook
> 2015-01-06 08:49:43,603 | INFO  | patanasov: from Scheduler 
> doStop(ServiceStopper stopper) calling this.timer.cancel(): timer 
> java.util.Timer@7df33bb0 | org.apache.activemq.thread.Scheduler | ActiveMQ 
> ShutdownHook
> 3. Adding destination and scheduling on cancelled timer (7df33bb0):
> 2015-01-06 08:49:43,767 | INFO  | patanasov: broker0 adding destination: 
> qualified: 
> topic://ActiveMQ.Advisory.Consumer.Queue.2015.01.06-08.23.00.16401, physical: 
> ActiveMQ.Advisory.Consumer.Queue.2015.01.06-08.23.00.16401 | 
> org.apache.activemq.broker.region.AbstractRegion | ActiveMQ 
> BrokerService[broker0] Task-12
> 2015-01-06 08:49:43,782 | INFO  | patanasov: from schedualPeriodically: 
> timer: java.util.Timer@7df33bb0 | org.apache.activemq.thread.Scheduler | 
> ActiveMQ BrokerService[broker0] Task-12
> 2015-01-06 08:49:43,785 | INFO  | patanasov: caught an exception in 
> schedualPeriodically: | org.apache.activemq.thread.Scheduler | ActiveMQ 
> BrokerService[broker0] Task-12
> java.lang.IllegalStateException: Timer already cancelled.
> I will provide a patch to fix this issue. However, the bigger concern here is 
> that advisory message destination creations fail for clients that connect 
> while getBrokerService().isStarted() is false. If there is only one 
> consumer/producer on one destination and if it connects while 
> getBrokerService().isStarted() is false, then it will fail to create/send 
> advisory message for that destination, at least until the producer/consumer 
> is removed at the end. I will continue looking to learn more about this state 
> of the broker and open a separate JIRA issue if needed.
> STACK_TRACE:
> java.lang.IllegalStateException: Timer already cancelled.
>     at java.util.Timer.sched(Timer.java:354)
>     at java.util.Timer.schedule(Timer.java:222)
>     at
> org.apache.activemq.thread.Scheduler.schedualPeriodically(Scheduler.java:53)
>     at org.apache.activemq.broker.region.Topic.start(Topic.java:563)
>     at
> org.apache.activemq.broker.region.AbstractRegion.addDestination(AbstractRegion.java:141)
>     at
> org.apache.activemq.broker.region.RegionBroker.addDestination(RegionBroker.java:325)
>     at
> org.apache.activemq.broker.BrokerFilter.addDestination(BrokerFilter.java:167)
>     at
> org.apache.activemq.advisory.AdvisoryBroker.addDestination(AdvisoryBroker.java:184)
>     at
> org.apache.activemq.broker.BrokerFilter.addDestination(BrokerFilter.java:167)
>     at
> org.apache.activemq.broker.BrokerFilter.addDestination(BrokerFilter.java:167)
>     at
> org.apache.activemq.broker.MutableBrokerFilter.addDestination(MutableBrokerFilter.java:172)
>     at
> org.apache.activemq.broker.region.RegionBroker.send(RegionBroker.java:439)
>     at
> org.apache.activemq.broker.jmx.ManagedRegionBroker.send(ManagedRegionBroker.java:297)
>     at
> org.apache.activemq.advisory.AdvisoryBroker.fireAdvisory(AdvisoryBroker.java:615)
>     at
> org.apache.activemq.advisory.AdvisoryBroker.fireConsumerAdvisory(AdvisoryBroker.java:564)
>     at
> org.apache.activemq.advisory.AdvisoryBroker.fireConsumerAdvisory(AdvisoryBroker.java:550)
>     at
> org.apache.activemq.advisory.AdvisoryBroker.removeConsumer(AdvisoryBroker.java:271)
>     at
> org.apache.activemq.broker.BrokerFilter.removeConsumer(BrokerFilter.java:132)
>     at
> org.apache.activemq.broker.BrokerFilter.removeConsumer(BrokerFilter.java:132)
>     at
> org.apache.activemq.broker.MutableBrokerFilter.removeConsumer(MutableBrokerFilter.java:137)
>     at
> org.apache.activemq.broker.TransportConnection.processRemoveConsumer(TransportConnection.java:651)
>     at
> org.apache.activemq.broker.TransportConnection.processRemoveSession(TransportConnection.java:690)
>     at
> org.apache.activemq.broker.TransportConnection.processRemoveConnection(TransportConnection.java:802)
>     at
> org.apache.activemq.broker.TransportConnection.doStop(TransportConnection.java:1143)
>     at
> org.apache.activemq.broker.TransportConnection$4.run(TransportConnection.java:1073)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>     at java.lang.Thread.run(Thread.java:619) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to