[jira] [Commented] (ARTEMIS-2147) Fail over and Fail back race condition with dynamic queues

Derek Wilhelm (JIRA) Wed, 24 Oct 2018 13:24:54 -0700


    [ https://issues.apache.org/jira/browse/ARTEMIS-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662799#comment-16662799 ]


Derek Wilhelm commented on ARTEMIS-2147:
----------------------------------------

Here is the stack trace (from a build of the master branch) captured when the error occurs:

```
2018-10-24 13:00:27,490 ERROR [org.apache.activemq.artemis.core.server] AMQ224016: Caught exception: ActiveMQNonExistentQueueException[errorType=QUEUE_DOES_NOT_EXIST message=AMQ119017: Queue test.queue does not exist]
	at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.createConsumer(ServerSessionImpl.java:453) [artemis-server-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
	at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.createConsumer(ServerSessionImpl.java:438) [artemis-server-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
	at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.slowPacketHandler(ServerSessionPacketHandler.java:326) [artemis-server-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
	at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.onMessagePacket(ServerSessionPacketHandler.java:290) [artemis-server-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
	at org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33) [artemis-commons-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
	at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
	at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
	at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) [artemis-commons-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_171]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_171]
	at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.7.0-SNAPSHOT.jar:2.7.0-SNAPSHOT]
```
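
The trace shows the failure surfacing in `ServerSessionImpl.createConsumer`: the queue lookup on the server that has just become live finds nothing, so consumer creation is rejected. The stdlib-only Java sketch below is a toy model of the check-then-act ordering the reporter suspects. `FailoverOrderingModel` and `ToyBroker` are hypothetical names, not Artemis classes, and the model makes no claim about the actual root cause; it only illustrates why the outcome depends on whether the queue is visible before the consumer is re-attached.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (hypothetical, NOT Artemis code) of a check-then-act ordering:
// re-attaching a consumer fails if the queue is not yet visible on the
// server that has just become live.
public class FailoverOrderingModel {

    /** Minimal stand-in for a server's queue registry. */
    static final class ToyBroker {
        private final Set<String> queues = new HashSet<>();

        /** The queue becomes visible (how it gets there is deliberately not modelled). */
        void registerQueue(String name) {
            queues.add(name);
        }

        /** Look the queue up first and reject the consumer if it is missing. */
        void createConsumer(String queueName) {
            if (!queues.contains(queueName)) {
                throw new IllegalStateException("Queue " + queueName + " does not exist");
            }
        }
    }

    /**
     * Replays one failover with a fixed ordering of the two events and
     * reports whether the consumer could be re-attached.
     */
    public static boolean reattachSucceeds(boolean queueVisibleFirst) {
        ToyBroker newLive = new ToyBroker();
        try {
            if (queueVisibleFirst) {
                newLive.registerQueue("test.queue");
                newLive.createConsumer("test.queue");
            } else {
                // The consumer re-attach wins the race and fails the lookup.
                newLive.createConsumer("test.queue");
                newLive.registerQueue("test.queue");
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("queue first:    " + reattachSucceeds(true));  // prints "queue first:    true"
        System.out.println("consumer first: " + reattachSucceeds(false)); // prints "consumer first: false"
    }
}
```

In a real broker the two events run on different threads, so which ordering happens is nondeterministic; that would be consistent with the reporter's observation that a breakpoint placed after the address lookup makes the failure show up only occasionally.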

> Fail over and Fail back race condition with dynamic queues
> ----------------------------------------------------------
>
>                 Key: ARTEMIS-2147
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2147
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.4.0, 2.5.0, 2.6.3
>            Reporter: Derek Wilhelm
>            Priority: Major
>
> There appears to be a race condition when using dynamically created queues
> with replication-based failover and failback and the Artemis core JMS client.
> When a failover and/or failback occurs, the server logs an exception:
> {{ERROR [org.apache.activemq.artemis.core.server] AMQ224016: Caught exception: ActiveMQNonExistentQueueException[errorType=QUEUE_DOES_NOT_EXIST message=AMQ119017: Queue test.queue does not exist]}}
> The client never sees an exception (after the initial connection failure) and
> appears to believe that the reconnection succeeded. However, the client no
> longer receives messages sent to the queue. When stepping through the
> consumer-creation code in a debugger during a failover, the problem does not
> occur unless the breakpoint is set after the address lookup, and even then it
> fails only occasionally. Hence the belief that this is a race condition.
> Steps to reproduce:
>  # Create a master server with replication and check-for-live-server=true.
>  # Create a backup server with replication, allow-failback=true and failback-delay=5000.
>  # Start the master server.
>  # Start the backup server.
>  # Create a consumer on a dynamically defined, named queue (e.g. test.queue) using the Artemis core JMS client.
>  # Create a producer on another connection for the same queue and start sending periodic messages.
>  # Stop the master server.
>  ** Failover to the backup takes place, and the client logs the connection failure.
>  ** The error may occur at this point: the backup server logs the aforementioned exception, and if it does, the consumer stops receiving new messages.
>  # Start the master server.
>  ** Failback to the master server takes place once it has started.
>  ** The client logs the connection failure once the master takes over.
>  ** The error may occur at this point: the master server logs the aforementioned exception, and if it does, the consumer stops receiving new messages.
>  # If the {{ActiveMQNonExistentQueueException}} does not occur, repeat steps 7 and 8.
> The exception most often occurs during failback to the master server, often
> within only one or two failback attempts. It has been seen on 2.4.0, 2.5.0,
> 2.6.3, and 2.7.0-SNAPSHOT.
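
Because the client reports a successful reconnection, the only visible symptom is that the consumer stops receiving the producer's periodic messages. When repeating steps 7 and 8 many times, a small watchdog can flag that condition automatically. The sketch below assumes a fixed producer period and uses a hypothetical `StallDetector` class (not part of Artemis or JMS). Timestamps are passed in explicitly so the logic can be checked without a real clock; in an actual consumer you would call `onMessage(Instant.now())` from the message listener and poll `isStalled(Instant.now())` from a timer.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not part of Artemis or JMS): flags a consumer as
// silently stalled when no message has arrived for several producer periods.
public class StallDetector {
    private final Duration producerPeriod;
    private final int missedPeriodsAllowed;
    private Instant lastReceived;

    public StallDetector(Duration producerPeriod, int missedPeriodsAllowed, Instant start) {
        this.producerPeriod = producerPeriod;
        this.missedPeriodsAllowed = missedPeriodsAllowed;
        this.lastReceived = start;
    }

    /** Record that a message arrived at the given instant. */
    public void onMessage(Instant now) {
        lastReceived = now;
    }

    /** True once the gap since the last message exceeds producerPeriod * missedPeriodsAllowed. */
    public boolean isStalled(Instant now) {
        Duration limit = producerPeriod.multipliedBy(missedPeriodsAllowed);
        return Duration.between(lastReceived, now).compareTo(limit) > 0;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2018-10-24T13:00:00Z");
        StallDetector d = new StallDetector(Duration.ofSeconds(1), 5, t0);
        d.onMessage(t0.plusSeconds(2));
        System.out.println(d.isStalled(t0.plusSeconds(6))); // prints "false" (4 s gap, limit 5 s)
        System.out.println(d.isStalled(t0.plusSeconds(8))); // prints "true" (6 s gap, limit 5 s)
    }
}
```

A normal failover or failback also causes a short gap in delivery, so the allowed number of missed periods should presumably be set comfortably above the expected switchover time to avoid false alarms.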



