[jira] [Commented] (ARTEMIS-2147) Fail over and Fail back race condition with dynamic queues

Justin Bertram (JIRA) Wed, 24 Oct 2018 13:17:38 -0700


    [ https://issues.apache.org/jira/browse/ARTEMIS-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662795#comment-16662795 ]


Justin Bertram commented on ARTEMIS-2147:
-----------------------------------------

This looks similar to (but not exactly the same as) ARTEMIS-1818.

> Fail over and Fail back race condition with dynamic queues
> ----------------------------------------------------------
>
>                 Key: ARTEMIS-2147
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2147
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.4.0, 2.5.0, 2.6.3
>            Reporter: Derek Wilhelm
>            Priority: Major
>
> There appears to be a race condition when using dynamically created queues
> with replication-based failover and failback and the Core JMS client. When a
> failover and/or failback occurs, the server logs an exception:
> `ERROR [org.apache.activemq.artemis.core.server] AMQ224016: Caught exception: 
> ActiveMQNonExistentQueueException[errorType=QUEUE_DOES_NOT_EXIST 
> message=AMQ119017: Queue test.queue does not exist]`
> The client never sees an exception (after the initial connection failure) and
> appears to believe that the reconnection succeeded. However, the client no
> longer receives messages sent to the queue. When stepping through the
> consumer-creation code during a failover in a debugger, the problem does not
> occur unless the breakpoint is set after the address lookup, and even then it
> fails only occasionally. Hence the belief that this is a race condition.
>  
> Steps to reproduce:
> 1. Create a master server with replication and check-for-live-server=true.
> 2. Create a backup server with replication, allow-failback=true and
> failback-delay=5000.
> 3. Start the master server.
> 4. Start the backup server.
> 5. Create a consumer on a dynamically defined, named queue (e.g. test.queue)
> using the Artemis Core JMS client.
> 6. From another connection, create a producer on the same queue and start
> sending messages periodically.
> 7. Stop the master server.
>  - Failover to the backup takes place, and the client logs the connection
> failure.
>  - The error may occur at this point: the backup server logs the exception
> above, and the consumer stops receiving new messages.
> 8. Start the master server.
>  - Failback to the master server takes place once it has started.
>  - The client logs the connection failure once the master takes over.
>  - The error may occur at this point: the master server logs the exception
> above, and the consumer stops receiving new messages.
> 9. If the ActiveMQNonExistentQueueException does not occur, repeat steps 7
> and 8.
>  
> The exception occurs most often during failback to the master server,
> frequently within only one or two failback attempts. It has been seen on
> 2.4.0, 2.5.0, 2.6.3, and 2.7.0-SNAPSHOT.
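
The reasoning in the report (the address lookup succeeds, the queue lookup then fails, and the error is only logged on the server while the client believes it reconnected) can be illustrated with a small toy model. This is a hypothetical sketch, not Artemis code: `FailoverRaceSketch`, `reattachConsumer` and `simulateFailover` are invented names, and the "address first, then queue" re-creation order on the newly active server is an assumption chosen to reproduce the reported symptom, not the broker's confirmed behavior. Latches force the bad interleaving so the outcome is deterministic.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class FailoverRaceSketch {

    // Toy model of the newly active server's state (hypothetical, NOT Artemis code).
    static final Set<String> addresses = ConcurrentHashMap.newKeySet();
    static final Set<String> queues = ConcurrentHashMap.newKeySet();

    // Hypothetical consumer re-attach: the address lookup can succeed while the
    // queue lookup fails. The error is only logged "server-side" and the caller
    // gets no exception, matching the symptom described in the report.
    static boolean reattachConsumer(String name) {
        if (!addresses.contains(name)) {
            return false;
        }
        if (!queues.contains(name)) {
            System.err.println("server: Queue " + name + " does not exist (logged, not sent to client)");
            return false;
        }
        return true;
    }

    // Simulates one failover. If clientWinsRace is true, the latches force the
    // client to re-attach after the address exists but before the queue does.
    static boolean simulateFailover(boolean clientWinsRace) {
        final String name = "test.queue";
        addresses.clear();
        queues.clear();
        CountDownLatch addressReady = new CountDownLatch(1);
        CountDownLatch queueReady = new CountDownLatch(1);
        CountDownLatch clientDone = new CountDownLatch(1);
        AtomicBoolean attached = new AtomicBoolean(false);

        Thread server = new Thread(() -> {
            addresses.add(name);              // assumed step 1: address re-created
            addressReady.countDown();
            if (clientWinsRace) {
                awaitQuietly(clientDone);     // force the client in between
            }
            queues.add(name);                 // assumed step 2: queue re-created
            queueReady.countDown();
        });
        Thread client = new Thread(() -> {
            awaitQuietly(clientWinsRace ? addressReady : queueReady);
            attached.set(reattachConsumer(name));
            clientDone.countDown();
        });
        server.start();
        client.start();
        joinQuietly(server);
        joinQuietly(client);
        return attached.get();
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("queue re-created first: consumer attached = " + simulateFailover(false));
        System.out.println("client wins the race:   consumer attached = " + simulateFailover(true));
    }
}
```

Running `main` prints `consumer attached = true` for the first scenario and `false` for the second, plus the server-side log line on stderr. In a real broker nothing forces the interleaving, which would explain why the failure appears only intermittently and why a breakpoint placed after the address lookup makes it more likely.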



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
