[jira] [Updated] (ARTEMIS-2147) Fail over and Fail back race condition with dynamic queues

Justin Bertram (JIRA) Wed, 24 Oct 2018 13:20:30 -0700


     [ https://issues.apache.org/jira/browse/ARTEMIS-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Justin Bertram updated ARTEMIS-2147:
------------------------------------
    Description: 
There appears to be a race condition when using dynamically created queues with 
replication-based failover and failback and the Core JMS client.  When a 
failover and/or failback occurs, the server logs an exception:

{{ERROR [org.apache.activemq.artemis.core.server] AMQ224016: Caught exception: 
ActiveMQNonExistentQueueException[errorType=QUEUE_DOES_NOT_EXIST 
message=AMQ119017: Queue test.queue does not exist]}}

The client never sees an exception (after the initial connection failure) and 
appears to believe that the reconnection succeeded.  However, the client no 
longer receives messages that are sent to the queue.  If you step through the 
consumer-creation code in a debugger during a failover, the problem does not 
occur unless the breakpoint is set after the address lookup, in which case it 
occasionally fails.  This sensitivity to timing is why a race condition is 
suspected.
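
Because the client sees no exception, the stall described above can only be 
noticed from the absence of messages.  The following is a minimal sketch of one 
way to detect it, assuming the producer sends at a known interval.  The class 
{{StallWatchdog}} and its methods are hypothetical names used for illustration; 
they are not part of Artemis or of the reporter's test setup.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical helper: records when the consumer last received a message
 * and reports a stall once the silence exceeds a configured limit.
 */
final class StallWatchdog {
    private final Duration maxSilence;
    private final AtomicReference<Instant> lastReceived;

    StallWatchdog(Duration maxSilence, Instant start) {
        this.maxSilence = maxSilence;
        this.lastReceived = new AtomicReference<>(start);
    }

    /** Call from the consumer's message callback with the receive time. */
    void messageReceived(Instant when) {
        lastReceived.set(when);
    }

    /** True if no message has arrived for longer than maxSilence as of 'now'. */
    boolean isStalled(Instant now) {
        Duration silence = Duration.between(lastReceived.get(), now);
        return silence.compareTo(maxSilence) > 0;
    }
}
```

In a reproduction run, {{messageReceived(Instant.now())}} could be called from 
the JMS {{MessageListener.onMessage}} callback and {{isStalled(Instant.now())}} 
polled from a timer.  The silence limit should be several times the producer's 
send interval, and longer than the time a normal failover takes, so that an 
ordinary reconnection is not reported as a stall.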

Steps to reproduce:
 # Create a master server with replication and check-for-live-server=true
 # Create a backup server with replication, allow-failback=true, and 
failback-delay=5000
 # Start the master server
 # Start the backup server
 # Create a consumer on a dynamically defined, named queue (e.g. test.queue) 
using the Artemis Core JMS client
 # Create a producer on a separate connection to the same queue and start 
sending periodic messages
 # Stop the master server
 ** Failover to the backup takes place, and the client logs the connection 
failure
 ** The error may occur at this point, with the backup server logging the 
aforementioned exception.  If it does, the consumer stops receiving new messages
 # Start the master server
 ** Failback to the master server takes place once it has started
 ** The client logs the connection failure once the master takes over
 ** The error may occur at this point, with the master server logging the 
aforementioned exception.  If it does, the consumer stops receiving new messages
 # If the {{ActiveMQNonExistentQueueException}} does not occur, repeat steps 7 
and 8.
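
Deciding whether to repeat steps 7 and 8 means checking the server log for the 
exception after each failover or failback.  The following is a minimal sketch 
of such a check, built only from the log text quoted in this report.  The class 
and method names are hypothetical, and a real server log line may carry extra 
fields (timestamps, thread names) that this pattern simply skips over.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the "queue does not exist" error in server log text. */
final class MissingQueueLogMatcher {
    // Based on the AMQ224016 / AMQ119017 message quoted in this report.
    // \s+ and DOTALL tolerate the message being wrapped across lines.
    private static final Pattern MISSING_QUEUE = Pattern.compile(
        "AMQ224016:.*?ActiveMQNonExistentQueueException.*?"
            + "AMQ119017:\\s+Queue\\s+(\\S+)\\s+does\\s+not\\s+exist",
        Pattern.DOTALL);

    /** Returns the name of the missing queue if the error is present in the text. */
    static Optional<String> missingQueue(String logText) {
        Matcher m = MISSING_QUEUE.matcher(logText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

If {{missingQueue}} returns an empty result for the log output written since the 
last failover or failback, the error did not occur and the stop/start cycle can 
be repeated.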

The exception occurs most often during failback to the master server, often 
within only one or two failback attempts.  It has been seen on 2.4.0, 2.5.0, 
2.6.3, and 2.7.0-SNAPSHOT.

  was:
There appears to be a race condition when using dynamically created queues with 
replication based fail over and fail back and using the CORE jms client.  When 
a fail over and/or fail back occurs the server will log an exception:

`ERROR [org.apache.activemq.artemis.core.server] AMQ224016: Caught exception: 
ActiveMQNonExistentQueueException[errorType=QUEUE_DOES_NOT_EXIST 
message=AMQ119017: Queue test.queue does not exist]`

The client never sees an exception (after the initial connection failure) and 
appears to believe that the re-connection was a success.  However, the client 
will no longer receive messages that are sent to the queue.  If you debug 
through the code upon a fail over at the part where the consumer is being 
created you will not see the problem occur unless you set the break point after 
the address lookup at which point it will occasionally fail.  Hence the belief 
that this is a race condition.

 

Steps to reproduce:

1. Create master server with replication, check-for-live-server=true

2. Create backup server with replication, allow-failback=true, 
failback-delay=5000

3. Start master server

4. Start backup server

5. Create a consumer on a dynamically defined, named queue (e.g. test.queue) 
using the artemis core jms client

6. Create a producer from another connection on the same queue and start 
sending periodic messages

7. Stop the master server

 - Failover to the backup will take place.  The client will log the connection 
failure

 - The error may occur at this point where the backup server will log the 
aforementioned exception - If the error does occur, the consumer will stop 
receiving new messages

8. Start the master server

 - Fail back to the master server will take place once it has started

 - The client will log the connection failure once the master takes over

 - The error may occur at this point where the master server will log the 
aforementioned exception - If the error does occur, the consumer will stop 
receiving new messages

9. If the ActiveMQNonExistentQueueException does not occur, repeat steps 7 and 
8.

 

The exception most often occurs during the fail back to the master server and 
often within only 1 or 2 fail back attempts.  This has been seen on 2.4.0, 
2.5.0, 2.6.3, and 2.7.0-SNAPSHOT


> Fail over and Fail back race condition with dynamic queues
> ----------------------------------------------------------
>
>                 Key: ARTEMIS-2147
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2147
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.4.0, 2.5.0, 2.6.3
>            Reporter: Derek Wilhelm
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
