[ https://issues.apache.org/jira/browse/ARTEMIS-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Justin Bertram updated ARTEMIS-2147:
------------------------------------
Description:
There appears to be a race condition when using dynamically created queues with
replication-based failover and failback and the CORE JMS client. When a
failover and/or failback occurs, the server logs an exception:
{{ERROR [org.apache.activemq.artemis.core.server] AMQ224016: Caught exception: ActiveMQNonExistentQueueException[errorType=QUEUE_DOES_NOT_EXIST message=AMQ119017: Queue test.queue does not exist]}}
The client never sees an exception (after the initial connection failure) and
appears to believe that the reconnection succeeded. However, the client no
longer receives messages sent to the queue. When stepping through the
consumer-creation code in a debugger during a failover, the problem does not
occur unless the breakpoint is set after the address lookup, at which point it
occasionally fails. Hence the belief that this is a race condition.
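Because the client gets no exception after reconnecting, the only visible symptom is that messages stop arriving. A minimal sketch of a client-side check for that symptom, assuming the producer sends at a known fixed period: the hypothetical {{StallDetector}} class below (not part of Artemis or JMS) flags the consumer as stalled once no message has arrived for more than a chosen number of periods. Timestamps are passed in by the caller so the logic can be tested without a broker; in a real client, {{onMessage}} would be called from the consumer's JMS {{MessageListener}}.

```java
/**
 * Hypothetical client-side helper (not part of Artemis or JMS): reports a stall
 * when no message has arrived for more than toleranceFactor producer periods.
 */
final class StallDetector {
    private final long expectedPeriodMillis;
    private final long toleranceFactor;
    private long lastMessageMillis;

    StallDetector(long expectedPeriodMillis, long toleranceFactor, long startMillis) {
        if (expectedPeriodMillis <= 0 || toleranceFactor <= 0) {
            throw new IllegalArgumentException("period and tolerance must be positive");
        }
        this.expectedPeriodMillis = expectedPeriodMillis;
        this.toleranceFactor = toleranceFactor;
        this.lastMessageMillis = startMillis; // treat start-up as the last "message"
    }

    /** Record an arrival; synchronized because the listener and watcher run on different threads. */
    synchronized void onMessage(long nowMillis) {
        lastMessageMillis = nowMillis;
    }

    /** True once the silence is strictly longer than toleranceFactor periods. */
    synchronized boolean isStalled(long nowMillis) {
        return nowMillis - lastMessageMillis > toleranceFactor * expectedPeriodMillis;
    }

    public static void main(String[] args) {
        // Assumed producer period of 1 s; three missed periods count as a stall.
        StallDetector detector = new StallDetector(1_000, 3, 0);
        detector.onMessage(1_000);
        System.out.println(detector.isStalled(3_500)); // false: 2.5 s of silence
        System.out.println(detector.isStalled(4_500)); // true: 3.5 s of silence
    }
}
```

The tolerance factor is a judgment call: too small and a normal reconnection pause looks like a stall, too large and the test loop runs longer than necessary.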
Steps to reproduce:
# Create a master server with replication and {{check-for-live-server=true}}
# Create a backup server with replication, {{allow-failback=true}}, and {{failback-delay=5000}}
# Start the master server
# Start the backup server
# Create a consumer on a dynamically defined, named queue (e.g. {{test.queue}}) using the Artemis core JMS client
# Create a producer on the same queue from another connection and start sending periodic messages
# Stop the master server
** Failover to the backup takes place, and the client logs the connection failure
** The error may occur at this point, with the backup server logging the aforementioned exception. If it does, the consumer stops receiving new messages
# Start the master server
** Failback to the master server takes place once it has started
** The client logs the connection failure once the master takes over
** The error may occur at this point, with the master server logging the aforementioned exception. If it does, the consumer stops receiving new messages
# If the {{ActiveMQNonExistentQueueException}} does not occur, repeat steps 7 and 8.
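Steps 7 to 9 amount to a loop that alternates failover and failback until the consumer stalls. The sketch below automates that loop under stated assumptions: {{ServerControl}} is a hypothetical interface (not an Artemis API) wrapping however the master is actually stopped and started, the stall check is supplied by the caller, and {{waitForSettle}} is expected to block long enough for each transition to finish (for failback, longer than the 5000 ms {{failback-delay}}).

```java
import java.util.function.BooleanSupplier;

final class FailoverLoop {
    /** Hypothetical hook for stopping/starting the master broker; not an Artemis API. */
    interface ServerControl {
        void stop();
        void start();
    }

    /** Which transition had just happened when the stall was noticed. */
    enum Phase { FAILOVER, FAILBACK }

    static final class Result {
        final int cycle;
        final Phase phase;

        Result(int cycle, Phase phase) {
            this.cycle = cycle;
            this.phase = phase;
        }

        @Override
        public String toString() {
            return phase + " in cycle " + cycle;
        }
    }

    /**
     * Repeats steps 7 and 8 until consumerStalled reports true or maxCycles is
     * reached. Returns where the stall was first seen, or null if it never was.
     */
    static Result run(ServerControl master, BooleanSupplier consumerStalled,
                      Runnable waitForSettle, int maxCycles) {
        for (int cycle = 1; cycle <= maxCycles; cycle++) {
            master.stop();                 // step 7: backup becomes live
            waitForSettle.run();
            if (consumerStalled.getAsBoolean()) {
                return new Result(cycle, Phase.FAILOVER);
            }
            master.start();                // step 8: master takes over again
            waitForSettle.run();           // should outlast failback-delay
            if (consumerStalled.getAsBoolean()) {
                return new Result(cycle, Phase.FAILBACK);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Dry run with a fake master whose consumer "stalls" after the second start().
        int[] starts = {0};
        ServerControl fakeMaster = new ServerControl() {
            @Override public void stop() { }
            @Override public void start() { starts[0]++; }
        };
        Result result = run(fakeMaster, () -> starts[0] >= 2, () -> { }, 10);
        System.out.println(result); // FAILBACK in cycle 2
    }
}
```

Recording the phase as well as the cycle makes it easy to check whether a given run matches the observation that the failure shows up most often on failback.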
The exception occurs most often during failback to the master server, often
within only one or two failback attempts. It has been seen on 2.4.0, 2.5.0,
2.6.3, and 2.7.0-SNAPSHOT.
> Fail over and Fail back race condition with dynamic queues
> ----------------------------------------------------------
>
> Key: ARTEMIS-2147
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2147
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.4.0, 2.5.0, 2.6.3
> Reporter: Derek Wilhelm
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)