Andrew May created AMQ-4720:
-------------------------------
Summary: Messages lost after fail-back of a network connector
using priorityBackup=true - reason is that remote broker isn't checking
producerID & is rejecting because of duplicate producerSequence
Key: AMQ-4720
URL: https://issues.apache.org/jira/browse/AMQ-4720
Project: ActiveMQ
Issue Type: Bug
Components: Broker
Affects Versions: 5.8.0
Environment: Only tested on Windows 7 Enterprise x64.
Reporter: Andrew May
Summary of problem:
-------------------
If a static failover network connector is set up to connect to 2 other brokers &
to fail back to a priority broker, messages can be lost after fail-back because
the remote broker discards them as duplicates based on their producer-sequence
numbers, even though the producer-id has changed.
My suspicion is that the remote broker doesn't recognise that the
re-established connection is a network connection & so doesn't check
producer-id.
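To make the expected behaviour concrete, the sketch below models duplicate
suppression that keeps the last stored sequence id separately for each
producer id. A new producer whose sequence restarts at 1 is then never
mistaken for a duplicate. It is a hypothetical model for illustration only:
`PerProducerAudit` and `isDuplicate` are invented names, not ActiveMQ's
`ProducerBrokerExchange` logic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not ActiveMQ code): remembers the highest sequence id
// stored for each producer id and flags a send as a duplicate only when that
// same producer re-sends a sequence id it has already used.
class PerProducerAudit {
    private final Map<String, Long> lastStored = new HashMap<>();

    /** Returns true if the send should be suppressed as a duplicate. */
    boolean isDuplicate(String producerId, long sequenceId) {
        Long last = lastStored.get(producerId);
        if (last != null && sequenceId <= last) {
            return true; // same producer, sequence id already stored
        }
        lastStored.put(producerId, sequenceId);
        return false;
    }

    public static void main(String[] args) {
        PerProducerAudit audit = new PerProducerAudit();
        int suppressed = 0;
        // Two runs of the sender: each is a new connection, so a new producer
        // id, and the sequence id starts again at 1.
        for (String producer : new String[] {"producer-A", "producer-B"}) {
            for (long seq = 1; seq <= 10; seq++) {
                if (audit.isDuplicate(producer, seq)) {
                    suppressed++;
                }
            }
        }
        System.out.println("suppressed = " + suppressed); // prints "suppressed = 0"
    }
}
```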
Test-harness setup:
-------------------
Using ActiveMQ 5.8.0 binary download.
Only changes are to logging settings & to the configuration file.
3 brokers ("amq1", "amq2", "amq3"), all brokers running on localhost.
Each uses its own config file (amq1.xml, amq2.xml, amq3.xml).
Broker amq1 has a failover duplex connection to amq2.
Broker amq3 has a duplex failover connection to both amq1 + amq2, it is
configured to always try to connect to amq1 first ("randomize=false") and to
fail-back to amq1 if it comes back online ("priorityBackup=true")
Consumer connects to broker amq2
Producer connects to broker amq3
Test-harness sender application creates a new session each time it is run &
sends a set of messages.
The sending session is not transacted & is set to auto-acknowledge.
Messages are sent with persistent delivery mode.
Messages are on queue "MyQueue"
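A rough model of how amq3's connector is expected to pick a broker under
`randomize=false` and `priorityBackup=true`: URIs are tried in listed order,
and a connection to a lower-priority broker is dropped in favour of the first
(priority) URI once it is reachable again. This is a simplified, hypothetical
sketch of the selection rule only (`PriorityFailoverModel` and `choose` are
invented names); the real failover transport also handles reconnect delays
and backup connections, which are ignored here.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of URI selection for a failover connector
// configured with randomize=false and priorityBackup=true. Not ActiveMQ code.
class PriorityFailoverModel {
    private final List<String> uris; // index 0 is the priority URI
    private String current;          // null while disconnected

    PriorityFailoverModel(List<String> uris) {
        this.uris = uris;
    }

    /** Re-evaluates the connection given which brokers are currently reachable. */
    String choose(Set<String> reachable) {
        String priority = uris.get(0);
        if (current != null && reachable.contains(current)) {
            // priorityBackup: leave a non-priority broker once the priority one is back
            if (!current.equals(priority) && reachable.contains(priority)) {
                current = priority;
            }
            return current;
        }
        // randomize=false: (re)connect to the first reachable URI in listed order
        current = null;
        for (String uri : uris) {
            if (reachable.contains(uri)) {
                current = uri;
                break;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        PriorityFailoverModel amq3 = new PriorityFailoverModel(List.of("amq1", "amq2"));
        System.out.println(amq3.choose(Set.of("amq1", "amq2"))); // amq1 (all brokers up)
        System.out.println(amq3.choose(Set.of("amq2")));         // amq2 (amq1 shut down)
        System.out.println(amq3.choose(Set.of("amq1", "amq2"))); // amq1 (fail-back)
    }
}
```

The three calls in `main` mirror the test script: all brokers up, amq1 shut
down, amq1 restarted.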
Test script:
------------
Start all 3 brokers.
Broker amq3 establishes a connection to amq1.
Broker amq1 establishes a connection to amq2.
Consumer connects to amq2 & starts consuming queue "MyQueue".
Producer connects to amq3 & sends 10 messages on queue "MyQueue" - these are
all passed on to broker amq1 which forwards them to amq2 where they are
delivered to the consumer.
Producer connects to amq3 & sends 10 messages on queue "MyQueue" - these are
all delivered as before - N.B. producerID is different as this is a new
connection.
Broker amq1 is shut down.
Broker amq3 fails over to connect to amq2.
Producer connects to amq3 & sends 10 messages on queue "MyQueue" - these are
all passed directly to amq2 where they are delivered to the consumer - (as
before, the producer-id has changed).
Producer connects to amq3 & sends 10 messages on queue "MyQueue" - these are
delivered as before.
Broker amq1 is restarted
Broker amq1 re-establishes its connection to amq2.
Broker amq3 notices that amq1 is available & fails back to it.
- Broker amq3 closes its connection to amq2
- Broker amq3 starts a new connection to amq1
Producer connects to amq3 & sends 10 messages on queue "MyQueue" - these are
all passed on to broker amq1, which forwards them to amq2 where they are
delivered to the consumer - (as before, the producer-id has changed).
- N.B. Immediately before the first message is received & forwarded by
amq1, amq1's log shows:
2013-09-11 12:05:56,639 | DEBUG | last stored sequence id set: -1 |
org.apache.activemq.broker.ProducerBrokerExchange | ActiveMQ Transport:
tcp:///172.16.7.85:56880@61616
---> This message only appears after fail-back; it does not appear earlier.
This indicates that the network connection is being treated differently
after fail-back.
**********************
** Error occurs now **
**********************
** Producer connects to amq3 & sends 20 messages on queue "MyQueue" (with a
different producer-ID)
- The first 10 are discarded by broker amq1 because it thinks that they have
a duplicate sequence ID.
- amq1 log shows:
2013-09-11 12:06:29,201 | DEBUG | suppressing duplicate message send
[ID:bd7ewandymay-56895-1378897588954-0:1:1:1:1] with producerSequenceId [1]
less than last stored: 10 | org.apache.activemq.broker.ProducerBrokerExchange |
ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
2013-09-11 12:06:29,223 | DEBUG | suppressing duplicate message send
[ID:bd7ewandymay-56895-1378897588954-0:1:1:1:2] with producerSequenceId [2]
less than last stored: 10 | org.apache.activemq.broker.ProducerBrokerExchange |
ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
... snip ...
2013-09-11 12:06:29,396 | DEBUG | suppressing duplicate message send
[ID:bd7ewandymay-56895-1378897588954-0:1:1:1:10] with producerSequenceId [10]
less than last stored: 10 | org.apache.activemq.broker.ProducerBrokerExchange |
ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
- The last 10 are successfully forwarded to amq2, where they are consumed.
** Producer connects to amq3 & sends 30 messages on queue "MyQueue" (with a
different producer-ID)
- The first 20 are discarded by broker amq1 because it thinks that they have
a duplicate sequence ID.
- amq1 log shows:
2013-09-11 12:06:45,668 | DEBUG | suppressing duplicate message send
[ID:bd7ewandymay-56899-1378897605440-0:1:1:1:1] with producerSequenceId [1]
less than last stored: 20 | org.apache.activemq.broker.ProducerBrokerExchange |
ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
2013-09-11 12:06:45,682 | DEBUG | suppressing duplicate message send
[ID:bd7ewandymay-56899-1378897605440-0:1:1:1:2] with producerSequenceId [2]
less than last stored: 20 | org.apache.activemq.broker.ProducerBrokerExchange |
ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
... snip ...
2013-09-11 12:06:45,959 | DEBUG | suppressing duplicate message send
[ID:bd7ewandymay-56899-1378897605440-0:1:1:1:20] with producerSequenceId [20]
less than last stored: 20 | org.apache.activemq.broker.ProducerBrokerExchange |
ActiveMQ Transport: tcp:///172.16.7.85:56880@61616
- The last 10 are successfully forwarded to amq2, where they are consumed.
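The message IDs in the amq1 log can be split to compare the producers.
Assuming ActiveMQ's usual `MessageId` text layout, where everything before the
last colon is the producer id and the final number is the producer sequence
id, the 20-message and 30-message runs use different producer ids
(`...-56895-...` and `...-56899-...`), yet both start their sequence at 1.
`MessageIdParts` is a hypothetical helper, not part of the ActiveMQ API.

```java
// Hypothetical helper (not ActiveMQ API). Assumes the message ID text layout
// "<producerId>:<producerSequenceId>", i.e. the sequence id follows the last colon.
class MessageIdParts {
    final String producerId;
    final long sequenceId;

    private MessageIdParts(String producerId, long sequenceId) {
        this.producerId = producerId;
        this.sequenceId = sequenceId;
    }

    static MessageIdParts parse(String messageId) {
        int cut = messageId.lastIndexOf(':');
        if (cut < 0) {
            throw new IllegalArgumentException("no ':' in " + messageId);
        }
        return new MessageIdParts(
                messageId.substring(0, cut),
                Long.parseLong(messageId.substring(cut + 1)));
    }

    public static void main(String[] args) {
        // First suppressed ID of each run, copied from the amq1 log excerpts
        MessageIdParts run20 = parse("ID:bd7ewandymay-56895-1378897588954-0:1:1:1:1");
        MessageIdParts run30 = parse("ID:bd7ewandymay-56899-1378897605440-0:1:1:1:1");
        System.out.println(run20.producerId + " seq=" + run20.sequenceId);
        System.out.println(run30.producerId + " seq=" + run30.sequenceId);
        System.out.println("same producer? " + run20.producerId.equals(run30.producerId));
    }
}
```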
It looks to me as if amq1 doesn't realise that the fail-back network connection
established by amq3 is a network connection & so isn't checking producer IDs.
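The suppressed counts above are exactly what one would get if amq1 kept a
single "last stored sequence id" for the whole network connection and ignored
the producer id. The sketch below models that symptom; it is a hypothetical
reconstruction from the log output, not ActiveMQ's actual code path, and
`SharedSequenceAudit` is an invented name.

```java
// Hypothetical model of the SYMPTOM in this report (not ActiveMQ's code):
// one "last stored sequence id" shared by every producer whose messages
// arrive over the same network connection, so the producer id is ignored.
class SharedSequenceAudit {
    private long lastStored = -1; // matches "last stored sequence id set: -1"

    /** Sends sequence ids 1..count from a new producer; returns how many are suppressed. */
    int sendBatch(int count) {
        int suppressed = 0;
        for (long seq = 1; seq <= count; seq++) {
            if (seq <= lastStored) {
                suppressed++; // "producerSequenceId [seq] less than last stored"
            } else {
                lastStored = seq;
            }
        }
        return suppressed;
    }

    public static void main(String[] args) {
        SharedSequenceAudit amq1 = new SharedSequenceAudit();
        // Batch sizes sent after fail-back, each from a fresh producer id
        for (int count : new int[] {10, 20, 30}) {
            int lost = amq1.sendBatch(count);
            System.out.println(count + " sent, " + lost + " suppressed");
        }
        // prints: 10 sent, 0 suppressed / 20 sent, 10 suppressed / 30 sent, 20 suppressed
    }
}
```

Keying `lastStored` by producer id instead would suppress none of these
messages, which is the behaviour the report expects.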
Details of why I'm trying this configuration:
---------------------------------------------
Use case:
---------
1 central site.
Multiple branches, each with a single branch server and multiple user PCs.
Each branch only has 1 internet connection that is shared by branch server &
PCs.
Branch server is typically unreliable hardware & may go offline without notice.
Resilience to network loss is important & so each PC & server has its own
broker.
Both branch server & PCs need to be able to communicate with the centre.
To reduce the number of connections into the centre, we would like a tree
topology with the branch server concentrating all branch PC messages &
forwarding them to the centre.
However, PCs generate a data feed that we want to be able to access at the
centre, even when the branch server is offline.
Proposed configuration:
-----------------------
Use a failover network connection on branch PCs & configure the connection to
prioritise a connection to the branch server, but open a direct connection to
the centre if the branch server is unavailable.
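A sketch of how the network connector URI for a branch PC might be assembled
for this proposal. The host names are placeholders, and the
`static:(failover:(...)?randomize=false&priorityBackup=true)` form is an
assumption based on the options named in the test-harness setup, not a copy
of amq3.xml. `BranchConnectorUri` is an invented name.

```java
import java.util.List;

// Hypothetical helper: builds the network connector URI for a branch PC.
// Host names are placeholders; the static:(failover:(...)?...) form is assumed.
class BranchConnectorUri {
    static String build(List<String> brokerUrisInPriorityOrder) {
        String brokers = String.join(",", brokerUrisInPriorityOrder);
        // randomize=false keeps the listed order; priorityBackup=true fails back
        // to the first URI (the branch server) when it becomes reachable again.
        return "static:(failover:(" + brokers + ")?randomize=false&priorityBackup=true)";
    }

    public static void main(String[] args) {
        System.out.println(build(List.of(
                "tcp://branch-server:61616", // priority: branch server
                "tcp://central:61616")));    // fallback: direct to the centre
    }
}
```

When placed in the `uri` attribute of a `<networkConnector>` element in the
broker XML, the `&` must be escaped as `&amp;`.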