Torsten Mielke created AMQ-4465:
-----------------------------------

             Summary: Rethink replayWhenNoConsumers solution
                 Key: AMQ-4465
                 URL: https://issues.apache.org/jira/browse/AMQ-4465
             Project: ActiveMQ
          Issue Type: Improvement
          Components: Broker
    Affects Versions: 5.8.0
            Reporter: Torsten Mielke


I would like to start a discussion about the way we allow messages to be 
replayed back to the original broker in a broker network, i.e. setting 
replayWhenNoConsumers=true.

This discussion is based on the blog post 
http://tmielke.blogspot.de/2012/03/i-have-messages-on-queue-but-they-dont.html
but I will outline the full story here again. 


Consider a network of two brokers A and B. 
Broker A has a producer that sends one msg to queue Test.in. Broker B has a 
consumer connected so the msg is transferred to broker B. Let's assume the 
consumer disconnects from B *before* it consumes the msg and reconnects to 
broker A. If broker B has replayWhenNoConsumers=true, the message will be 
replayed back to broker A. 
If that replay happens in a short time frame, the cursor will mark the replayed 
msg as a duplicate and won't dispatch it. To overcome this, one needs to set 
enableAudit=false on the policyEntry for the destination. 
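
For reference, the setup described above would look roughly like the following 
on broker B. This is only a sketch: the queue name Test.in is taken from the 
example, the surrounding broker configuration is omitted, and the exact element 
layout should be checked against the schema of the ActiveMQ version in use.

```xml
<destinationPolicy>
  <policyMap>
    <policyEntries>
      <!-- enableAudit="false" turns off duplicate detection in the cursor
           for this queue so that a replayed msg is dispatched again -->
      <policyEntry queue="Test.in" enableAudit="false">
        <networkBridgeFilterFactory>
          <!-- allow msgs to be replayed back to the broker they came from
               once this broker has no local consumers left -->
          <conditionalNetworkBridgeFilterFactory replayWhenNoConsumers="true"/>
        </networkBridgeFilterFactory>
      </policyEntry>
    </policyEntries>
  </policyMap>
</destinationPolicy>
```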

This has a consequence as it disables duplicate detection in the cursor. 
External JMS producers will still be blocked from sending duplicates thanks to 
the duplicate detection built into the persistence adapter. 
However, you can now get duplicate messages over the network bridge. With 
enableAudit=false these duplicates will be happily added to the cursor. If 
the same consumer receives the duplicate message, it will likely detect the 
duplicate. However if the duplicate message is dispatched to a different 
consumer, it won't be detected but will be processed by the application.

For many use cases it's important not to receive duplicate messages, so the above 
setup (replayWhenNoConsumers=true and enableAudit=false) becomes a problem.

There is the additional option of specifying auditNetworkProducers="true" on 
the transport connector but that's very likely going to have consequences as 
well. With auditNetworkProducers="true" we will now detect duplicates over the 
network bridge, so if there is a network glitch while the message is replayed 
back on the bridge to broker A and broker B tries to resend the message again, 
it will be detected as a duplicate on broker A. This is good.
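
As an illustration, auditNetworkProducers is set on the transport connector 
that the network bridge of the other broker connects to. The connector name 
and URI below are placeholders, not taken from a real setup.

```xml
<transportConnectors>
  <!-- name and uri are hypothetical; auditNetworkProducers="true" enables
       duplicate detection for msgs arriving over a network bridge -->
  <transportConnector name="openwire"
                      uri="tcp://0.0.0.0:61616"
                      auditNetworkProducers="true"/>
</transportConnectors>
```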

However, let's assume the consumer now disconnects from broker A *after* the 
message was replayed back from broker B to broker A but *before* the consumer 
actually received the message. The consumer then reconnects to broker B again. 
The replayed message is on broker A now. Broker B registers a new demand for 
this message (due to the consumer reconnecting) and broker A will pass on the 
message to broker B again. However due to auditNetworkProducers="true" broker B 
will treat the resent message as a duplicate and very likely not accept it (or 
even worse simply drop the message - not sure how exactly it will behave). 

So the message is stuck again and won't be dispatched to the consumer on broker 
B. 
The networkTTL setting will also affect this scenario, and so will other 
broker topologies such as a full mesh.
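
For context, networkTTL is configured on the network connector and limits the 
number of brokers a message may pass through. A sketch for broker A, with a 
placeholder connector name, host name and an arbitrary value of 3:

```xml
<networkConnectors>
  <!-- name, host and the value 3 are illustrative only; every hop across
       a broker, including a replay back, counts against networkTTL -->
  <networkConnector name="A-to-B"
                    uri="static:(tcp://brokerB:61616)"
                    networkTTL="3"/>
</networkConnectors>
```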

It seems to me that 
- When allowing replayWhenNoConsumers=true you may receive duplicate messages 
unless you also set auditNetworkProducers="true" which has consequences as well.
- If consumers reconnect to a different broker each time, you may end up with 
msgs stuck on a broker that won't get dispatched. 
- Ideally you want sticky consumers, i.e. they reconnect to the same broker if 
possible in order to avoid replaying back messages. This implies that you don't 
want to use randomize=true on failover urls. I don't think we recommend this in 
any docs.
- The network ttl will potentially never be high enough and the message may be 
stuck on a particular broker as the consumer may have reconnected to another 
broker in the network.
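
To make the point about sticky consumers concrete: a client could turn off 
randomization on its failover URL so that it always tries the brokers in the 
listed order. The sketch below assumes a Spring-configured connection factory; 
the bean id and the host names brokerA and brokerB are placeholders.

```xml
<bean id="connectionFactory"
      class="org.apache.activemq.ActiveMQConnectionFactory">
  <!-- randomize=false: try brokerA first and fall back to brokerB only
       when brokerA is unreachable (host names are hypothetical) -->
  <property name="brokerURL"
            value="failover:(tcp://brokerA:61616,tcp://brokerB:61616)?randomize=false"/>
</bean>
```

This only makes the consumer prefer one broker. It does not guarantee that 
the consumer never ends up on the other broker, so the replay scenarios above 
can still occur.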

I am sure there are more sides to this discussion. I just wanted to capture 
what gtully and I found when discussing this problem. 


