[ 
https://issues.apache.org/jira/browse/AMQ-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627877#comment-13627877
 ] 

Raul Kripalani commented on AMQ-4465:
-------------------------------------

At the risk of sparking an entirely different discussion: the real culprit of 
all this is the store-and-forward technique, in my humble opinion. I think the 
AMQ model may be fundamentally flawed for highly dynamic, elastic or cloud-like 
scenarios, where consumers and producers can appear anywhere in the messaging 
fabric and AMQ instances are provisioned and de-provisioned on the fly.

The replayWhenNoConsumers option was a way to let messages bounce freely across 
the cluster. But what we really need is for multiple ACTIVE brokers to share a 
single view of reality, i.e. shared knowledge of which messages exist and are 
pending delivery, which consumers are alive and where they are connected, etc.: 
a messaging cloud.

In the era of big data and huge in-memory caches, this seems perfectly doable. 
I'd advocate a solution in which:

- ACTIVE brokers can connect to a single cache/db, no more exclusivity or 
master locks.
- Reads and writes must be atomic or transactional, but blazing fast in both 
cases.
- All instances see all messages and consumers, but are responsible only for 
local consumers. They decide when to pick a message from the cache and push it 
to a consumer.
- May be embeddable, so that you don't have to start a separate process to use 
AMQ OOTB (see the sketch below).
- Can be persistent/non-persistent.

Many NoSQL databases and Java-based distributed cache technologies exist that 
could fulfill these requirements (probably with some adaptation).
                
> Rethink replayWhenNoConsumers solution
> --------------------------------------
>
>                 Key: AMQ-4465
>                 URL: https://issues.apache.org/jira/browse/AMQ-4465
>             Project: ActiveMQ
>          Issue Type: Improvement
>          Components: Broker
>    Affects Versions: 5.8.0
>            Reporter: Torsten Mielke
>
> I would like to start a discussion about the way we allow messages to be 
> replayed back to the original broker in a broker network, i.e. setting 
> replayWhenNoConsumers=true.
> This discussion is based on the blog post 
> http://tmielke.blogspot.de/2012/03/i-have-messages-on-queue-but-they-dont.html
> but I will outline the full story here again. 
> Consider a network of two brokers A and B. 
> Broker A has a producer that sends one msg to queue Test.in. Broker B has a 
> consumer connected, so the msg is transferred to broker B. Let's assume the 
> consumer disconnects from B *before* it consumes the msg and reconnects to 
> broker A. If broker B has replayWhenNoConsumers=true, the message will be 
> replayed back to broker A. 
> If that replay happens in a short time frame, the cursor will mark the 
> replayed msg as a duplicate and won't dispatch it. To overcome this, one 
> needs to set enableAudit=false on the policyEntry for the destination. 
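> As a rough sketch of where these two settings live (shown here as programmatic 
> Java configuration; the equivalent policyEntry attributes in activemq.xml 
> behave the same way, and the broker/queue names are just the ones from this 
> example):
>
>     import java.util.Arrays;
>     import org.apache.activemq.broker.BrokerService;
>     import org.apache.activemq.broker.region.policy.PolicyEntry;
>     import org.apache.activemq.broker.region.policy.PolicyMap;
>     import org.apache.activemq.network.ConditionalNetworkBridgeFilterFactory;
>
>     public class ReplayPolicyExample {
>         public static void main(String[] args) throws Exception {
>             BrokerService brokerB = new BrokerService();   // broker B from the scenario above
>
>             PolicyEntry policy = new PolicyEntry();
>             policy.setQueue("Test.in");
>             policy.setEnableAudit(false);                  // enableAudit=false, as described above
>
>             ConditionalNetworkBridgeFilterFactory filter = new ConditionalNetworkBridgeFilterFactory();
>             filter.setReplayWhenNoConsumers(true);         // allow msgs to be replayed back over the bridge
>             policy.setNetworkBridgeFilterFactory(filter);
>
>             PolicyMap policyMap = new PolicyMap();
>             policyMap.setPolicyEntries(Arrays.asList(policy));
>             brokerB.setDestinationPolicy(policyMap);
>             brokerB.start();
>         }
>     }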
> This has a consequence: it disables duplicate detection in the cursor. 
> External JMS producers will still be blocked from sending duplicates thanks 
> to the duplicate detection built into the persistence adapter. 
> However, you can still get duplicate messages over the network bridge now. 
> With enableAudit=false, these duplicates will be happily added to the cursor 
> now. If the same consumer receives the duplicate message, it will likely 
> detect the duplicate. However, if the duplicate message is dispatched to a 
> different consumer, it won't be detected but will be processed by the 
> application.
> For many use cases it's important not to receive duplicate messages, so the 
> above setup (replayWhenNoConsumers=true and enableAudit=false) becomes a 
> problem.
> There is the additional option of specifying auditNetworkProducers="true" on 
> the transport connector but that's very likely going to have consequences as 
> well. With auditNetworkProducers="true" we will now detect duplicates over 
> the network bridge, so if there is a network glitch while the message is 
> replayed back on the bridge to broker A and broker B tries to resend the 
> message again, it will be detected as a duplicate on broker A. This is good.
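> For reference, a sketch of that option set programmatically on the broker's 
> transport connector (the auditNetworkProducers attribute on transportConnector 
> in activemq.xml is the same thing; the URI and names here are just examples):
>
>     import org.apache.activemq.broker.BrokerService;
>     import org.apache.activemq.broker.TransportConnector;
>
>     public class AuditNetworkProducersExample {
>         public static void main(String[] args) throws Exception {
>             BrokerService brokerA = new BrokerService();
>             TransportConnector connector = brokerA.addConnector("tcp://0.0.0.0:61616");
>             connector.setAuditNetworkProducers(true); // also audit producers arriving over network bridges
>             brokerA.start();
>         }
>     }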
> However, let's assume the consumer now disconnects from broker A *after* the 
> message was replayed back from broker B to broker A but *before* the consumer 
> actually received the message. The consumer then reconnects to broker B 
> again. 
> The replayed message is on broker A now. Broker B registers a new demand for 
> this message (due to the consumer reconnecting) and broker A will pass on the 
> message to broker B again. However, due to auditNetworkProducers="true", 
> broker B will treat the resent message as a duplicate and very likely not 
> accept it (or, even worse, simply drop the message; not sure how exactly it 
> will behave). 
> So the message is stuck again and won't be dispatched to the consumer on 
> broker B. 
> The networkTTL setting will further have an effect on this scenario, and so 
> will other broker topologies, such as a full mesh.
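> For completeness, a sketch of where that setting lives (programmatic form of 
> the networkConnector; the URI and TTL value are arbitrary examples):
>
>     import org.apache.activemq.broker.BrokerService;
>     import org.apache.activemq.network.NetworkConnector;
>
>     public class NetworkTtlExample {
>         public static void main(String[] args) throws Exception {
>             BrokerService brokerA = new BrokerService();
>             NetworkConnector bridge = brokerA.addNetworkConnector("static:(tcp://brokerB:61616)");
>             bridge.setNetworkTTL(2);   // how many brokers a message may traverse over bridges
>             brokerA.start();
>         }
>     }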
> It seems to me that:
> - When allowing replayWhenNoConsumers=true, you may receive duplicate messages 
> unless you also set auditNetworkProducers="true", which has consequences as 
> well.
> - If consumers reconnect to a different broker each time, you may end up with 
> msgs stuck on a broker that won't get dispatched.
> - Ideally you want sticky consumers, i.e. they reconnect to the same broker 
> if possible in order to avoid replaying messages back. This implies that you 
> don't want to use randomize=true on failover URLs (see the sketch after this 
> list). I don't think we recommend this in any docs.
> - The networkTTL will potentially never be high enough, and the message may 
> be stuck on a particular broker because the consumer may have reconnected to 
> another broker in the network.
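> A sketch of the client-side failover URL mentioned in the sticky-consumer 
> point above (broker host names are placeholders):
>
>     import org.apache.activemq.ActiveMQConnectionFactory;
>
>     public class StickyFailoverExample {
>         public static void main(String[] args) throws Exception {
>             // randomize=false makes the failover transport try the URIs in order instead of
>             // picking one at random, so a reconnecting consumer tends to land back on the
>             // same (first listed) broker rather than bouncing between A and B.
>             ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
>                 "failover:(tcp://brokerA:61616,tcp://brokerB:61616)?randomize=false");
>             factory.createConnection().start();
>         }
>     }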
> I am sure there are more sides to this discussion. I just wanted to capture 
> what gtully and I found when discussing this problem. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
