GitHub user lhotari added a comment to the discussion: Questions regarding 
pulsar  active-active geo-replication

I'm sorry I missed the reference to the [active-active replication docs](https://pulsar.apache.org/docs/3.2.x/concepts-replication/#active-active-replication) in your question. Thanks for the follow-up.

> If only once, then how is the subscription being tracked across clusters 
> without subscription replication?

It seems that the example in the documentation is missing that detail. Without 
subscription replication, the subscriptions in each cluster would be completely 
independent.

> Eg. A client application in its service url added the URLs of both cluster A 
> and cluster B as comma separated values.

This detail makes the scenario active-passive from the application 
(consumers/producers) point of view. The Pulsar client and its consumer would 
connect to only one cluster at a time. This is needed for consistent usage of 
replicated subscriptions. As I mentioned in my previous message, the behaviour 
isn't consistent when the replicated subscription is actively used in more than 
one cluster at a time.

Even with replicated subscriptions, the diagram doesn't make full sense to me 
since there are two separate consumers C1 and C2 in the diagram. When there are 
2 service URLs for the client, it would connect to the first cluster that is 
available and this would be the correct way to use replicated subscriptions.

There are important limitations to replicated subscriptions. They are usually 
acceptable for at-least-once messaging when the consumer of a replicated 
subscription consumes from only one cluster at a time and delayed messages 
aren't used.

The main limitation of replicated subscriptions is that only the "mark delete" 
position is replicated. Any individually "deleted" (acknowledged) messages 
beyond that position are ignored. This is explained in 
[Penghui's presentation at 1:12:26](https://www.youtube.com/watch?v=17jQIOVeu4s&t=1h12m26s). 
Naturally, batch index acknowledgements aren't supported either.

Delayed messages prevent the mark delete position from moving forward until the 
delayed message has been delivered and acknowledged. This is why combining 
delayed messages with replicated subscriptions isn't a good solution if a 
large number of duplicates is a problem when the consumer switches to consuming 
from the other cluster.
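
To make the effect of the last two paragraphs concrete, here is a minimal toy model in plain Java. It is not Pulsar code: positions are simple integers, and the class and method names (`MarkDeleteSketch`, `markDeletePosition`, `duplicatesAfterFailover`) are made up for illustration. It also assumes that the other cluster resumes exactly from the replicated mark delete position; in practice the replicated position can lag behind, so this is a lower bound on the duplicates.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model, NOT Pulsar internals: message positions are plain integers 0..n-1.
final class MarkDeleteSketch {

    // Highest position p such that every position 0..p has been acknowledged,
    // or -1 when position 0 is still unacknowledged.
    static int markDeletePosition(Set<Integer> acked) {
        int p = -1;
        while (acked.contains(p + 1)) {
            p++;
        }
        return p;
    }

    // Count of already-acknowledged messages that would be delivered again if
    // a consumer resumed right after the mark delete position (assumption:
    // individual acknowledgements are not known to the other cluster).
    static int duplicatesAfterFailover(Set<Integer> acked) {
        int markDelete = markDeletePosition(acked);
        int duplicates = 0;
        for (int position : acked) {
            if (position > markDelete) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Positions 0..9 are acknowledged, except position 3, which stands in
        // for a delayed message that has not been delivered yet.
        Set<Integer> acked = new TreeSet<>();
        for (int i = 0; i < 10; i++) {
            if (i != 3) {
                acked.add(i);
            }
        }
        System.out.println("mark delete position: " + markDeletePosition(acked));
        System.out.println("duplicates after failover: " + duplicatesAfterFailover(acked));
    }
}
```

In this model a single unacknowledged position holds the mark delete position back, and every acknowledgement after it is lost on failover. The longer the delay of the pending message, the more acknowledged messages accumulate behind it.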

The current documentation for geo-replication needs improvement so that it 
doesn't cause surprises or unrealistic expectations. Contributions that 
clarify the points you have brought up in your questions are more than 
welcome.

GitHub link: 
https://github.com/apache/pulsar/discussions/22315#discussioncomment-8882348
