GitHub user Apurva007 added a comment to the discussion: Questions regarding 
pulsar  active-active geo-replication

@lhotari Thanks for the great explanation. That clears up most of my 
questions. 
A follow-up question on "offset management": how does this pattern avoid 
100% data duplication on consumption, given that the same data is available 
on both clusters?

Could you please help explain how this diagram works:
<img width="1000" alt="image" 
src="https://github.com/apache/pulsar/assets/10327630/a51b9c8b-1786-4d95-83a7-56a2d2ce56cb">

E.g., a client application lists the URLs of both cluster A and cluster B as 
comma-separated values in its service URL. Geo-replication of data is enabled 
in both clusters. Subscription replication is disabled. 

Messages published to Cluster A: M1, M2, M3
Messages published to Cluster B: M4, M5
Data availability on cluster A & B after replication: M1, M4, M2, M3, M5

As per the diagram above, let's say subscription S1 has consumers C1 and C2 
connecting to both cluster A and cluster B in the same instance. What would be 
the expected consumption behavior?

1. S1 receives M1, M4, M2, M3, M5 only once.
2. S1 receives each of M1, M4, M2, M3, and M5 twice.

If only once, then how is the subscription being tracked across clusters 
without subscription replication?
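
To make the two options concrete, here is a toy model in plain Python. It is only a sketch of the question, not the Pulsar client API or Pulsar's actual implementation: the `Cluster` class, its `receive` method, and the list-index "cursor" are all hypothetical, and real message positions need not line up across clusters the way list indexes do here. It assumes a consumer group that drains S1 from cluster A and then from cluster B, and contrasts per-cluster cursors with a single shared cursor.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for one cluster's copy of a geo-replicated topic (hypothetical)."""
    name: str
    log: list = field(default_factory=list)      # messages stored on this cluster
    cursors: dict = field(default_factory=dict)  # subscription name -> next index to deliver

    def receive(self, subscription):
        """Return every message this cursor store has not yet delivered to the subscription."""
        start = self.cursors.get(subscription, 0)
        batch = self.log[start:]
        self.cursors[subscription] = len(self.log)
        return batch


# Both clusters hold the same data after replication (order taken from the example above).
replicated = ["M1", "M4", "M2", "M3", "M5"]

# Assumption behind option 2: each cluster keeps its own cursor for S1.
a = Cluster("A", list(replicated))
b = Cluster("B", list(replicated))
delivered_independent = a.receive("S1") + b.receive("S1")

# Assumption behind option 1: both clusters consult one shared cursor store for S1.
shared_cursors = {}
a_shared = Cluster("A", list(replicated), shared_cursors)
b_shared = Cluster("B", list(replicated), shared_cursors)
delivered_shared = a_shared.receive("S1") + b_shared.receive("S1")

print(len(delivered_independent), delivered_independent)
print(len(delivered_shared), delivered_shared)
```

In this model, option 1 only falls out when the two clusters share cursor state for S1, which is the part I do not understand how to get without subscription replication.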



GitHub link: 
https://github.com/apache/pulsar/discussions/22315#discussioncomment-8881839
