GitHub user Apurva007 added a comment to the discussion: Questions regarding pulsar active-active geo-replication
@lhotari Thanks for the great explanation. That clears up most of my questions.

A follow-up question on "offset management": how does this pattern not cause 100% data duplication during consumption, given that the same data is available on both clusters? Could you please explain how this diagram works:

<img width="1000" alt="image" src="https://github.com/apache/pulsar/assets/10327630/a51b9c8b-1786-4d95-83a7-56a2d2ce56cb">

E.g., a client application sets its service URL to the URLs of both cluster A and cluster B as comma-separated values. Geo-replication of data is enabled in both clusters. Subscription replication is disabled.

Messages published to cluster A: M1, M2, M3
Messages published to cluster B: M4, M5
Data available on clusters A and B after replication: M1, M4, M2, M3, M5

As per the diagram above, let's say subscription S1 has consumers C1 and C2 connecting to both cluster A and cluster B in the same instance. What would be the expected consumption behavior?

1. S1 receives M1, M4, M2, M3, M5 only once.
2. S1 receives M1, M4, M2, M3, and M5 twice.

If only once, how is the subscription tracked across clusters without subscription replication?

GitHub link: https://github.com/apache/pulsar/discussions/22315#discussioncomment-8881839
