GitHub user lhotari edited a comment on the discussion: Questions regarding pulsar active-active geo-replication
Thank you @Apurva007, good questions.

> In this cases, how are the offsets managed across clusters?

The messages in different clusters don't share the same message ids: the message ids in the originating cluster are independent of the message ids in the remote cluster. There are two parts to what you could call "offset management" across clusters.

For replication itself, messages originating in one cluster are handled by a replicator instance for each topic in the originating cluster, which publishes (pushes) the messages to the remote cluster and keeps its state in a special subscription. To prevent replication loops, the message published in the remote cluster contains metadata about the originating cluster and the original message id. The [replicator](https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentReplicator.java) is special in the sense that it behaves like a consumer, but it's implemented directly in the Pulsar broker on top of the "managed ledger" layer, without an actual consumer. This can be seen in the [Pulsar architecture diagram](https://pulsar.apache.org/docs/3.2.x/concepts-architecture-overview/) as "global replicators".

For [replicated subscriptions](https://pulsar.apache.org/docs/3.2.x/administration-geo/#replicated-subscriptions), reading "[PIP 33: Replicated subscriptions](https://github.com/apache/pulsar/wiki/PIP-33%3A-Replicated-subscriptions)", and especially the "[Constructing a cursor snapshot](https://github.com/apache/pulsar/wiki/PIP-33%3A-Replicated-subscriptions#constructing-a-cursor-snapshot)" section, is helpful for understanding how "offset management" works under the covers and what the limitations are. There's also a blog post that contains a [useful summary of the limitations](https://streamnative.io/blog/migrating-tenants-across-clusters-with-pulsars-geo-replication#problems-in-consumption-progress-synchronization).
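The loop-prevention part can be illustrated with a toy model. This is only a sketch of the idea, not Pulsar's actual internals: the cluster names, the `Message`/`Cluster` classes, and the `origin_cluster` field are all illustrative stand-ins for the replicated-message metadata described above.

```python
# Toy model: each per-topic replicator pushes locally produced messages to
# its peer clusters, tagging them with the originating cluster's name.
# Messages that already carry origin metadata are never re-replicated,
# which is what breaks the replication loop in an active-active mesh.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    payload: str
    origin_cluster: Optional[str] = None  # set when a replicator re-publishes


class Cluster:
    def __init__(self, name: str):
        self.name = name
        self.log = []    # messages stored in this cluster's topic
        self.peers = []  # remote clusters this cluster replicates to

    def publish(self, msg: Message) -> None:
        self.log.append(msg)
        # Replicator: push only locally originated messages to peers.
        for peer in self.peers:
            if msg.origin_cluster is None:
                peer.publish(Message(msg.payload, origin_cluster=self.name))
            # A message with origin metadata is not replicated again, so it
            # can never loop back to the cluster it came from.


us = Cluster("us-east")
eu = Cluster("eu-west")
us.peers.append(eu)
eu.peers.append(us)

us.publish(Message("order-1"))
# us.log and eu.log each hold one copy; the eu-west copy records
# origin_cluster="us-east", and replication stops there.
```

In the real broker the replicated copy keeps both the originating cluster and the original message id, which is also what makes it possible to correlate positions across clusters even though the message ids themselves differ.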
The subscription snapshotting seems to be an application of [Vector clocks](https://en.wikipedia.org/wiki/Vector_clock), although this isn't explicitly mentioned in the PIP-33 design document. There's another discussion, https://github.com/apache/pulsar/discussions/21612, which contains useful observations and details about replicated subscriptions.

> In the same pattern, if the subscription state is replicated, and the
> consumers of S1 subscription are connecting to both clusters, is there an
> internal protection in the replicators to make sure that replicated state
> does not override the current offsets in the cluster due to active consumers
> already using these offsets?

Shared subscriptions using the same replicated subscription across geo-replicated clusters don't have consistent behavior. It "works", but the same offsets would get consumed in both clusters in non-deterministic ways. I haven't validated what I'm saying, but my understanding is that in many cases the messages would get processed by the concurrent consumers sharing the same replicated subscription name in both clusters, but not at all times. For use cases that require at-least-once processing in either of the clusters sharing the replicated subscription, this is fine as long as a lot of duplicates aren't a problem.

My understanding is that replicated subscriptions are designed for active-passive configurations, where some overlap isn't a problem and where there's an external solution for choosing which consumer should be active for a particular replicated subscription. It seems that [the documentation supports this](https://pulsar.apache.org/docs/3.2.x/administration-geo/#replicated-subscriptions):

> In case of failover, a consumer can restart consuming from the failure point
> in a different cluster.
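The failover behavior described in that documentation quote can be sketched as follows. Again a simplified model under stated assumptions, not Pulsar's implementation: a PIP-33-style cursor snapshot is treated here as a plain map of cluster name to local message position, and the position values are made-up numbers.

```python
# Toy model of a cursor snapshot for a replicated subscription: because
# message ids differ per cluster, the snapshot records one position per
# cluster (a vector of positions, vector-clock style). On failover, the
# consumer resumes from the entry for the cluster it fails over to.

# Illustrative snapshot: positions acknowledged up to a consistent cut.
snapshot = {
    "us-east": 1042,  # local position in us-east at snapshot time
    "eu-west": 987,   # corresponding local position in eu-west
}


def resume_position(snapshot: dict, cluster: str) -> int:
    """Position from which a failed-over consumer restarts in `cluster`."""
    return snapshot[cluster]


# A consumer reading in us-east fails over to eu-west and resumes from the
# eu-west component of the latest snapshot. Anything consumed between that
# snapshot and the failure is redelivered, i.e. at-least-once semantics.
restart_at = resume_position(snapshot, "eu-west")
```

Since snapshots are taken periodically rather than per-message, the gap between the last snapshot and the failure point is exactly where the duplicate consumption discussed above comes from.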
GitHub link: https://github.com/apache/pulsar/discussions/22315#discussioncomment-8875503
