poorbarcode commented on code in PR #23124:
URL: https://github.com/apache/pulsar/pull/23124#discussion_r1717855060


##########
pip/pip-370.md:
##########
@@ -0,0 +1,91 @@
+# PIP-370: configurable remote topic creation in geo-replication
+
+# Background knowledge
+
+**The current topic creation behavior when enabling Geo-Replication**
+When users enable Geo-Replication to back up data across multiple clusters, both the Geo-Replication-related Admin APIs and the brokers' internal replicators trigger automatic topic creation between clusters.
+- For partitioned topics.
+  - After enabling namespace-level Geo-Replication: the broker will create topics on the remote cluster automatically when calling `pulsar-admin topics create-partitioned-topic`. It does not depend on enabling `allowAutoTopicCreation`.
+  - When enabling topic-level Geo-Replication on a partitioned topic: the broker will create topics on the remote cluster automatically. It does not depend on enabling `allowAutoTopicCreation`.
+  - When calling `pulsar-admin topics update-partitioned-topic -p {partitions}`, the broker will also update partitions on the remote cluster automatically.
+- For non-partitioned topics and partitions of partitioned topics.
+  - The internal Geo-Replicator will trigger topic auto-creation on remote clusters. **(Highlight)** It depends on enabling `allowAutoTopicCreation`. In fact, this behavior is not specific to Geo-Replication; it is the behavior of the Geo-Replicator's internal producer.
+
+# Motivation
+
+In the following scenarios, automatic topic creation across clusters is problematic because of race conditions during deployments, and there is no option to prevent Pulsar resource creation in one cluster from affecting the other clusters.
+
+- Users want to maintain Pulsar resources manually.
+- Users manage Pulsar resources using `GitOps CD` automated deployment, in which:
+  - Clusters are deployed simultaneously without user intervention.
+  - Each cluster is precisely configured from git repo config variables, including the list of all tenants/namespaces/topics to be created in each cluster.
+  - Clusters are configured to be exact clones of each other in terms of Pulsar resources.
+
+**Attempted workaround**: disabling `allowAutoTopicCreation` is not enough. `pulsar-admin topics create-partitioned-topic` still creates topics on the remote cluster when namespace-level replication is enabled, enabling topic-level replication still creates topics, and the internal replicator keeps printing error logs because the remote topic is not found.
+
+# Goals
+
+- Introduce a flag that prevents the replicators from automatically triggering topic creation.
+- Move all Replication-related topic creation and partition expansion behavior into the internal Replicator, so that the Pulsar admin APIs for topic management no longer need to care about replication.
+  - Move the topic creation operations from `pulsar-admin topics create-partitioned-topic`, `pulsar-admin topics update-partitioned-topic -p {partitions}` and `pulsar-admin topics set-replication-clusters` into the broker's internal Replicator component.
+
+# Detailed Design
+
+## Configuration
+
+**broker.conf**
+```properties
+# Whether the internal replication of the local cluster will trigger topic auto-creation on the remote cluster.
+# 1. After enabling namespace-level Geo-Replication: whether the local broker will create topics on the remote cluster automatically when calling `pulsar-admin topics create-partitioned-topic`.
+# 2. When enabling topic-level Geo-Replication on a partitioned topic: whether the local broker will create topics on the remote cluster.
+# 3. Whether the internal Geo-Replicator in the local cluster will trigger non-persistent topic auto-creation for remote clusters.
+# It is not a dynamic config; the default value is "true" to preserve backward-compatible behavior.
+createTopicToRemoteClusterForReplication=true
+```
+
+## Design & Implementation Details
+
+### Phase 1: Introduce a flag that prevents the replicators from automatically triggering topic creation.
+- If `createTopicToRemoteClusterForReplication` is set to `false`.
+  1. After enabling namespace-level Geo-Replication: the broker will not create topics on the remote cluster automatically when calling `pulsar-admin topics create-partitioned-topic`.
+  2. When enabling topic-level Geo-Replication on a partitioned topic: the broker will not create topics on the remote cluster automatically.
+  3. The internal Geo-Replicator will not trigger topic auto-creation on remote clusters; it keeps retrying to check whether the topic exists on the remote cluster, and once the topic has been created, the replicator starts.
+  4. It does not change how subscriptions are created when `enableReplicatedSubscriptions` is enabled: subscriptions will still be created on the remote cluster after users enable `enableReplicatedSubscriptions`.
+  5. The config `allowAutoTopicCreation` still works for the local cluster as before; it is not affected by the new config `createTopicToRemoteClusterForReplication`.
+- If `createTopicToRemoteClusterForReplication` is set to `true`.
+  - All components work as before; for details, see `Background knowledge -> The current topic creation behavior when enabling Geo-Replication`.
+
+### Phase 2: The replicator will check remote topics' partitioned metadata and, if needed, update the partitions in the remote cluster to match the current cluster.
+- If `createTopicToRemoteClusterForReplication` is set to `false`.
+  - The behavior is the same as Phase 1.
+- If `createTopicToRemoteClusterForReplication` is set to `true`.
+  - When a replicator for a topic partition starts, it first checks the partitioned metadata in the remote cluster and, if needed, updates the remote partition count to match the current cluster. See the examples below:
+    - `local_cluster.topic.partitions = 2` and the topic does not exist in the remote cluster: create a partitioned topic with `2` partitions in the remote cluster.
+      - Before `PIP-370 Phase 2`: the replicator only triggers the creation of the partitions (`{topic}-partition-0` and `{topic}-partition-1`) and does not care about partitioned metadata.
+    - `local_cluster.topic.partitions = 2` and `remote_cluster.topic.partitions = 1`: expand `remote_cluster.topic.partitions` to `2`.
+      - Before `PIP-370 Phase 2`: the replicator only triggers the creation of the partitions (`{topic}-partition-0` and `{topic}-partition-1`), and the partitioned metadata in the remote cluster remains `1`.
+    - `local_cluster.topic.partitions = 2` and `remote_cluster.topic.partitions >= 2`: modifies nothing.

Review Comment:
   This scenario is fine. Messages are copied to the partition with the same index in the remote cluster, so no messages are copied to remote partitions whose index is `{local_cluster.topic.partitions}` or higher.
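To make the Phase 2 rules and the point of this comment concrete, here is a minimal Java sketch of the decision a replicator might make when it starts. It is illustrative only: the class `RemotePartitionReconciler`, the enum `Action`, and the methods `decide`, `remotePartitionName` and `receivesReplicatedMessages` are hypothetical and are not part of the Pulsar broker. Only the flag name `createTopicToRemoteClusterForReplication` and the `{topic}-partition-{i}` naming come from the proposal. It assumes the remote partition count has already been fetched, and is empty when the topic does not exist remotely.

```java
import java.util.OptionalInt;

/**
 * Hypothetical sketch of the remote-partition decision described in PIP-370 Phase 2.
 * None of these names are Pulsar broker APIs.
 */
final class RemotePartitionReconciler {

    /** What the replicator would do to the remote cluster's partitioned metadata. */
    enum Action { CREATE_PARTITIONED_TOPIC, EXPAND_PARTITIONS, NOTHING }

    /**
     * @param createTopicToRemoteClusterForReplication value of the proposed broker.conf flag
     * @param localPartitions  partition count of the topic in the local cluster
     * @param remotePartitions partition count in the remote cluster; empty if the topic does not exist there
     */
    static Action decide(boolean createTopicToRemoteClusterForReplication,
                         int localPartitions,
                         OptionalInt remotePartitions) {
        if (!createTopicToRemoteClusterForReplication) {
            // Flag off: the replicator never creates or expands remote topics;
            // it only keeps checking until the topic exists remotely.
            return Action.NOTHING;
        }
        if (!remotePartitions.isPresent()) {
            // Topic missing remotely: create it with localPartitions partitions.
            return Action.CREATE_PARTITIONED_TOPIC;
        }
        if (remotePartitions.getAsInt() < localPartitions) {
            // Remote has fewer partitions: expand it to localPartitions.
            return Action.EXPAND_PARTITIONS;
        }
        // Remote already has at least as many partitions: modify nothing.
        return Action.NOTHING;
    }

    /** Local partition i is replicated to the remote partition with the same index. */
    static String remotePartitionName(String topic, int localPartitionIndex) {
        return topic + "-partition-" + localPartitionIndex;
    }

    /** A remote partition receives replicated messages only if a local partition has the same index. */
    static boolean receivesReplicatedMessages(int remotePartitionIndex, int localPartitions) {
        return remotePartitionIndex >= 0 && remotePartitionIndex < localPartitions;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 2, OptionalInt.empty()));  // CREATE_PARTITIONED_TOPIC
        System.out.println(decide(true, 2, OptionalInt.of(1)));    // EXPAND_PARTITIONS
        System.out.println(decide(true, 2, OptionalInt.of(3)));    // NOTHING
        System.out.println(decide(false, 2, OptionalInt.empty())); // NOTHING
        // persistent://public/default/my-topic-partition-1
        System.out.println(remotePartitionName("persistent://public/default/my-topic", 1));
        // With 2 local partitions, a third remote partition (index 2) stays empty.
        System.out.println(receivesReplicatedMessages(2, 2));      // false
    }
}
```

The first three calls mirror the three examples in the diff, and the last call shows why the `remote_cluster.topic.partitions >= 2` case is harmless: the extra remote partition has no local counterpart, so nothing is replicated into it.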



-- 
This is an automated message from the Apache Git Service.