[
https://issues.apache.org/jira/browse/KAFKA-12254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dhruvil Shah updated KAFKA-12254:
---------------------------------
Description:
`MirrorSourceConnector` implements the logic for replicating data,
configurations, and other metadata between the source and destination clusters.
This includes the tasks below:
# `refreshTopicPartitions` for syncing topics / partitions from source to
destination.
# `syncTopicConfigs` for syncing topic configurations from source to
destination.
A limitation is that `computeAndCreateTopicPartitions` creates topics with
default configurations on the destination cluster. A separate asynchronous
task, `syncTopicConfigs`, is responsible for syncing the topic configs. Until
that sync runs, topic configurations can be out of sync between the two
clusters.
In the worst case, this could lead to data loss, e.g. when a compacted topic
being mirrored between clusters is incorrectly created with the default
configuration of `cleanup.policy = delete` on the destination before the
configurations are synced via `syncTopicConfigs`.
Here is an example of the divergence:
Source Topic:
```
Topic: foobar PartitionCount: 1 ReplicationFactor: 1 Configs: cleanup.policy=compact,segment.bytes=1073741824
```
Destination Topic:
```
Topic: A.foobar PartitionCount: 1 ReplicationFactor: 1 Configs: segment.bytes=1073741824
```
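The divergence shown in the two listings can be checked mechanically. The following is a minimal sketch and not MirrorMaker code: `ConfigDivergence`, `parse`, and `missingOnDestination` are hypothetical names, and the parser assumes the simple `key=value,key=value` form printed above (values that themselves contain commas, such as `cleanup.policy=compact,delete`, are not handled).

```java
import java.util.Map;
import java.util.TreeMap;

public class ConfigDivergence {

    /** Parses a "k1=v1,k2=v2" string as printed in the Configs column. */
    public static Map<String, String> parse(String configs) {
        Map<String, String> out = new TreeMap<>();
        for (String pair : configs.split(",")) {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed entries
            }
            out.put(trimmed.substring(0, eq), trimmed.substring(eq + 1));
        }
        return out;
    }

    /** Returns the source overrides that are absent or different on the destination. */
    public static Map<String, String> missingOnDestination(
            Map<String, String> source, Map<String, String> destination) {
        Map<String, String> diff = new TreeMap<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (!e.getValue().equals(destination.get(e.getKey()))) {
                diff.put(e.getKey(), e.getValue());
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<String, String> source =
                parse("cleanup.policy=compact,segment.bytes=1073741824");
        Map<String, String> destination = parse("segment.bytes=1073741824");
        // prints {cleanup.policy=compact}
        System.out.println(missingOnDestination(source, destination));
    }
}
```

For the example topics, the only override missing on the destination is `cleanup.policy=compact`, which is the setting that matters for the data loss scenario.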
A safer approach is to ensure that the right configurations are set on the
destination cluster before data is replicated to it.
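One way to picture this approach is to derive the destination topic's name and configs from the source before the topic is created, so that creation and configuration happen in a single step. The sketch below is illustrative only and is not the connector's implementation: `DestinationTopicPlan`, its `from` method, and the exclusion set are hypothetical, and the `<alias>.<topic>` naming is inferred from the `A.foobar` example above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DestinationTopicPlan {
    public final String name;
    public final int partitions;
    public final Map<String, String> configs;

    private DestinationTopicPlan(String name, int partitions, Map<String, String> configs) {
        this.name = name;
        this.partitions = partitions;
        this.configs = configs;
    }

    /**
     * Builds the plan for a destination topic from the source topic's
     * description. Properties in `excluded` are not carried over
     * (hypothetical exclusion list, supplied by the caller).
     */
    public static DestinationTopicPlan from(
            String sourceAlias,
            String topic,
            int partitions,
            Map<String, String> sourceConfigs,
            Set<String> excluded) {
        Map<String, String> carried = new TreeMap<>();
        for (Map.Entry<String, String> e : sourceConfigs.entrySet()) {
            if (!excluded.contains(e.getKey())) {
                carried.put(e.getKey(), e.getValue());
            }
        }
        // Assumed naming: "<source alias>.<topic>", as in "A.foobar".
        return new DestinationTopicPlan(sourceAlias + "." + topic, partitions, carried);
    }

    public static void main(String[] args) {
        Map<String, String> sourceConfigs = new TreeMap<>();
        sourceConfigs.put("cleanup.policy", "compact");
        sourceConfigs.put("segment.bytes", "1073741824");

        DestinationTopicPlan plan =
                from("A", "foobar", 1, sourceConfigs, Set.of());
        // The configs map would be supplied at topic creation time,
        // instead of being applied later by a separate sync task.
        System.out.println(plan.name + " " + plan.configs);
    }
}
```

With the Kafka clients library, a map like `plan.configs` could be passed to `NewTopic#configs` before the topic is submitted via `Admin#createTopics`, so the topic never exists on the destination with `cleanup.policy = delete`.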
> MirrorMaker 2.0 creates destination topic with default configs
> --------------------------------------------------------------
>
> Key: KAFKA-12254
> URL: https://issues.apache.org/jira/browse/KAFKA-12254
> Project: Kafka
> Issue Type: Bug
> Reporter: Dhruvil Shah
> Priority: Major