Athanasios Fanos created KAFKA-10857:
----------------------------------------

             Summary: Mirror Maker 2 - replication not working when deploying 
multiple instances
                 Key: KAFKA-10857
                 URL: https://issues.apache.org/jira/browse/KAFKA-10857
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect, mirrormaker
    Affects Versions: 2.5.1, 2.6.0
            Reporter: Athanasios Fanos


We believe we are experiencing a bug when deploying Mirror Maker 2 in 
distributed mode in our environments. Replication does not work consistently 
after initial deployment and does not start working even after some time (24h+).

*Environment & replication set-up*
 * 2 regions, each with its own Kafka cluster (let's call them Region A and 
Region B)
 * 3 instances of Mirror Maker are deployed at the same time in Region B with 
the same configuration
 * Replication is set up to be bi-directional (regionA->regionB & 
regionB->regionA)

*Container Version*
Observed with both {{confluentinc/cp-kafka:5.5.1}} & 
{{confluentinc/cp-kafka:6.0.1}}

*Mirror Maker 2 configuration*
{code:java}
clusters=regionA,regionB
regionA.bootstrap.servers=regionA-kafka:9092
regionB.bootstrap.servers=regionB-kafka:9092
regionA->regionB.enabled=true
regionA->regionB.topics=testTopic
regionB->regionA.enabled=true
regionB->regionA.topics=testTopic
sync.topic.acls.enabled=false
tasks.max=9
{code}
*Observed behavior*
 * After deploying the 3 Mirror Maker instances (at the same time), replication 
does not work for one or both mirrors
 ** If we scale down to a single instance of Mirror Maker and wait for about 5 
minutes (refresh.topics.interval.seconds?), replication starts working. After 
that, scaling up to 3 instances correctly distributes the load between them

*Expected behavior*
 * Replication should work for all configured mirrors when running in 
distributed mode
 * Replication should work when multiple instances of Mirror Maker are started 
at the same time; a one-by-one rollout should not be required

*Additional details*
 * When replication is not working, we observe that in Mirror Maker's internal 
config topics the partitions are not assigned to the tasks, e.g. 
{{task.assigned.partitions}} is not set at all under the properties object.
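
The check described above can be sketched in Python. This is a minimal sketch rather than a definitive tool: it assumes the records have already been read from the internal config topic as (key, value) string pairs, that task records use keys starting with {{task-}}, and that each value is a JSON document with a {{properties}} object. The function name and the sample records are hypothetical.

```python
import json


def tasks_missing_partitions(records):
    """Return the keys of task configs that lack task.assigned.partitions.

    `records` is an iterable of (key, value) string pairs. Assumed layout:
    task records have keys starting with "task-" and JSON values that hold
    a "properties" object.
    """
    missing = []
    for key, value in records:
        if not key.startswith("task-") or value is None:
            continue  # skip non-task records and tombstones
        props = json.loads(value).get("properties", {})
        if not props.get("task.assigned.partitions"):
            missing.append(key)
    return missing


# Illustrative records only; real keys and values may differ.
sample = [
    ("connector-MirrorSourceConnector",
     '{"properties": {"name": "MirrorSourceConnector"}}'),
    ("task-MirrorSourceConnector-0",
     '{"properties": {"task.assigned.partitions": "testTopic-0"}}'),
    ("task-MirrorSourceConnector-1", '{"properties": {}}'),
]
print(tasks_missing_partitions(sample))
```

With the sample records, only the second task is reported, because its properties object has no assigned partitions.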

*Workaround*
 * As a workaround, we start the Mirror Maker instances one by one with some 
delay between each instance. This allows the first instance to set up the 
configuration in the internal topics correctly. Doing this seems to ensure that 
replication works as expected.
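
A staggered start like this can be sketched as follows. This is only an illustration of the ordering, not our actual deployment tooling: {{staggered_start}} and the start callables are hypothetical, and the delay is a placeholder, since we have not determined a minimum delay that is sufficient.

```python
import time


def staggered_start(start_fns, delay_seconds, sleep=time.sleep):
    """Call each start function in order, pausing between consecutive starts.

    `start_fns` are hypothetical callables that each launch one Mirror Maker
    instance (for example by scaling a deployment up by one replica).
    Returns the number of instances started.
    """
    started = 0
    for index, start in enumerate(start_fns):
        if index > 0:
            sleep(delay_seconds)  # give the previous instance time to settle
        start()
        started += 1
    return started


# Example with stand-in start functions and a short placeholder delay.
launched = []
count = staggered_start(
    [lambda i=i: launched.append(f"mm2-{i}") for i in range(3)],
    delay_seconds=0.01,
)
print(count, launched)
```

The {{sleep}} parameter is injectable so the ordering can be exercised without real waiting.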



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
