[ https://issues.apache.org/jira/browse/KAFKA-10857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259959#comment-17259959 ]
Ara Zarifian commented on KAFKA-10857:
--------------------------------------

I came across this ticket after reporting KAFKA-12150. It sounds like the underlying cause is similar - adverse behavior with clustered setups, specifically.

> Mirror Maker 2 - replication not working when deploying multiple instances
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-10857
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10857
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect, mirrormaker
>    Affects Versions: 2.6.0, 2.5.1
>            Reporter: Athanasios Fanos
>            Priority: Major
>
> We believe we are experiencing a bug when deploying Mirror Maker 2 in
> distributed mode in our environments. Replication does not work consistently
> after initial deployment and does not start working even after some time
> (24h+).
> *Environment & replication set-up*
> * 2 regions with a separate Kafka cluster (let's call them Region A and
> Region B)
> * 3 instances of Mirror maker are deployed at the same time in Region B with
> the same configuration
> * Replication is set up to be bi-directional (regionA->regionB &
> regionB->regionA)
> *Container Version*
> Observed with both {{confluentinc/cp-kafka:5.5.1}} &
> {{confluentinc/cp-kafka:6.0.1}}
> *Mirror maker 2 configuration*
> {code:java}
> clusters=regionA,regionB
> regionA.bootstrap.servers=regionA-kafka:9092
> regionB.bootstrap.servers=regionB-kafka:9092
> regionA->regionB.enabled=true
> regionA->regionB.topics=testTopic
> regionB->regionA.enabled=true
> regionB->regionA.topics=testTopic
> sync.topic.acls.enabled=false
> tasks.max=9
> {code}
> *Observed behavior*
> * After deploying the 3 Mirror Maker instances (at the same time),
> replication for 1 or both mirrors does not work
> ** If we scale down to a single instance of mirror maker and wait for about
> 5 minutes (refresh.topics.interval.seconds?) replication starts working.
> After this scaling up to 3 correctly distributes the load between the
> deployed instances
> *Expected behavior*
> * Replication should work for all configured mirrors when running in
> distributed mode
> * When starting multiple instances of Mirror Maker at the same time
> replication should work, 1 by 1 rollout should not be required
> *Additional details*
> * When replication is not working, we observe that in the internal config
> topics from Mirror Maker the partitions are not assigned to the tasks, eg
> {{task.assigned.partitions}} are not set at all under the properties object.
> *Workaround*
> * As a workaround, we start Mirror Maker instances 1 by 1 with some delay
> between each instance. This allows for the first instance to set-up the
> configuration in the internal topics correctly. Doing this seems to ensure
> that replication works as expected.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)