[
https://issues.apache.org/jira/browse/KAFKA-18386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Greg Harris resolved KAFKA-18386.
---------------------------------
Resolution: Won't Fix
> Mirror Maker2 Pod CrashLoopBackoff When one DC is powered off
> -------------------------------------------------------------
>
> Key: KAFKA-18386
> URL: https://issues.apache.org/jira/browse/KAFKA-18386
> Project: Kafka
> Issue Type: Bug
> Components: mirrormaker
> Affects Versions: 3.7.1
> Reporter: George Yang
> Priority: Major
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> In a Kubernetes deployment of MirrorMaker v3.7.1 with one Kafka node in each
> data center (DC1 and DC2), powering off DC1 causes the MirrorMaker pod in DC2
> to enter CrashLoopBackOff. This issue is different from the one described in
> KAFKA-17784. The relevant log output is below:
> ```log
> [2025-01-01 08:05:53,432] WARN [AdminClient clientId=dc64->dc88] Connection
> to node -1 (/192.168.2.88:13399) could not be established. Node may not be
> available.
> (org.apache.kafka.clients.NetworkClient:830)[kafka-admin-client-thread |
> dc64->dc88]
> [2025-01-01 08:05:55,652] INFO [AdminClient clientId=dc64->dc88] Metadata
> update failed
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager:267)[kafka-admin-client-thread
> | dc64->dc88]
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send
> the call. Call: fetchMetadata
> [2025-01-01 08:05:55,653] INFO App info kafka.admin.client for dc64->dc88
> unregistered
> (org.apache.kafka.common.utils.AppInfoParser:88)[kafka-admin-client-thread |
> dc64->dc88]
> [2025-01-01 08:05:55,653] INFO [AdminClient clientId=dc64->dc88] Metadata
> update failed
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager:267)[kafka-admin-client-thread
> | dc64->dc88]
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send
> the call. Call: fetchMetadata
> [2025-01-01 08:05:55,653] INFO [AdminClient clientId=dc64->dc88] Timed out 1
> remaining operation(s) during close.
> (org.apache.kafka.clients.admin.KafkaAdminClient:1450)[kafka-admin-client-thread
> | dc64->dc88]
> [2025-01-01 08:05:55,657] INFO Metrics scheduler closed
> (org.apache.kafka.common.metrics.Metrics:684)[kafka-admin-client-thread |
> dc64->dc88]
> [2025-01-01 08:05:55,658] INFO Closing reporter
> org.apache.kafka.common.metrics.JmxReporter
> (org.apache.kafka.common.metrics.Metrics:688)[kafka-admin-client-thread |
> dc64->dc88]
> [2025-01-01 08:05:55,658] INFO Metrics reporters closed
> (org.apache.kafka.common.metrics.Metrics:694)[kafka-admin-client-thread |
> dc64->dc88]
> [2025-01-01 08:05:55,658] ERROR Stopping due to error
> (org.apache.kafka.connect.mirror.MirrorMaker:360)[main]
> org.apache.kafka.connect.errors.ConnectException: Failed to connect to and
> describe Kafka cluster. Check worker's broker connection and security
> properties.
> at
> org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:305)
> at
> org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:285)
> at
> org.apache.kafka.connect.runtime.WorkerConfig.kafkaClusterId(WorkerConfig.java:415)
> at
> org.apache.kafka.connect.mirror.MirrorMaker.addHerder(MirrorMaker.java:252)
> at java.base/java.lang.Iterable.forEach(Unknown Source)
> at
> org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:158)
> at
> org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:170)
> at
> org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:174)
> at
> org.apache.kafka.connect.mirror.MirrorMaker.main(MirrorMaker.java:347)
> Caused by: java.util.concurrent.ExecutionException:
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node
> assignment. Call: listNodes
> at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown
> Source)
> at java.base/java.util.concurrent.CompletableFuture.get(Unknown
> Source)
> at
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
> at
> org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:299)
> ... 8 more
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting
> for a node assignment. Call: listNodes
> [2025-01-01 08:05:55,687] INFO Stopped http_8083@6705fb02{HTTP/1.1,
> (http/1.1)}{0.0.0.0:8083}
> (org.eclipse.jetty.server.AbstractConnector:383)[JettyShutdownThread]
> ```
> The MirrorMaker configuration is:
> ```
> clusters = dc64, dc88
> dc64.bootstrap.servers = 192.168.2.64:13399
> dc88.bootstrap.servers = 192.168.2.88:13399
> dc64->dc88.enabled = true
> dc64->dc88.topics = .*
> dc88->dc64.enabled = true
> dc88->dc64.topics = .*
> replication.factor=1
> tasks.max=6
> emit.checkpoints.interval.seconds=5
> dc64.producer.acks=all
> dc64.producer.batch.size=50000
> dc64.consumer.auto.offset.reset=latest
> dc88.consumer.auto.offset.reset=latest
> dc64.consumer.max.poll.interval.ms=20000
> dc88.consumer.max.poll.interval.ms=20000
> refresh.topics.enabled=true
> refresh.topics.interval.seconds=5
> refresh.groups.enabled=true
> refresh.groups.interval.seconds=5
> dedicated.mode.enable.internal.rest = true
> dc64.scheduled.rebalance.max.delay.ms=20000
> dc88.scheduled.rebalance.max.delay.ms=20000
> checkpoints.topic.replication.factor=1
> heartbeats.topic.replication.factor=1
> offset-syncs.topic.replication.factor=1
> offset.storage.replication.factor=1
> status.storage.replication.factor=1
> config.storage.replication.factor=1
> ```
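
The stack trace above shows MirrorMaker exiting at startup because one of the configured clusters could not be described. As a diagnostic aid only, the following is a minimal sketch of a pre-flight check that reads the `clusters` and `<alias>.bootstrap.servers` entries from a configuration like the one above and reports which clusters have no reachable bootstrap address over plain TCP. The function names (`parse_bootstrap_servers`, `is_reachable`, `unreachable_clusters`) are hypothetical and are not part of Kafka or MirrorMaker; a successful TCP connect also does not prove that the broker is healthy or that security settings are correct.

```python
import socket


def parse_bootstrap_servers(config_text):
    """Map each cluster alias to its list of (host, port) bootstrap addresses."""
    props = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()

    aliases = [a.strip() for a in props.get("clusters", "").split(",") if a.strip()]
    servers = {}
    for alias in aliases:
        addrs = []
        for entry in props.get(f"{alias}.bootstrap.servers", "").split(","):
            entry = entry.strip()
            if entry:
                # Split on the last colon so the port is always the final field.
                host, _, port = entry.rpartition(":")
                addrs.append((host, int(port)))
        servers[alias] = addrs
    return servers


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_clusters(config_text, timeout=3.0):
    """Return aliases of clusters with no reachable bootstrap address."""
    return [
        alias
        for alias, addrs in parse_bootstrap_servers(config_text).items()
        if not any(is_reachable(h, p, timeout) for h, p in addrs)
    ]
```

Calling `unreachable_clusters(open("mm2.properties").read())` (the file name is an assumption) before launching MirrorMaker would be expected to list `dc88` in the scenario from the log, where 192.168.2.88:13399 cannot be reached.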
--
This message was sent by Atlassian Jira
(v8.20.10#820010)