[
https://issues.apache.org/jira/browse/KAFKA-18386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17909154#comment-17909154
]
Greg Harris commented on KAFKA-18386:
-------------------------------------
Hi [~yitian998], thanks for the ticket.
This is the expected behavior of MM2 (and Connect) when the cluster storing the
underlying Connect topics is unavailable. Connect workers use Kafka to discover
other workers, assign work, come to a consensus on the running configuration,
provide observability, etc. Without the underlying Kafka cluster, Connect would
be so severely degraded that it is preferable to crash.
In the MM2 use case, if either of the Kafka clusters is unavailable, MM2 cannot
meaningfully operate. I would suggest stopping your MM2 deployments before
shutting down your Kafka deployments in order to avoid nuisance crashes.
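As a rough illustration of that suggestion, a wrapper or init step could hold
MM2 back until every configured cluster is reachable again, instead of letting
the pod crash and restart. The sketch below is not part of MM2 or Connect. The
helper names (parse_bootstrap_servers, is_reachable, unreachable_clusters,
wait_for_clusters) are hypothetical, and a plain TCP connect is a much weaker
check than the cluster ID lookup that Connect performs at startup.

```python
import socket
import time


def parse_bootstrap_servers(properties_text):
    """Return {cluster_alias: [(host, port), ...]} from MM2-style properties text."""
    suffix = ".bootstrap.servers"
    clusters = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if not key.endswith(suffix):
            continue
        endpoints = []
        for server in value.split(","):
            server = server.strip()
            if not server:
                continue
            host, _, port = server.rpartition(":")
            endpoints.append((host, int(port)))
        clusters[key[: -len(suffix)]] = endpoints
    return clusters


def is_reachable(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_clusters(clusters, probe=is_reachable):
    """Sorted aliases of clusters where no bootstrap endpoint answers the probe."""
    return sorted(
        alias
        for alias, endpoints in clusters.items()
        if not any(probe(host, port) for host, port in endpoints)
    )


def wait_for_clusters(clusters, attempts=30, delay=10.0,
                      probe=is_reachable, sleep=time.sleep):
    """Poll until all clusters answer; return False if attempts run out."""
    for _ in range(attempts):
        down = unreachable_clusters(clusters, probe)
        if not down:
            return True
        print("Waiting for unreachable clusters: " + ", ".join(down))
        sleep(delay)
    return False
```

A startup script might read the MM2 properties file, pass its text to
parse_bootstrap_servers, and launch MM2 only once wait_for_clusters returns
True. The attempts and delay values are arbitrary placeholders.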
> Mirror Maker2 Pod CrashLoopBackoff When one DC is powered off
> -------------------------------------------------------------
>
> Key: KAFKA-18386
> URL: https://issues.apache.org/jira/browse/KAFKA-18386
> Project: Kafka
> Issue Type: Bug
> Components: mirrormaker
> Affects Versions: 3.7.1
> Reporter: George Yang
> Priority: Major
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> When MirrorMaker v3.7.1 is deployed on Kubernetes with one Kafka node in each
> data center (DC1 and DC2) and DC1 is powered off, the MirrorMaker pod in DC2
> enters CrashLoopBackOff. This issue is different from the one described in
> KAFKA-17784. Please find the report log below:
> ```log
> [2025-01-01 08:05:53,432] WARN [AdminClient clientId=dc64->dc88] Connection
> to node -1 (/192.168.2.88:13399) could not be established. Node may not be
> available.
> (org.apache.kafka.clients.NetworkClient:830)[kafka-admin-client-thread |
> dc64->dc88]
> [2025-01-01 08:05:55,652] INFO [AdminClient clientId=dc64->dc88] Metadata
> update failed
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager:267)[kafka-admin-client-thread
> | dc64->dc88]
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send
> the call. Call: fetchMetadata
> [2025-01-01 08:05:55,653] INFO App info kafka.admin.client for dc64->dc88
> unregistered
> (org.apache.kafka.common.utils.AppInfoParser:88)[kafka-admin-client-thread |
> dc64->dc88]
> [2025-01-01 08:05:55,653] INFO [AdminClient clientId=dc64->dc88] Metadata
> update failed
> (org.apache.kafka.clients.admin.internals.AdminMetadataManager:267)[kafka-admin-client-thread
> | dc64->dc88]
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send
> the call. Call: fetchMetadata
> [2025-01-01 08:05:55,653] INFO [AdminClient clientId=dc64->dc88] Timed out 1
> remaining operation(s) during close.
> (org.apache.kafka.clients.admin.KafkaAdminClient:1450)[kafka-admin-client-thread
> | dc64->dc88]
> [2025-01-01 08:05:55,657] INFO Metrics scheduler closed
> (org.apache.kafka.common.metrics.Metrics:684)[kafka-admin-client-thread |
> dc64->dc88]
> [2025-01-01 08:05:55,658] INFO Closing reporter
> org.apache.kafka.common.metrics.JmxReporter
> (org.apache.kafka.common.metrics.Metrics:688)[kafka-admin-client-thread |
> dc64->dc88]
> [2025-01-01 08:05:55,658] INFO Metrics reporters closed
> (org.apache.kafka.common.metrics.Metrics:694)[kafka-admin-client-thread |
> dc64->dc88]
> [2025-01-01 08:05:55,658] ERROR Stopping due to error
> (org.apache.kafka.connect.mirror.MirrorMaker:360)[main]
> org.apache.kafka.connect.errors.ConnectException: Failed to connect to and
> describe Kafka cluster. Check worker's broker connection and security
> properties.
> at
> org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:305)
> at
> org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:285)
> at
> org.apache.kafka.connect.runtime.WorkerConfig.kafkaClusterId(WorkerConfig.java:415)
> at
> org.apache.kafka.connect.mirror.MirrorMaker.addHerder(MirrorMaker.java:252)
> at java.base/java.lang.Iterable.forEach(Unknown Source)
> at
> org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:158)
> at
> org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:170)
> at
> org.apache.kafka.connect.mirror.MirrorMaker.<init>(MirrorMaker.java:174)
> at
> org.apache.kafka.connect.mirror.MirrorMaker.main(MirrorMaker.java:347)
> Caused by: java.util.concurrent.ExecutionException:
> org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node
> assignment. Call: listNodes
> at java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown
> Source)
> at java.base/java.util.concurrent.CompletableFuture.get(Unknown
> Source)
> at
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
> at
> org.apache.kafka.connect.runtime.WorkerConfig.lookupKafkaClusterId(WorkerConfig.java:299)
> ... 8 more
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting
> for a node assignment. Call: listNodes
> [2025-01-01 08:05:55,687] INFO Stopped http_8083@6705fb02{HTTP/1.1,
> (http/1.1)}{0.0.0.0:8083}
> (org.eclipse.jetty.server.AbstractConnector:383)[JettyShutdownThread]
> ```
> The MirrorMaker configuration is:
> ```
> clusters = dc64, dc88
> dc64.bootstrap.servers = 192.168.2.64:13399
> dc88.bootstrap.servers = 192.168.2.88:13399
> dc64->dc88.enabled = true
> dc64->dc88.topics = .*
> dc88->dc64.enabled = true
> dc88->dc64.topics = .*
> replication.factor=1
> tasks.max=6
> emit.checkpoints.interval.seconds=5
> dc64.producer.acks=all
> dc64.producer.batch.size=50000
> dc64.consumer.auto.offset.reset=latest
> dc88.consumer.auto.offset.reset=latest
> dc64.consumer.max.poll.interval.ms=20000
> dc88.consumer.max.poll.interval.ms=20000
> refresh.topics.enabled=true
> refresh.topics.interval.seconds=5
> refresh.groups.enabled=true
> refresh.groups.interval.seconds=5
> dedicated.mode.enable.internal.rest = true
> dc64.scheduled.rebalance.max.delay.ms=20000
> dc88.scheduled.rebalance.max.delay.ms=20000
> checkpoints.topic.replication.factor=1
> heartbeats.topic.replication.factor=1
> offset-syncs.topic.replication.factor=1
> offset.storage.replication.factor=1
> status.storage.replication.factor=1
> config.storage.replication.factor=1
> ```