[ 
https://issues.apache.org/jira/browse/KAFKA-14072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-14072.
------------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

This looks like it's the same issue as KAFKA-14545

> Crashed MirrorCheckpointConnector appears as running in REST API
> ----------------------------------------------------------------
>
>                 Key: KAFKA-14072
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14072
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect, mirrormaker
>    Affects Versions: 3.1.0
>            Reporter: Mickael Maison
>            Priority: Major
>             Fix For: 3.5.0
>
>
> In one cluster I had a partially crashed MirrorCheckpointConnector instance. 
> It had stopped mirroring offsets and emitting metrics completely but the 
> connector and its single task were still reporting as running in the REST API.
> Looking at the logs, I found this stacktrace:
> {code:java}
> java.lang.NullPointerException
>       at 
> org.apache.kafka.connect.mirror.MirrorCheckpointTask.checkpoint(MirrorCheckpointTask.java:187)
>       at 
> org.apache.kafka.connect.mirror.MirrorCheckpointTask.lambda$checkpointsForGroup$2(MirrorCheckpointTask.java:171)
>       at 
> java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
>       at 
> java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
>       at 
> java.base/java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1764)
>       at 
> java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
>       at 
> java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
>       at 
> java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
>       at 
> java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>       at 
> java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
>       at 
> org.apache.kafka.connect.mirror.MirrorCheckpointTask.checkpointsForGroup(MirrorCheckpointTask.java:173)
>       at 
> org.apache.kafka.connect.mirror.MirrorCheckpointTask.sourceRecordsForGroup(MirrorCheckpointTask.java:157)
>       at 
> org.apache.kafka.connect.mirror.MirrorCheckpointTask.poll(MirrorCheckpointTask.java:139)
>       at 
> org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:291)
>       at 
> org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:248)
>       at 
> org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:186)
>       at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:241)
>       at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:829)
> WARN [prod-source->sc-prod-target.MirrorCheckpointConnector|task-0] Failure 
> polling consumer state for checkpoints. 
> (org.apache.kafka.connect.mirror.MirrorCheckpointTask) 
> [task-thread-prod-source->sc-prod-target.MirrorCheckpointConnector-0]
> {code}
> Not sure if it's related but prior this exception, there's quite a lot of:
> {code:java}
> ERROR [prod-source->sc-prod-target.MirrorCheckpointConnector|task-0] 
> WorkerSourceTask{id=prod-source->sc-prod-target.MirrorCheckpointConnector-0} 
> failed to send record to prod-source.checkpoints.internal:  
> (org.apache.kafka.connect.runtime.WorkerSourceTask) 
> [kafka-producer-network-thread | 
> connector-producer-prod-source->sc-prod-target.MirrorCheckpointConnector-0]
> org.apache.kafka.common.KafkaException: Producer is closed forcefully.
>       at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:760)
>       at 
> org.apache.kafka.clients.producer.internals.RecordAccumulator.abortIncompleteBatches(RecordAccumulator.java:747)
>       at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:283)
>       at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
> and some users had started consumers in the target cluster hence causing 
> these log lines:
> {code:java}
> ERROR [prod-source->sc-prod-target.MirrorCheckpointConnector|task-0] 
> [AdminClient clientId=adminclient-137] OffsetCommit request for group id 
> <GROUP_ID> and partition <TP> failed due to unexpected error 
> UNKNOWN_MEMBER_ID. 
> (org.apache.kafka.clients.admin.internals.AlterConsumerGroupOffsetsHandler) 
> [kafka-admin-client-thread | adminclient-137]
> {code}
> Unfortunately I don't have the full history, so it's unclear if this happened 
> while stopping but the connector stayed in this state for several hours until 
> it was explicitly deleted via the REST API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to