wehbi created KAFKA-10654:
-----------------------------
Summary: connector has failed, but worker status was ok
Key: KAFKA-10654
URL: https://issues.apache.org/jira/browse/KAFKA-10654
Project: Kafka
Issue Type: Bug
Components: KafkaConnect
Affects Versions: 2.1.1
Environment: Kafka distribution: Confluent CE
Kafka version: kafka_2.12-5.4.0-ccs.jar
Reporter: wehbi
Hello,
We are using the Kafka Mongo sink connector (see the configuration below), and
we have multiple connectors on multiple topics.
Recently one of the connectors stopped working, while the others continued to
operate normally within the same worker. The connector logs (see the extract
below) show that the Kafka topic leader was not available.
The worker service status was "running" (checked with systemctl).
Restarting the worker service solved the problem.
Why was the connector not able to recover automatically?
How can we monitor and detect this failure?
For information:
Kafka distribution: Confluent CE
Kafka version: kafka_2.12-5.4.0-ccs.jar
{"class":"com.mongodb.kafka.connect.MongoSinkConnector","type":"sink","version":"1.0.1"}
We run distributed workers.
I checked the task status before restarting the worker, and it reported the
task as running (not failed). I also tried to pause and resume the task, but
that had no effect.
We already monitor the connector metrics (using Prometheus/Grafana), and they
never detected the task failure. All metrics indicated that everything was
fine.
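Since the task status and the connector metrics both looked healthy, one possible way to detect this kind of silent stall is to compare the committed offsets of the sink's consumer group (connect-adherent-sink in the logs below) between two samples. The following is only a minimal sketch under stated assumptions: the function name, the sampling approach, and all offset values are hypothetical, and collecting the offsets (for example from the `--describe` output of the `kafka-consumer-groups` tool) is left out.

```python
from typing import Dict, List

# Offsets keyed by partition number. How they are collected is outside this sketch.
Offsets = Dict[int, int]


def stalled_partitions(
    committed_before: Offsets,
    committed_after: Offsets,
    end_after: Offsets,
) -> List[int]:
    """Return partitions that still have lag but whose committed offset
    did not advance between the two samples."""
    stalled = []
    for partition in sorted(end_after):
        before = committed_before.get(partition, 0)
        after = committed_after.get(partition, 0)
        lag = end_after[partition] - after
        # Lag alone is normal under load; lag with no progress suggests a stall.
        if lag > 0 and after <= before:
            stalled.append(partition)
    return stalled


# Hypothetical samples taken a few minutes apart.
sample_before = {0: 100, 1: 200}
sample_after = {0: 100, 1: 250}
log_end = {0: 180, 1: 250}
print(stalled_partitions(sample_before, sample_after, log_end))  # prints [0]
```

A check like this would need to be tuned to the topic's traffic pattern, since a quiet topic legitimately shows no offset progress.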
----------------------------------- connector logs -----------------
[2020-09-25 11:53:52,352] WARN [Consumer
clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink]
Received unknown topic or partition error in fetch for partition
RAMOWNER.ADHERENT-1 (org.apache.kafka.clients.consumer.internals.Fetcher:1246)
[2020-09-25 11:53:52,353] WARN [Consumer
clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink]
Received unknown topic or partition error in fetch for partition
RAMOWNER.ADHERENT-4 (org.apache.kafka.clients.consumer.internals.Fetcher:1246)
[2020-09-25 11:53:52,353] WARN [Consumer
clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink]
Received unknown topic or partition error in fetch for partition
RAMOWNER.ADHERENT-7 (org.apache.kafka.clients.consumer.internals.Fetcher:1246)
[2020-09-25 11:53:52,365] WARN [Consumer
clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink]
Error while fetching metadata with correlation id 20822125 :
{RAMOWNER.ADHERENT=LEADER_NOT_AVAILABLE}
(org.apache.kafka.clients.NetworkClient:1063)
[2020-09-25 11:53:52,374] INFO [Consumer
clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink]
Revoke previously assigned partitions RAMOWNER.ADHERENT-3, RAMOWNER.ADHERENT-2,
RAMOWNER.ADHERENT-1, RAMOWNER.ADHERENT-0, RAMOWNER.ADHERENT-7,
RAMOWNER.ADHERENT-6, RAMOWNER.ADHERENT-5, RAMOWNER.ADHERENT-4
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:286)
[2020-09-25 11:53:52,472] WARN [Consumer
clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink]
Error while fetching metadata with correlation id 20822127 :
{RAMOWNER.ADHERENT=LEADER_NOT_AVAILABLE}
(org.apache.kafka.clients.NetworkClient:1063)
[2020-09-25 11:53:52,472] WARN [Consumer
clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink] The
following subscribed topics are not assigned to any members:
[RAMOWNER.ADHERENT]
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:570)
[2020-09-25 11:53:52,597] WARN [Consumer
clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink]
Error while fetching metadata with correlation id 20822129 :
{RAMOWNER.ADHERENT=LEADER_NOT_AVAILABLE}
(org.apache.kafka.clients.NetworkClient:1063)
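The warning "The following subscribed topics are not assigned to any members" in the extract above could also serve as an alerting signal. The following is a rough sketch of a log scanner for that message, assuming the worker log is available as text; the function name and the regular expression are illustrative and not part of Kafka Connect.

```python
import re
from typing import Dict, List

# Matches the ConsumerCoordinator warning seen in the log extract.
UNASSIGNED = re.compile(
    r"groupId=(?P<group>[^\]\s]+)\]\s+The following subscribed topics are not "
    r"assigned to any members:\s*\[(?P<topics>[^\]]*)\]"
)


def unassigned_topics(log_text: str) -> Dict[str, List[str]]:
    """Map each consumer group to the topics reported as unassigned."""
    # Collapse all whitespace so that wrapped log lines still match.
    flat = " ".join(log_text.split())
    result: Dict[str, List[str]] = {}
    for match in UNASSIGNED.finditer(flat):
        topics = result.setdefault(match.group("group"), [])
        for topic in match.group("topics").split(","):
            topic = topic.strip()
            if topic and topic not in topics:
                topics.append(topic)
    return result


# Sample copied from the log extract in this report.
SAMPLE = """[2020-09-25 11:53:52,472] WARN [Consumer
clientId=connector-consumer-adherent-sink-0, groupId=connect-adherent-sink] The
following subscribed topics are not assigned to any members:
[RAMOWNER.ADHERENT]
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:570)"""

print(unassigned_topics(SAMPLE))
```

Whether this warning is always followed by a permanent stall is not established by the report, so an alert based on it would likely need a follow-up check before paging anyone.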
topics = [RAMOWNER.ADHERENT]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)