[jira] [Created] (KAFKA-13335) Upgrading connect from 2.7.0 to 2.8.0 causes worker instability

John Gray (Jira) Wed, 29 Sep 2021 06:31:30 -0700

John Gray created KAFKA-13335:
---------------------------------

             Summary: Upgrading connect from 2.7.0 to 2.8.0 causes worker 
instability
                 Key: KAFKA-13335
                 URL: https://issues.apache.org/jira/browse/KAFKA-13335
             Project: Kafka
          Issue Type: Bug
          Components: KafkaConnect
    Affects Versions: 2.8.0
            Reporter: John Gray
         Attachments: image-2021-09-29-09-15-18-172.png


After recently upgrading our Connect cluster to 2.8.0 (via Strimzi on
Kubernetes; brokers are still on 2.7.0), I am noticing that the cluster is
struggling to stabilize. Connectors are continuously being unassigned,
reassigned, and duplicated, and never settle back down. A downgrade back to
2.7.0 fixes things immediately. I have attached a picture of our Grafana
dashboards showing some metrics. We have a Connect cluster with 4 nodes,
trying to maintain about 1000 connectors, each connector with a maximum of
1 task.
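
For anyone trying to observe this kind of churn, here is a minimal sketch of
one way to summarize task placement from the Connect REST API
(GET /connectors?expand=status, available since Kafka 2.3). It is not part of
the original report: the helper names and the worker URL are assumptions, and
a single snapshot only shows placement at one instant, so it would need to be
polled repeatedly to see reassignments over time.

```python
import json
from collections import Counter
from urllib.request import urlopen


def summarize_status(expanded):
    """Summarize the JSON body of GET /connectors?expand=status.

    Returns (tasks_per_worker, task_states, connectors_without_running_task).
    """
    tasks_per_worker = Counter()
    task_states = Counter()
    not_running = []
    for name, entry in expanded.items():
        tasks = entry.get("status", {}).get("tasks", [])
        for task in tasks:
            # Count where each task is placed and what state it reports.
            tasks_per_worker[task.get("worker_id", "unknown")] += 1
            task_states[task.get("state", "UNKNOWN")] += 1
        # A connector with no RUNNING task is unassigned, failed, or paused.
        if not any(t.get("state") == "RUNNING" for t in tasks):
            not_running.append(name)
    return tasks_per_worker, task_states, sorted(not_running)


def fetch_status(base_url):
    """Fetch expanded connector status from one worker.

    base_url is a hypothetical address such as "http://localhost:8083".
    """
    with urlopen(f"{base_url}/connectors?expand=status") as resp:
        return json.load(resp)


if __name__ == "__main__":
    per_worker, states, idle = summarize_status(
        fetch_status("http://localhost:8083")  # assumed worker address
    )
    print("tasks per worker:", dict(per_worker))
    print("task states:", dict(states))
    print("connectors without a running task:", idle)
```

With 4 workers and about 1000 single-task connectors, a stable cluster would
be expected to show roughly 250 tasks per worker and an empty list of
connectors without a running task.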

We are noticing a slow increase in memory usage, with large, irregular spikes
in task counts and thread counts.

Over the course of letting 2.8.0 run, I also notice a huge increase in log
lines stating {code}ERROR Graceful stop of task (task name here) failed.{code}
but the logs do not seem to indicate a reason. The connector appears to be
stopped only seconds after its creation. This appears to affect only our
source connectors. These logs stop after downgrading back to 2.7.0.

I am not sure what could be causing this; any insight would be appreciated!
I do notice that Kafka 2.7.1/2.8.0 contain a bugfix related to Connect
rebalances (KAFKA-10413). Is that fix potentially causing instability?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
