[ https://issues.apache.org/jira/browse/KAFKA-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898684#comment-17898684 ]
Will Perlichek commented on KAFKA-15891:
----------------------------------------

Update: I am still digging into this. Based on my reading of the codebase and the CI logs, I suspect that so-called "zombie sink tasks" are responsible for at least some of the flakiness of this test.

To be more specific, in the flaky CI logs I see the message below (from https://github.com/apache/kafka/blob/cc20e7847450d7a3bd5af85f821697e17761cc60/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L1514):
{code:java}
Failed to reset consumer group offsets for connector {connName} because its tasks may not have stopped completely or the connector might have been resumed before the offset reset request could be completed. If the connector is in a stopped state, this operation can be safely retried. If it continues to fail, restarting the Connect cluster may be necessary to resolve potential zombie sink tasks.{code}
This strongly suggests that a zombie sink task was created during a shaky CI build, because in the test we definitely STOPPED the connector and DID NOT resume it. That zombie sink task prevented the offsets API from resetting the consumer group offsets: the reset cannot succeed while any zombie task is still in the consumer group, and the attempt fails with a group-not-empty exception.

My early approach to addressing this part of the flakiness might be to restart the Connect cluster, as the error message suggests, or to find these tasks and shut them down.

> Flaky test: testResetSinkConnectorOffsetsOverriddenConsumerGroupId – org.apache.kafka.connect.integration.OffsetsApiIntegrationTest
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15891
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15891
>             Project: Kafka
>          Issue Type: Bug
>          Components: connect
>            Reporter: Apoorv Mittal
>            Assignee: Will Perlichek
>            Priority: Major
>              Labels: flaky-test
>
> h4. Error
> org.opentest4j.AssertionFailedError: Condition not met within timeout 30000. Sink connector consumer group offsets should catch up to the topic end offsets ==> expected: <true> but was: <false>
> h4. Stacktrace
> org.opentest4j.AssertionFailedError: Condition not met within timeout 30000. Sink connector consumer group offsets should catch up to the topic end offsets ==> expected: <true> but was: <false>
>  at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>  at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>  at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
>  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
>  at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
>  at app//org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331)
>  at app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379)
>  at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328)
>  at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312)
>  at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302)
>  at app//org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:917)
>  at app//org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.resetAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:725)

--
This message was sent by Atlassian Jira
(v8.20.10#820010)