[ 
https://issues.apache.org/jira/browse/KAFKA-15891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17898684#comment-17898684
 ] 

Will Perlichek commented on KAFKA-15891:
----------------------------------------

Update: 

I am still digging into this. Based on reading the codebase and examining the 
CI logs, I suspect that so-called "zombie sink tasks" are responsible for at 
least some of this test's flakiness. 

To be more specific, the flaky CI logs contain the message below 
(from 
https://github.com/apache/kafka/blob/cc20e7847450d7a3bd5af85f821697e17761cc60/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Worker.java#L1514)


{code:java}
Failed to reset consumer group offsets for connector {connName} because its 
tasks may not have stopped completely or the connector might have been resumed 
before the offset reset request could be completed. If the connector is in a 
stopped state, this operation can be safely retried. If it continues to fail, 
restarting the Connect cluster may be necessary to resolve potential zombie 
sink tasks.{code}

This strongly suggests that a zombie sink task was created during the flaky CI 
build, because the test definitely stops the connector and does not resume it. 
The zombie sink task prevented the offsets API from resetting the consumer 
group offsets: the reset cannot succeed while any zombie task exists, and 
fails with a group-not-empty exception instead. 

My initial approach to this part of the flakiness might be to restart the 
Connect cluster, as the error message suggests, or to find these tasks and 
shut them down.
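
The error message quoted above also says the reset can be safely retried while 
the connector is stopped. As a rough illustration only, a bounded retry around 
the reset request could look like the sketch below. The names ResetRetrySketch, 
ResetAttempt and retryReset are hypothetical and are not part of Connect or its 
test utilities, and the attempt body stands in for whatever call actually 
issues the reset.

```java
import java.time.Duration;

// Hypothetical helper, not part of Kafka Connect or its test utilities.
final class ResetRetrySketch {

    /** Stand-in for "issue the offset reset request"; an assumption, not a Connect API. */
    @FunctionalInterface
    interface ResetAttempt {
        void run() throws Exception;
    }

    /**
     * Runs the attempt until it succeeds or maxAttempts is exhausted.
     * Returns true on success, false if every attempt threw or the
     * calling thread was interrupted.
     */
    static boolean retryReset(ResetAttempt attempt, int maxAttempts, Duration pause) {
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                attempt.run();
                return true;
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    // Preserve the interrupt and give up instead of retrying.
                    Thread.currentThread().interrupt();
                    return false;
                }
                // Any other failure (for example, the reset being rejected
                // because a zombie task still exists) falls through to a retry.
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A retry like this would only help if the zombie task eventually exits on its 
own. If it never does, the restart or explicit shutdown options above would 
still be needed.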

> Flaky test: testResetSinkConnectorOffsetsOverriddenConsumerGroupId – 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-15891
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15891
>             Project: Kafka
>          Issue Type: Bug
>          Components: connect
>            Reporter: Apoorv Mittal
>            Assignee: Will Perlichek
>            Priority: Major
>              Labels: flaky-test
>
> h4. Error
> org.opentest4j.AssertionFailedError: Condition not met within timeout 30000. 
> Sink connector consumer group offsets should catch up to the topic end 
> offsets ==> expected: <true> but was: <false>
> h4. Stacktrace
> org.opentest4j.AssertionFailedError: Condition not met within timeout 30000. 
> Sink connector consumer group offsets should catch up to the topic end 
> offsets ==> expected: <true> but was: <false>
>  at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>  at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>  at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
>  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
>  at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
>  at 
> app//org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331)
>  at 
> app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379)
>  at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328)
>  at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312)
>  at app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302)
>  at 
> app//org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:917)
>  at 
> app//org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.resetAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:725)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
