C0urante opened a new pull request, #16757: URL: https://github.com/apache/kafka/pull/16757
We're still seeing some flaky test failures for the `OffsetsApiIntegrationTest` suite, but in a much smaller subset of cases:

- testAlterSinkConnectorOffsetsDifferentKafkaClusterTargeted: [1](https://ge.apache.org/s/ydrj6vsi2t5mw/tests/task/:connect:runtime:test/details/org.apache.kafka.connect.integration.OffsetsApiIntegrationTest/testAlterSinkConnectorOffsetsDifferentKafkaClusterTargeted()?top-execution=1), [2](https://ge.apache.org/s/ndddgm7c6ma3c/tests/task/:connect:runtime:test/details/org.apache.kafka.connect.integration.OffsetsApiIntegrationTest/testAlterSinkConnectorOffsetsDifferentKafkaClusterTargeted()?top-execution=1)
- testResetSinkConnectorOffsetsDifferentKafkaClusterTargeted: [1](https://ge.apache.org/s/ydrj6vsi2t5mw/tests/task/:connect:runtime:test/details/org.apache.kafka.connect.integration.OffsetsApiIntegrationTest/testResetSinkConnectorOffsetsDifferentKafkaClusterTargeted()?top-execution=1), [2](https://ge.apache.org/s/ndddgm7c6ma3c/tests/task/:connect:runtime:test/details/org.apache.kafka.connect.integration.OffsetsApiIntegrationTest/testResetSinkConnectorOffsetsDifferentKafkaClusterTargeted()?top-execution=1)
- testGetSinkConnectorOffsetsDifferentKafkaClusterTargeted: [1](https://ge.apache.org/s/ydrj6vsi2t5mw/tests/task/:connect:runtime:test/details/org.apache.kafka.connect.integration.OffsetsApiIntegrationTest/testGetSinkConnectorOffsetsDifferentKafkaClusterTargeted()?top-execution=1)

After examining log files, it looks like this is a genuine case of timeouts being too low; in the three failures I examined, the sink connector's consumer group was never able to form or handle offset commits because the separate Kafka cluster it targeted didn't have a group coordinator and was still creating the internal offsets topic.

Instead of just increasing timeouts, I'd like to add an enhanced cluster readiness check for our `EmbeddedKafkaCluster` class that's automatically performed on startup.
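
To illustrate the general shape of a startup readiness check, here is a minimal, dependency-free sketch of a poll-until-ready helper. It is not the code in this PR: `ClusterReadiness`, `awaitReady`, and the `probe` parameter are hypothetical names introduced for illustration. A real probe would presumably ask the embedded cluster whether its group coordinator is available and its internal offsets topic has been created.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper; not part of Kafka's EmbeddedKafkaCluster. */
public final class ClusterReadiness {
    private ClusterReadiness() {}

    /**
     * Polls {@code probe} until it returns true or {@code timeout} elapses.
     * The probe is always evaluated at least once. A probe that throws a
     * RuntimeException is treated as "not ready yet" and retried.
     *
     * @throws IllegalStateException if the timeout elapses or the thread is interrupted
     */
    public static void awaitReady(BooleanSupplier probe,
                                  Duration timeout,
                                  Duration pollInterval,
                                  String description) {
        long deadline = System.nanoTime() + timeout.toNanos();
        RuntimeException lastError = null;
        while (true) {
            try {
                if (probe.getAsBoolean()) {
                    return; // cluster reported ready
                }
            } catch (RuntimeException e) {
                lastError = e; // remember the most recent failure for the timeout message
            }
            // Overflow-safe comparison of nanoTime values
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException(
                    "Timed out after " + timeout.toMillis() + " ms waiting for: " + description,
                    lastError);
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new IllegalStateException("Interrupted while waiting for: " + description, e);
            }
        }
    }
}
```

Failing with a descriptive message at startup, rather than letting a later assertion time out, is what would make it possible to tell broker startup problems apart from other causes of a test failure.
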
This way, not only do we give the test cases above more time to run (assuming that a significant amount of their runtime is currently taken up by bringing up the separate Kafka cluster), we also have better insight into whether future failures are caused by broker startup issues or something else.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
