C0urante opened a new pull request, #16585:
URL: https://github.com/apache/kafka/pull/16585
This test case has been flaky since https://github.com/apache/kafka/pull/16477 was merged. As a quick fix, we can give workers time to discover that the Kafka cluster has gone down, so that the health check endpoint reports accurate status.

Ideally, this would not be necessary, because we would always detect that the Kafka cluster is down at the top of our tick loop [here](https://github.com/apache/kafka/blob/0ada8fac6869cad8ac33a79032cf5d57bfa2a3ea/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L419). However, with some rare-but-not-impossible timing, the method called there may return without sending any requests to the group coordinator. This probably indicates a bug in its own right, and if reviewers agree, we can file a Jira ticket for it.

That said, this possibly-buggy behavior is not new. Our tests fail only because newly introduced logic does not account for that existing behavior, so we should not block on fixing it before modifying this test case to prevent flakiness.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
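The general idea behind "giving workers time" can be sketched as a poll-until-timeout check in place of a single immediate assertion. This is a minimal sketch under stated assumptions, not the change made in this PR: `WaitUtil`, `waitUntil`, and the simulated `workerReportsUnhealthy` probe are hypothetical names, and the 200 ms detection delay is an arbitrary stand-in for a worker that only notices the outage on a later tick. Kafka's own test code has its own polling helpers, which this does not reproduce.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical test helper (not part of Kafka): polls a condition until it
 * holds or a timeout expires, instead of checking it exactly once.
 */
final class WaitUtil {

    private WaitUtil() {
    }

    /**
     * Evaluates {@code condition} every {@code pollInterval} until it returns
     * true. Returns true if the condition held before {@code timeout} elapsed,
     * and false on timeout or if the calling thread is interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            // Sleep for one poll interval, but never past the deadline
            long sleepNanos = Math.min(pollInterval.toNanos(), remaining);
            try {
                Thread.sleep(sleepNanos / 1_000_000L, (int) (sleepNanos % 1_000_000L));
            } catch (InterruptedException e) {
                // Preserve the interrupt status and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated (hypothetical) worker: it only notices that the Kafka
        // cluster is down about 200 ms after the cluster is stopped.
        long detectedAt = System.nanoTime() + Duration.ofMillis(200).toNanos();
        BooleanSupplier workerReportsUnhealthy = () -> System.nanoTime() >= detectedAt;

        // A one-shot check right after stopping the cluster is racy...
        boolean immediate = workerReportsUnhealthy.getAsBoolean();
        // ...whereas polling with a generous timeout tolerates the delay.
        boolean eventually = waitUntil(workerReportsUnhealthy, Duration.ofSeconds(5), Duration.ofMillis(50));
        System.out.println("immediate=" + immediate + ", eventually=" + eventually);
    }
}
```

Polling with a generous upper bound keeps such a test fast in the common case, since it returns as soon as the expected status appears, while tolerating the occasional tick in which the worker has not yet contacted the group coordinator.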
