[
https://issues.apache.org/jira/browse/KAFKA-18569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17916116#comment-17916116
]
Lianet Magrans commented on KAFKA-18569:
----------------------------------------
Hey [~frankvicky], I added 2 subtasks here to point out the tests that we
should be able to enable with the fix. If possible, let's include them all in
the same PR; that will be a good validation of the fix. Thanks!
> New consumer close may wait on unneeded FindCoordinator
> -------------------------------------------------------
>
> Key: KAFKA-18569
> URL: https://issues.apache.org/jira/browse/KAFKA-18569
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer
> Reporter: Lianet Magrans
> Assignee: TengYao Chi
> Priority: Blocker
> Labels: kip-848-client-support
> Fix For: 4.0.0
>
>
> A flaky test revealed that the new consumer close may wait for an unsent
> FindCoordinator request to go out, even after the commit/leaveGroup stages
> of close are done.
> This can happen because the CoordinatorRequestManager poll keeps attempting
> FindCoordinator while the coordinator is unknown, even during consumer close,
> after the consumer has completed the commit/leave attempts (the only steps in
> close that require a coordinator) and before the network shutdown that stops
> polling the managers here:
> [https://github.com/apache/kafka/blob/5c20aa187aa8f51af4270d7d1b0db4963b0cd10b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1343]
>
> If the unneeded FindCoordinator is generated and the brokers are down (as
> can happen in the flaky test), the consumer waits for that request
> unnecessarily here:
> [https://github.com/apache/kafka/blob/5c20aa187aa8f51af4270d7d1b0db4963b0cd10b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkThread.java#L327]
> I expect we shouldn't block the close on a FindCoordinator request if the
> consumer already completed the commit/leave attempts.
> One option would be to "signal close" to the CoordinatorRequestManager
> after consumer.close completes commit/leave, so that it does not generate
> any more requests on poll. This is similar to what is already done for the
> CommitRequestManager with the CommitOnCloseEvent:
> [https://github.com/apache/kafka/blob/5c20aa187aa8f51af4270d7d1b0db4963b0cd10b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1335]
>
> This fix should make it possible to reliably enable this test for the new
> consumer:
> https://github.com/apache/kafka/blob/5c20aa187aa8f51af4270d7d1b0db4963b0cd10b/core/src/test/scala/integration/kafka/api/ConsumerBounceTest.scala#L404
> Without the fix, the test is flaky (fails locally after a few repeated runs,
> fails in CI).
>
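
The "signal close" option described in the issue could look roughly like the
sketch below. It is a simplified, hypothetical model and not the actual Kafka
code: the class SketchCoordinatorRequestManager, its signalClose() method, and
the string-based requests are all assumptions made for illustration. The only
point it shows is that poll stops generating FindCoordinator once the close
has been signaled, even though the coordinator is still unknown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; these are NOT the real Kafka classes.
final class SignalCloseSketch {

    /** Stand-in for a manager that looks up the group coordinator on poll. */
    static final class SketchCoordinatorRequestManager {
        // In this sketch the coordinator is never found, as when brokers are down.
        private final boolean coordinatorKnown = false;
        private boolean closing = false;

        /** Assumed hook, called once the commit/leave stages of close are done. */
        void signalClose() {
            closing = true;
        }

        /** Returns the requests to send on this poll iteration. */
        List<String> poll() {
            List<String> requests = new ArrayList<>();
            // Only look up the coordinator while it is unknown and still needed.
            if (!closing && !coordinatorKnown) {
                requests.add("FindCoordinator");
            }
            return requests;
        }
    }

    public static void main(String[] args) {
        SketchCoordinatorRequestManager manager = new SketchCoordinatorRequestManager();
        System.out.println(manager.poll()); // coordinator unknown: a lookup is generated
        manager.signalClose();
        System.out.println(manager.poll()); // after the signal: nothing is generated
    }
}
```

With no new request generated after the signal, the network thread in this
model has nothing unsent to wait for during shutdown.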