[ 
https://issues.apache.org/jira/browse/KAFKA-18569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17913867#comment-17913867
 ] 

Kirk True commented on KAFKA-18569:
-----------------------------------

{quote}An option could be to consider "signal close" to the 
CoordinatorRequestManager after the consumer.close completes commit/leave, so 
that it does not generate any more requests on poll (similar to what is already 
done for the CommitRequestManager with the CommitOnCloseEvent
{quote}

I noticed that {{CoordinatorRequestManager}} uses the default, no-op 
implementation of 
[{{RequestManager.pollOnClose()}}|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/RequestManager.java#L48-L66].
 Since 
[{{ConsumerNetworkThread.cleanup()}}|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkThread.java#L336-L351]
 calls {{runAtClose()}} before {{sendUnsentRequests()}}, adding an 
implementation of {{CoordinatorRequestManager.pollOnClose()}} that halts 
further {{FindCoordinator}} requests from being generated would then prevent 
{{sendUnsentRequests()}} from having any {{FindCoordinator}} requests to send.

I don't know for sure if that would work, but it could be worth a try.

> New consumer close may wait on unneeded FindCoordinator
> -------------------------------------------------------
>
>                 Key: KAFKA-18569
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18569
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>            Reporter: Lianet Magrans
>            Assignee: TengYao Chi
>            Priority: Blocker
>              Labels: kip-848-client-support
>             Fix For: 4.0.0
>
>
> A flaky test revealed that the new consumer close may wait for a 
> FindCoordinator unsent request to go out when closing the consumer, even 
> after the commit/leaveGroup stages of close are done.
> This could happen because the CoordinatorRequestManager poll continues to 
> attempt FindCoordinator if the coordinator is unknown , even if this happens 
> during consumer close, after the consumer has completed the commit/leave 
> attempts (which are the only steps in close that require a coordinator), and 
> before the network shutdown that stops polling managers here
> [https://github.com/apache/kafka/blob/5c20aa187aa8f51af4270d7d1b0db4963b0cd10b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1343]
>  
> If the unneeded FindCoordinator is generated and the brokers are down (like 
> could happen in the flaky test), the consumer would wait for that request 
> unnecessarily here  
> [https://github.com/apache/kafka/blob/5c20aa187aa8f51af4270d7d1b0db4963b0cd10b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkThread.java#L327]
> I expect we shouldn't block the close on a FindCoordinator request if the 
> consumer already completed the commit/leave attempts.
> An option could be to consider "signal close" to the 
> CoordinatorRequestManager after the consumer.close completes commit/leave, so 
> that it does not generate any more requests on poll (similar to what is 
> already done for the CommitRequestManager with the CommitOnCloseEvent
> [https://github.com/apache/kafka/blob/5c20aa187aa8f51af4270d7d1b0db4963b0cd10b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1335]
>  
> This fix should allow to enable this test for the new consumer reliably.
> https://github.com/apache/kafka/blob/5c20aa187aa8f51af4270d7d1b0db4963b0cd10b/core/src/test/scala/integration/kafka/api/ConsumerBounceTest.scala#L404
> Without the fix, the test is flaky (fails locally after a few repeated runs, 
> fails in CI).
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to