[ 
https://issues.apache.org/jira/browse/KAFKA-18310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17915738#comment-17915738
 ] 

PoAn Yang commented on KAFKA-18310:
-----------------------------------

The root cause of this issue is as follows:

AbstractCoordinator#joinGroupIfNeeded calls 
ConsumerNetworkClient#poll(RequestFuture, Timer) [0], which delegates to the 
three-argument overload with disableWakeup = false. While joinFuture is not 
done, ConsumerNetworkClient keeps polling until the timer expires [1]. If 
joinFuture finishes too fast, the SyncGroupRequest can't be handled in 
AbstractCoordinator#ensureActiveGroup, so no WakeupException is thrown. I 
changed ConsumerNetworkClient#poll(RequestFuture, Timer, boolean) as follows, 
and with that change I can reproduce the flaky failures frequently.
{noformat}
public boolean poll(RequestFuture<?> future, Timer timer, boolean disableWakeup) {
    do {
        poll(timer, future, disableWakeup);
        try {
            // Artificial delay, added only to reproduce the flaky tests.
            System.err.println("sleeping.....");
            Thread.sleep(1000L);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    } while (!future.isDone() && timer.notExpired());
    return future.isDone();
}{noformat}
[0] 
[https://github.com/apache/kafka/blob/3d49159c841e7653e3951af4ffc3524d17339295/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractCoordinator.java#L480-L481]
[1] 
[https://github.com/apache/kafka/blob/3d49159c841e7653e3951af4ffc3524d17339295/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkClient.java#L230-L235]
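
To make the loop semantics above easier to follow, here is a minimal, 
self-contained sketch of the same do/while shape. It is not Kafka code: 
PollLoopSketch, SketchFuture, SketchTimer, SketchClient and 
SketchWakeupException are hypothetical stand-ins, the timer counts polls 
instead of milliseconds, and the point at which a pending wakeup is checked 
is an assumption made for illustration. It only shows that the loop polls at 
least once, stops as soon as the future is done or the timer expires, and 
that a wakeup requested after the future has completed is not surfaced by 
this loop.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class PollLoopSketch {

    /** Hypothetical stand-in for Kafka's WakeupException. */
    static class SketchWakeupException extends RuntimeException { }

    /** Hypothetical future that completes after a fixed number of polls. */
    static class SketchFuture {
        private int pollsUntilDone;
        SketchFuture(int pollsUntilDone) { this.pollsUntilDone = pollsUntilDone; }
        boolean isDone() { return pollsUntilDone <= 0; }
        void advance() { if (pollsUntilDone > 0) pollsUntilDone--; }
    }

    /** Hypothetical timer that expires after a fixed number of polls. */
    static class SketchTimer {
        private int remainingPolls;
        SketchTimer(int remainingPolls) { this.remainingPolls = remainingPolls; }
        boolean notExpired() { return remainingPolls > 0; }
        void tick() { if (remainingPolls > 0) remainingPolls--; }
    }

    /** Hypothetical client with the same loop shape as the quoted method. */
    static class SketchClient {
        private final AtomicBoolean wakeupRequested = new AtomicBoolean(false);
        int pollCount = 0;

        void wakeup() { wakeupRequested.set(true); }

        // One poll: count it, consume timer budget, surface a pending wakeup
        // unless wakeups are disabled, then let the future make progress.
        private void pollOnce(SketchTimer timer, SketchFuture future, boolean disableWakeup) {
            pollCount++;
            timer.tick();
            if (!disableWakeup && wakeupRequested.getAndSet(false)) {
                throw new SketchWakeupException();
            }
            future.advance();
        }

        // Same do/while structure as the quoted poll(...), minus the debug sleep.
        boolean poll(SketchFuture future, SketchTimer timer, boolean disableWakeup) {
            do {
                pollOnce(timer, future, disableWakeup);
            } while (!future.isDone() && timer.notExpired());
            return future.isDone();
        }
    }

    public static void main(String[] args) {
        // Future needs three polls and the timer allows ten: the loop runs three times.
        SketchClient client = new SketchClient();
        boolean done = client.poll(new SketchFuture(3), new SketchTimer(10), false);
        System.out.println("done=" + done + " polls=" + client.pollCount);

        // Future completes on the first poll; a wakeup requested afterwards
        // stays pending because the loop has already exited.
        SketchClient late = new SketchClient();
        late.poll(new SketchFuture(1), new SketchTimer(10), false);
        late.wakeup();
        System.out.println("polls=" + late.pollCount + " (wakeup still pending)");
    }
}
```

In this toy model, whether a wakeup is seen depends only on whether another 
poll runs after wakeup() is called. How closely that matches the timing in 
the three flaky tests is not established by the sketch.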

> Flaky AbstractCoordinatorTest
> -----------------------------
>
>                 Key: KAFKA-18310
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18310
>             Project: Kafka
>          Issue Type: Test
>    Affects Versions: 4.0.0
>            Reporter: Andrew Schofield
>            Assignee: PoAn Yang
>            Priority: Major
>
> Three tests are flaky with about 5% failure rate on trunk.
> * testWakeupAfterSyncGroupReceived
> * testWakeupAfterSyncGroupReceivedExternalCompletion
> * testWakeupAfterSyncGroupSentExternalCompletion



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
