[ 
https://issues.apache.org/jira/browse/KAFKA-13891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629931#comment-17629931
 ] 

Jocelyn commented on KAFKA-13891:
---------------------------------

This ticket was reverted, see PR here: 
[https://github.com/apache/kafka/pull/12794]

You can find the discussion about this revert here: 
https://issues.apache.org/jira/browse/KAFKA-14016

> sync group failed with rebalanceInProgress error cause rebalance many rounds 
> in coopeartive
> -------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13891
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13891
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 3.0.0
>            Reporter: Shawn Wang
>            Priority: Major
>             Fix For: 3.3.0, 3.2.4
>
>
> This issue was first found in 
> [KAFKA-13419|https://issues.apache.org/jira/browse/KAFKA-13419]
> But the previous PR forgot to reset generation when sync group failed with 
> rebalanceInProgress error. So the previous bug still exists and it may cause 
> consumer to rebalance many rounds before final stable.
> Here's the example ({*}bold is added{*}):
>  # consumer A joined and synced group successfully with generation 1 *( with 
> ownedPartition P1/P2 )*
>  # New rebalance started with generation 2, consumer A joined successfully, 
> but somehow, consumer A doesn't send out sync group immediately
>  # other consumer completed sync group successfully in generation 2, except 
> consumer A.
>  # After consumer A send out sync group, the new rebalance start, with 
> generation 3. So consumer A got REBALANCE_IN_PROGRESS error with sync group 
> response
>  # When receiving REBALANCE_IN_PROGRESS, we re-join the group, with 
> generation 3, with the assignment (ownedPartition) in generation 1.
>  # So, now, we have out-of-date ownedPartition sent, with unexpected results 
> happened
>  # *After the generation-3 rebalance, consumer A got P3/P4 partition. the 
> ownedPartition is ignored because of old generation.*
>  # *consumer A revoke P1/P2 and re-join to start a new round of rebalance*
>  # *if some other consumer C failed to syncGroup before consumer A's 
> joinGroup. the same issue will happens again and result in many rounds of 
> rebalance before stable*
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to