[ 
https://issues.apache.org/jira/browse/KAFKA-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434705#comment-17434705
 ] 

Luke Chen commented on KAFKA-12984:
-----------------------------------

> a public interface/abstract class like CooperativeConsumerPartitionAssignor 
> that users can implement/extend for a custom cooperative assignor, which 
> at the very least stores the generation like we do in the 
> CooperativeStickyAssignor and makes it available for the user as well as to 
> the ConsumerCoordinator for it to know when to invalidate a member's 
> ownedPartitions during this validation step – does that make sense? WDYT?

 --> Sounds good to me. Alternatively, we could add a "generation" field to the 
ConsumerProtocolSubscription protocol itself.
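
To make the "generation" idea concrete, here is a minimal sketch of how a 
reported generation could settle conflicting ownership claims. It is 
self-contained and does not use any Kafka classes: GenerationAwareOwnership, 
Claim, and resolveOwners are hypothetical names invented for illustration, and 
the tie-breaking rule (nobody keeps a partition when the highest generation is 
shared) is an assumption, not something decided in this ticket.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are Kafka APIs.
final class GenerationAwareOwnership {

    /** What a member might report if its subscription carried a generation. */
    static final class Claim {
        final String memberId;
        final int generation;
        final List<String> ownedPartitions;

        Claim(String memberId, int generation, List<String> ownedPartitions) {
            this.memberId = memberId;
            this.generation = generation;
            this.ownedPartitions = ownedPartitions;
        }
    }

    /**
     * Maps each claimed partition to the member that claims it with the
     * highest generation. Claims from older generations are treated as stale.
     * If the highest generation is shared by several members, the partition
     * is left without an owner (assumed policy).
     */
    static Map<String, String> resolveOwners(List<Claim> claims) {
        Map<String, String> owner = new HashMap<>();
        Map<String, Integer> bestGeneration = new HashMap<>();
        for (Claim claim : claims) {
            for (String partition : claim.ownedPartitions) {
                Integer best = bestGeneration.get(partition);
                if (best == null || claim.generation > best) {
                    // First claim seen, or a newer generation: this member wins.
                    bestGeneration.put(partition, claim.generation);
                    owner.put(partition, claim.memberId);
                } else if (claim.generation == best
                        && !claim.memberId.equals(owner.get(partition))) {
                    // Same generation, different member: ambiguous, drop the owner.
                    owner.remove(partition);
                }
            }
        }
        return owner;
    }
}
```

With this rule, a member that dropped out at generation 4 and still reports a 
partition loses it to a member that reports the same partition at generation 5.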

> Cooperative sticky assignor can get stuck with invalid SubscriptionState 
> input metadata
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-12984
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12984
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: A. Sophie Blee-Goldman
>            Priority: Blocker
>             Fix For: 2.8.1, 3.0.0
>
>         Attachments: image-2021-10-25-11-53-40-221.png, 
> log-events-viewer-result-kafka.numbers, logs-insights-results-kafka.csv, 
> logs-insights-results-kafka.numbers
>
>
> Some users have reported seeing their consumer group become stuck in the 
> CompletingRebalance phase when using the cooperative-sticky assignor. Based 
> on the request metadata we were able to deduce that multiple consumers were 
> reporting the same partition(s) in their "ownedPartitions" field of the 
> consumer protocol. Since this is an invalid state, the input causes the 
> cooperative-sticky assignor to detect that something is wrong and throw an 
> IllegalStateException. If the consumer application is set up to simply retry, 
> this will cause the group to appear to hang in the rebalance state.
>
> The "ownedPartitions" field is encoded based on the ConsumerCoordinator's 
> SubscriptionState, which was assumed to always be up to date. However, there 
> may be cases where the consumer has dropped out of the group but fails to 
> clear the SubscriptionState, allowing it to report some partitions as owned 
> that have since been reassigned to another member.
>
> We should (a) fix the sticky assignment algorithm to resolve cases of 
> improper input conditions by invalidating the "ownedPartitions" in cases of 
> double ownership, and (b) shore up the ConsumerCoordinator logic to better 
> handle rejoining the group and keeping its internal state consistent. See 
> KAFKA-12983 for more details on (b).
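
Item (a) in the description above can be illustrated with a minimal sketch, 
assuming no generation information is available. This is one possible reading 
of "invalidating the ownedPartitions in cases of double ownership", not the 
code of the actual fix: OwnedPartitionsValidator and invalidateDoubleOwnership 
are hypothetical names, and partitions are plain strings rather than Kafka 
TopicPartition objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are Kafka APIs.
final class OwnedPartitionsValidator {

    /**
     * Returns a copy of the input in which every partition claimed by more
     * than one member is removed from all of its claimants, so the assignor
     * sees it as unowned instead of throwing on the inconsistent input.
     */
    static Map<String, List<String>> invalidateDoubleOwnership(
            Map<String, List<String>> ownedByMember) {
        // Count how many distinct members claim each partition.
        Map<String, Integer> claimCount = new HashMap<>();
        for (List<String> owned : ownedByMember.values()) {
            for (String partition : new HashSet<>(owned)) {
                claimCount.merge(partition, 1, Integer::sum);
            }
        }
        // Keep only the partitions that have exactly one claimant.
        Map<String, List<String>> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : ownedByMember.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String partition : entry.getValue()) {
                if (claimCount.get(partition) == 1) {
                    kept.add(partition);
                }
            }
            cleaned.put(entry.getKey(), kept);
        }
        return cleaned;
    }
}
```

Dropping the partition from every claimant is the conservative choice: without 
a generation there is no way to tell which claim is the stale one.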


