[ https://issues.apache.org/jira/browse/KAFKA-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434705#comment-17434705 ]
Luke Chen commented on KAFKA-12984: ----------------------------------- > a public interface/abstract class like CooperativeConsumerPartitionAssignor > that users can implement/extend to for a custom cooperative assignor, which > at the very least stores the generation like we do in the > CooperativeStickyAssignor and makes it available for the user as well as to > the ConsumerCoordinator for it to know when to invalidate a member's > ownedPartitions during this validation step – does that make sense? WDYT? --> Sounds good to me. Or maybe we can put onto the ConsumerProtocolSubscription protocol to add a "generation" field. > Cooperative sticky assignor can get stuck with invalid SubscriptionState > input metadata > --------------------------------------------------------------------------------------- > > Key: KAFKA-12984 > URL: https://issues.apache.org/jira/browse/KAFKA-12984 > Project: Kafka > Issue Type: Bug > Components: consumer > Reporter: A. Sophie Blee-Goldman > Assignee: A. Sophie Blee-Goldman > Priority: Blocker > Fix For: 2.8.1, 3.0.0 > > Attachments: image-2021-10-25-11-53-40-221.png, > log-events-viewer-result-kafka.numbers, logs-insights-results-kafka.csv, > logs-insights-results-kafka.numbers > > > Some users have reported seeing their consumer group become stuck in the > CompletingRebalance phase when using the cooperative-sticky assignor. Based > on the request metadata we were able to deduce that multiple consumers were > reporting the same partition(s) in their "ownedPartitions" field of the > consumer protocol. Since this is an invalid state, the input causes the > cooperative-sticky assignor to detect that something is wrong and throw an > IllegalStateException. If the consumer application is set up to simply retry, > this will cause the group to appear to hang in the rebalance state. > The "ownedPartitions" field is encoded based on the ConsumerCoordinator's > SubscriptionState, which was assumed to always be up to date. However there > may be cases where the consumer has dropped out of the group but fails to > clear the SubscriptionState, allowing it to report some partitions as owned > that have since been reassigned to another member. > We should (a) fix the sticky assignment algorithm to resolve cases of > improper input conditions by invalidating the "ownedPartitions" in cases of > double ownership, and (b) shore up the ConsumerCoordinator logic to better > handle rejoining the group and keeping its internal state consistent. See > KAFKA-12983 for more details on (b) -- This message was sent by Atlassian Jira (v8.3.4#803005)