[
https://issues.apache.org/jira/browse/KAFKA-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrei D updated KAFKA-12984:
-----------------------------
Comment: was deleted
(was: Here are debug logs.
The freeze happened at 15:48:23
[^logs-insights-results-kafka.numbers]
)
> Cooperative sticky assignor can get stuck with invalid SubscriptionState
> input metadata
> ---------------------------------------------------------------------------------------
>
> Key: KAFKA-12984
> URL: https://issues.apache.org/jira/browse/KAFKA-12984
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Reporter: A. Sophie Blee-Goldman
> Assignee: A. Sophie Blee-Goldman
> Priority: Blocker
> Fix For: 2.8.1, 3.0.0
>
> Attachments: image-2021-10-25-11-53-40-221.png,
> log-events-viewer-result-kafka.numbers, logs-insights-results-kafka.numbers
>
>
> Some users have reported seeing their consumer group become stuck in the
> CompletingRebalance phase when using the cooperative-sticky assignor. Based
> on the request metadata we were able to deduce that multiple consumers were
> reporting the same partition(s) in their "ownedPartitions" field of the
> consumer protocol. Since this is an invalid state, the input causes the
> cooperative-sticky assignor to detect that something is wrong and throw an
> IllegalStateException. If the consumer application is set up to simply retry,
> this will cause the group to appear to hang in the rebalance state.
> The "ownedPartitions" field is encoded based on the ConsumerCoordinator's
> SubscriptionState, which was assumed to always be up to date. However, there
> may be cases where the consumer has dropped out of the group but fails to
> clear the SubscriptionState, allowing it to report some partitions as owned
> that have since been reassigned to another member.
> We should (a) fix the sticky assignment algorithm to resolve cases of
> improper input conditions by invalidating the "ownedPartitions" in cases of
> double ownership, and (b) shore up the ConsumerCoordinator logic to better
> handle rejoining the group and keeping its internal state consistent. See
> KAFKA-12983 for more details on (b).
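
To make item (a) concrete, here is a minimal sketch of one way to invalidate
"ownedPartitions" on double ownership. It is not the code from the Kafka fix:
the class OwnedPartitionSanitizer, the method invalidateDoubleOwnership, and
the use of plain strings for member ids and partitions are all hypothetical,
and dropping a contested partition from every claimant is an assumption about
how the conflict could be resolved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only; not part of the Kafka codebase.
public class OwnedPartitionSanitizer {

    /**
     * Returns a copy of the claimed ownership in which any partition claimed
     * by more than one member is removed from every claimant, so that the
     * assignor can treat it as unowned instead of failing on invalid input.
     */
    public static Map<String, List<String>> invalidateDoubleOwnership(
            Map<String, List<String>> claimed) {
        // Count how many distinct members claim each partition.
        Map<String, Integer> claimCounts = new HashMap<>();
        for (List<String> partitions : claimed.values()) {
            // Deduplicate per member so a repeated entry from one member
            // is not mistaken for double ownership.
            for (String partition : new HashSet<>(partitions)) {
                claimCounts.merge(partition, 1, Integer::sum);
            }
        }

        // Keep only the partitions that exactly one member claims.
        Map<String, List<String>> sanitized = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : claimed.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String partition : entry.getValue()) {
                if (claimCounts.get(partition) == 1) {
                    kept.add(partition);
                }
            }
            sanitized.put(entry.getKey(), kept);
        }
        return sanitized;
    }

    public static void main(String[] args) {
        Map<String, List<String>> claimed = new HashMap<>();
        // consumer-1 dropped out of the group but still reports topic-1,
        // which has since been reassigned to consumer-2.
        claimed.put("consumer-1", List.of("topic-0", "topic-1"));
        claimed.put("consumer-2", List.of("topic-1", "topic-2"));
        System.out.println(invalidateDoubleOwnership(claimed));
    }
}
```

With this input, topic-1 is claimed twice and is therefore dropped from both
members, leaving consumer-1 with topic-0 and consumer-2 with topic-2. The
contested partition can then be handed out as if it were unowned, rather than
the assignor throwing an IllegalStateException.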
--
This message was sent by Atlassian Jira
(v8.3.4#803005)