[ https://issues.apache.org/jira/browse/KAFKA-12984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433653#comment-17433653 ]
Andrei D commented on KAFKA-12984:
----------------------------------

The entire group used Kafka client 2.8.1 and 'CooperativeStickyAssignor'.

Here are the broker logs: we see that the broker skipped assignment for generations 10-12 because the ConsumerCoordinator was stuck on its side
!image-2021-10-25-11-53-40-221.png!

and here are the logs from the consumers for the same timeframe:
{code:java}
2021-10-20 10:14:27.878 ERROR {spanId=, traceId=} [org.apa.kaf.cli.con.int.ConsumerCoordinator] (smallrye-kafka-consumer-thread-0) [Consumer clientId=qa-qa-cf-executor-transform, groupId=qa-qa-cf-executor-transform] With the COOPERATIVE protocol, owned partitions cannot be reassigned to other members; however the assignor has reassigned partitions [qa-qa-cf-events-32, qa-qa-cf-events-13, qa-qa-cf-events-30, qa-qa-cf-events-38, qa-qa-cf-events-11] which are still owned by some members
2021-10-20 10:14:30.566 ERROR {spanId=, traceId=} [org.apa.kaf.cli.con.int.ConsumerCoordinator] (smallrye-kafka-consumer-thread-0) [Consumer clientId=qa-qa-cf-executor-transform, groupId=qa-qa-cf-executor-transform] With the COOPERATIVE protocol, owned partitions cannot be reassigned to other members; however the assignor has reassigned partitions [qa-qa-cf-events-32, qa-qa-cf-events-13, qa-qa-cf-events-30, qa-qa-cf-events-38, qa-qa-cf-events-11] which are still owned by some members
2021-10-20 10:14:34.913 ERROR {spanId=, traceId=} [org.apa.kaf.cli.con.int.ConsumerCoordinator] (smallrye-kafka-consumer-thread-0) [Consumer clientId=qa-qa-cf-executor-transform, groupId=qa-qa-cf-executor-transform] With the COOPERATIVE protocol, owned partitions cannot be reassigned to other members; however the assignor has reassigned partitions [qa-qa-cf-events-32, qa-qa-cf-events-13, qa-qa-cf-events-30, qa-qa-cf-events-38, qa-qa-cf-events-11] which are still owned by some members
2021-10-20 10:14:34.920 ERROR {spanId=, traceId=} [io.sma.rea.mes.kafka] (smallrye-kafka-consumer-thread-0) SRMSG18217: Unable to read a record from Kafka topics '[qa-qa-cf-events]': java.lang.IllegalStateException: Retries exhausted: 3/3
2021-10-20T13:14:34.928+03:00 Caused by: java.lang.IllegalStateException: Assignor supporting the COOPERATIVE protocol violates its requirements
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.validateCooperativeAssignment(ConsumerCoordinator.java:668)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:592)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:693)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$1000(AbstractCoordinator.java:111)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:599)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:562)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1182)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1157)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:206)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:169)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:129)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:602)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:412)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:247)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:426)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:365)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1261)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1230)
2021-10-20T13:14:34.928+03:00 at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1210)
2021-10-20T13:14:34.928+03:00 at io.smallrye.reactive.messaging.kafka.impl.ReactiveKafkaConsumer.lambda$poll$4(ReactiveKafkaConsumer.java:131)
2021-10-20T13:14:34.928+03:00 at io.smallrye.reactive.messaging.kafka.impl.ReactiveKafkaConsumer.lambda$runOnPollingThread$0(ReactiveKafkaConsumer.java:101)
2021-10-20T13:14:34.928+03:00 at io.smallrye.context.impl.wrappers.SlowContextualSupplier.get(SlowContextualSupplier.java:21)
2021-10-20T13:14:34.928+03:00 at io.smallrye.mutiny.operators.uni.builders.UniCreateFromItemSupplier.subscribe(UniCreateFromItemSupplier.java:28)
2021-10-20 10:14:34.921 WARN {spanId=, traceId=} [io.sma.rea.mes.kafka] (smallrye-kafka-consumer-thread-0) SRMSG18228: A failure has been reported for Kafka topics '[qa-qa-cf-events]': java.lang.IllegalStateException: Retries exhausted: 3/3
{code}

> Cooperative sticky assignor can get stuck with invalid SubscriptionState
> input metadata
> ---------------------------------------------------------------------------------------
>
> Key: KAFKA-12984
> URL: https://issues.apache.org/jira/browse/KAFKA-12984
> Project: Kafka
> Issue Type: Bug
> Components: consumer
> Reporter: A. Sophie Blee-Goldman
> Assignee: A. Sophie Blee-Goldman
> Priority: Blocker
> Fix For: 2.8.1, 3.0.0
>
> Attachments: image-2021-10-25-11-53-40-221.png
>
>
> Some users have reported seeing their consumer group become stuck in the
> CompletingRebalance phase when using the cooperative-sticky assignor. Based
> on the request metadata we were able to deduce that multiple consumers were
> reporting the same partition(s) in their "ownedPartitions" field of the
> consumer protocol. Since this is an invalid state, the input causes the
> cooperative-sticky assignor to detect that something is wrong and throw an
> IllegalStateException. If the consumer application is set up to simply retry,
> this will cause the group to appear to hang in the rebalance state.
> The "ownedPartitions" field is encoded based on the ConsumerCoordinator's
> SubscriptionState, which was assumed to always be up to date. However there
> may be cases where the consumer has dropped out of the group but fails to
> clear the SubscriptionState, allowing it to report some partitions as owned
> that have since been reassigned to another member.
> We should (a) fix the sticky assignment algorithm to resolve cases of
> improper input conditions by invalidating the "ownedPartitions" in cases of
> double ownership, and (b) shore up the ConsumerCoordinator logic to better
> handle rejoining the group and keeping its internal state consistent. See
> KAFKA-12983 for more details on (b)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
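
As an illustration of fix (a) from the issue description, here is a minimal sketch of how double ownership in the "ownedPartitions" input could be detected and invalidated before assignment. It is not the actual Kafka patch: the class `OwnedPartitionsSanitizer`, its method names, the plain-string partition names, and the member ids are all hypothetical, and the real assignor works on `TopicPartition` objects and subscription metadata rather than simple maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not part of the Kafka client API. */
public class OwnedPartitionsSanitizer {

    /** Returns every partition that more than one member reports as owned. */
    public static Set<String> findDoubleOwned(Map<String, List<String>> ownedByMember) {
        Set<String> seen = new HashSet<>();
        Set<String> contested = new TreeSet<>();
        for (List<String> owned : ownedByMember.values()) {
            // De-duplicate within a member so a repeated entry in one
            // member's own list is not mistaken for a conflict.
            for (String partition : new HashSet<>(owned)) {
                if (!seen.add(partition)) {
                    contested.add(partition);
                }
            }
        }
        return contested;
    }

    /**
     * Returns a copy of the input in which contested partitions are removed
     * from every member's owned list, so no member is treated as their owner.
     */
    public static Map<String, List<String>> invalidateDoubleOwnership(
            Map<String, List<String>> ownedByMember) {
        Set<String> contested = findDoubleOwned(ownedByMember);
        Map<String, List<String>> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : ownedByMember.entrySet()) {
            List<String> kept = new ArrayList<>(entry.getValue());
            kept.removeAll(contested);
            cleaned.put(entry.getKey(), kept);
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Hypothetical input: consumer-b still reports events-13 from a stale
        // SubscriptionState although it has since been given to consumer-a.
        Map<String, List<String>> owned = new LinkedHashMap<>();
        owned.put("consumer-a", Arrays.asList("events-11", "events-13"));
        owned.put("consumer-b", Arrays.asList("events-13", "events-30"));

        System.out.println("contested: " + findDoubleOwned(owned));
        System.out.println("cleaned:   " + invalidateDoubleOwnership(owned));
    }
}
```

In this sketch a contested partition is dropped from all claimants rather than awarded to one of them, on the assumption that the assignor cannot tell from the claims alone which one is current; the partition is then treated as unowned for that rebalance.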