[ https://issues.apache.org/jira/browse/KAFKA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301253#comment-17301253 ]
Ron Dagostino commented on KAFKA-12455:
---------------------------------------

With a 2-broker cluster that undergoes a series of 5 rolling restarts (which is what is happening here), in the Raft case the consumers sometimes receive a `MetadataResponse` that lists only a single broker, because the other broker is restarting. This never happens in the ZooKeeper case: every `MetadataResponse` received there lists both brokers. I'm not sure why the ZooKeeper configuration behaves this way, but that is the fundamental difference between the two cases in this test scenario.

The problem with the consumer seeing only a single broker in the `METADATA` response for the Raft configuration is that when that one known broker goes down, the consumer suddenly has no available brokers that it knows about, and we see messages in the consumer log saying `Give up sending metadata request since no node is available`.

It then takes a while before the only broker that the consumer knows about restarts. By that time the consumer group has already moved to the `GroupCoordinator` on the other broker (the one the consumer didn't know about), and that coordinator fails the consumer due to a lack of a heartbeat. A rebalance therefore happens, and this test specifically checks that no rebalances occur during the rolling restarts.
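
To make the sequence above concrete, here is a toy model in Python. It is only a sketch and not Kafka client code: `ToyConsumer`, `on_metadata_response`, `pick_node`, and `simulate` are hypothetical names, and the model assumes the consumer's cached broker list is replaced wholesale by each metadata response.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConsumer:
    """Hypothetical stand-in for a consumer's cached view of the cluster."""
    known_brokers: set = field(default_factory=set)

    def on_metadata_response(self, brokers_in_response):
        # Assumption: the cached broker list is replaced by the response.
        self.known_brokers = set(brokers_in_response)

    def pick_node(self, live_brokers):
        # Return a known broker that is currently up, or None if there is none.
        candidates = sorted(self.known_brokers & set(live_brokers))
        return candidates[0] if candidates else None


def simulate(metadata_lists_all_brokers):
    """Replay one step of a rolling restart on a 2-broker cluster."""
    all_brokers = {1, 2}
    consumer = ToyConsumer()

    # Broker 2 is restarting; the consumer refreshes metadata via broker 1.
    live = {1}
    response = all_brokers if metadata_lists_all_brokers else live
    consumer.on_metadata_response(response)

    # Broker 2 is back up and broker 1 is now the one being restarted.
    live = {2}
    return consumer.pick_node(live)


if __name__ == "__main__":
    print("both brokers listed:", simulate(True))
    print("single broker listed:", simulate(False))
```

When the response lists both brokers, the toy consumer can still reach broker 2 after broker 1 goes down. When the response lists only broker 1, `pick_node` returns `None`, which corresponds to the "no node is available" state described above.
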

> OffsetValidationTest.test_broker_rolling_bounce failing for Raft quorums
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-12455
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12455
>             Project: Kafka
>          Issue Type: Bug
>   Affects Versions: 2.8.0
>            Reporter: Ron Dagostino
>            Assignee: Ron Dagostino
>            Priority: Blocker
>
> OffsetValidationTest.test_broker_rolling_bounce in `consumer_test.py` is
> failing because the consumer group is rebalancing unexpectedly.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)