[ 
https://issues.apache.org/jira/browse/KAFKA-12455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17301956#comment-17301956
 ] 

Ron Dagostino commented on KAFKA-12455:
---------------------------------------

The test is using the default value of metadata.max.age.ms=300000 (5 minutes).  
When I explicitly turn it down to metadata.max.age.ms=5000 (5 seconds) the test 
passes for Raft but then fails for ZK (2 unexpected group rebalances in that 
case).

I increased it to 10 seconds and then the Raft configuration failed with 3 
unexpected rebalances and the ZK configuration failed with 1 unexpected 
rebalance.

I decreased it to a very aggressive 1 second -- and they both passed.

We have historically seen some flakiness in the ZooKeeper version of this test, 
and the fact that the test suddenly failed if we set metadata.max.age.ms to 5 
or 10 seconds indicates that it is just plain luck that the test is passing 
today.

Given that the current client-side code doesn't fall back to the bootstrap 
brokers when it sees no brokers available, I think any test really needs to 
make it *impossible* for the client to see cluster metadata with just a single 
broker. Decreasing the metadata max age decreases the possibility of it 
happening but doesn't make it impossible.

Another experiment was to keep metadata.max.age.ms=300000 but define 
session.timeout.ms = 30000 instead of the 10000 it was setting before -- this 
is longer than the broker roll time, and in fact this change allows both 
configurations to pass.
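
For reference, the consumer overrides used in the session timeout experiment 
can be written down as a small sketch. The helper name to_properties is 
hypothetical and is not part of consumer_test.py; only the two config keys and 
their values come from the experiment described above.

```python
# Hypothetical helper for illustration only; not part of consumer_test.py.
def to_properties(overrides: dict) -> str:
    """Render config overrides as Java-style key=value properties lines."""
    return "\n".join(f"{key}={value}" for key, value in sorted(overrides.items()))


# Values taken from the experiment above: default metadata max age (5 minutes)
# with the session timeout raised from 10 seconds to 30 seconds.
session_timeout_experiment = {
    "metadata.max.age.ms": 300000,
    "session.timeout.ms": 30000,
}

print(to_properties(session_timeout_experiment))
```

How these overrides are actually passed to the consumer depends on the test 
harness, which is not shown here.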

A further experiment was to keep metadata.max.age.ms=300000 and 
session.timeout.ms = 10000 but expand to 3 brokers instead of just 2.  This 
should fix the issue since there would never be a situation where just 1 broker 
is available, and a METADATA response would always have at least 2 brokers for 
the consumer to use.  Both configurations pass.
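
The reasoning behind the 3-broker experiment can be illustrated with a toy 
model. This is only a sketch under simplifying assumptions: brokers are bounced 
strictly one at a time, the client refreshes metadata at the worst possible 
moment, and it never falls back to the bootstrap brokers. The function names 
are hypothetical and nothing here talks to a real cluster.

```python
def min_live_brokers_during_roll(num_brokers: int) -> int:
    """Smallest number of live brokers at any point in a one-at-a-time roll."""
    broker_ids = set(range(1, num_brokers + 1))
    return min(len(broker_ids - {bounced}) for bounced in broker_ids)


def client_can_be_stranded(num_brokers: int) -> bool:
    """True if cached metadata can end up listing only brokers that are down.

    The client caches the broker list seen while one broker is down, then a
    different broker is bounced before the next metadata refresh.
    """
    broker_ids = set(range(1, num_brokers + 1))
    for down_at_refresh in broker_ids:
        cached = broker_ids - {down_at_refresh}
        if not cached:
            return True  # single-broker cluster: nothing to cache at all
        for down_later in broker_ids - {down_at_refresh}:
            if not (cached - {down_later}):
                return True  # every cached broker is now unavailable
    return False


for n in (2, 3):
    print(n, min_live_brokers_during_roll(n), client_can_be_stranded(n))
```

In this model a 2-broker roll can leave the client with cached metadata naming 
only the broker that is bounced next, while a 3-broker roll always leaves at 
least one cached broker reachable.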


> OffsetValidationTest.test_broker_rolling_bounce failing for Raft quorums
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-12455
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12455
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Ron Dagostino
>            Assignee: Ron Dagostino
>            Priority: Blocker
>
> OffsetValidationTest.test_broker_rolling_bounce in `consumer_test.py` is 
> failing because the consumer group is rebalancing unexpectedly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
