Calvin Liu created KAFKA-18966:
----------------------------------

             Summary: Don't honor controller_num_nodes_override in combined controller test mode
                 Key: KAFKA-18966
                 URL: https://issues.apache.org/jira/browse/KAFKA-18966
             Project: Kafka
          Issue Type: Bug
            Reporter: Calvin Liu
            Assignee: Calvin Liu


I found some flaky tests caused by the following test setup:
 # Combined controller mode, in which the broker also hosts the controller.
 # A single controller node. This is very common among the tests.
 # A hard bounce of the broker.

When the broker that hosts the controller is down, the whole controller service 
is down as well. It can then take a long time to elect a new partition leader, 
even if the ISR has good candidates. This downtime adds unnecessary test time 
(partitions are unavailable in the meantime) and forces some timeouts, such as 
the transaction timeout, to be set longer.
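For intuition, a KRaft controller quorum, like any Raft majority quorum, stays available only while a majority of its voters are up. A quorum of n controllers therefore tolerates (n - 1) // 2 failures. The minimal Python sketch below shows that arithmetic. The helper name is hypothetical and is not part of the Kafka test harness.

```python
def tolerated_controller_failures(num_controllers: int) -> int:
    """Return how many controller voters can be down while a majority remains.

    Hypothetical illustration helper; not a Kafka or kafkatest API.
    """
    if num_controllers < 1:
        raise ValueError("a quorum needs at least one voter")
    majority = num_controllers // 2 + 1  # strict majority of voters
    return num_controllers - majority


for n in (1, 3, 5):
    print(n, tolerated_controller_failures(n))
```

With a single combined controller the tolerance is zero, so hard-bouncing that broker takes the whole quorum down. With three controllers, one broker can bounce while the remaining two voters still form a majority.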

Proposal: use at least 3 controller nodes in combined controller test mode in 
order to:
 # Avoid the flakiness caused by having no valid leader during the broker restart.
 # Reduce the test time.
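The proposed selection rule could look roughly like the sketch below. It is a sketch under stated assumptions, not the actual patch. The function name, the `default_controllers` parameter, and the cap at the broker count are all hypothetical. The cap assumes each combined-mode controller must run on one of the test's brokers. Only `controller_num_nodes_override` and the minimum of 3 come from this issue.

```python
from typing import Optional

MIN_COMBINED_CONTROLLERS = 3  # minimum proposed in this issue


def choose_controller_count(combined_mode: bool,
                            num_brokers: int,
                            controller_num_nodes_override: Optional[int],
                            default_controllers: int = 1) -> int:
    """Hypothetical helper: pick the number of controller nodes for a test."""
    if combined_mode:
        # Ignore the override in combined mode and use at least 3 controllers,
        # capped by the broker count (assumed: controllers are co-located).
        return min(num_brokers, max(MIN_COMBINED_CONTROLLERS, default_controllers))
    # Isolated mode: honor the override when it is set to a positive value.
    if controller_num_nodes_override:
        return controller_num_nodes_override
    return default_controllers
```

For example, a combined-mode test with 5 brokers and an override of 1 would get 3 controllers, while an isolated-mode test would still get the 1 it asked for.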



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
