Calvin Liu created KAFKA-18966:
----------------------------------

             Summary: Don't honor controller_num_nodes_override in combined controller test mode
                 Key: KAFKA-18966
                 URL: https://issues.apache.org/jira/browse/KAFKA-18966
             Project: Kafka
          Issue Type: Bug
            Reporter: Calvin Liu
            Assignee: Calvin Liu

I found some flaky tests caused by the following test setup:
# The test uses combined controller mode, which means the broker also hosts the controller.
# The test uses only 1 controller node. This is very common among the tests.
# The test performs a hard bounce.

When the broker that hosts the controller goes down, the whole controller service goes down with it. Electing a new leader can then take a long time, even if the ISR has good candidates. This downtime adds unnecessary test time (due to unavailable partitions) and forces some timeouts (such as the transaction timeout) to be set longer.

I propose setting the number of controller nodes to at least 3 in the combined controller test mode, in order to:
# Avoid the flakiness caused by having no valid leader during the broker restart.
# Reduce the test time.
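
To illustrate why a single controller node makes the hard bounce so costly, here is a minimal sketch of the majority-quorum arithmetic behind the proposal. It assumes the controllers form a majority-based quorum; all function names, including `effective_controller_count` and its `minimum` parameter, are hypothetical and are not taken from the Kafka test framework.

```python
def quorum_size(num_voters: int) -> int:
    """Smallest majority of a voter set of the given size."""
    if num_voters < 1:
        raise ValueError("need at least one voter")
    return num_voters // 2 + 1


def tolerated_failures(num_voters: int) -> int:
    """How many voters can be down while a majority is still reachable."""
    return num_voters - quorum_size(num_voters)


def effective_controller_count(requested: int, num_brokers: int,
                               combined_mode: bool, minimum: int = 3) -> int:
    """Hypothetical helper, not part of the real test framework.

    In combined mode, raise the controller count to at least `minimum`,
    but never above the number of brokers, since each controller is
    assumed to be hosted on a broker node.
    """
    if not combined_mode:
        return requested
    return min(num_brokers, max(requested, minimum))


if __name__ == "__main__":
    for n in (1, 3):
        print(f"{n} controller(s): quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With 1 controller, no failure can be tolerated, so bouncing the hosting broker takes the controller service down. With 3 controllers, one node can be bounced while the other two still form a majority.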