edoardocomar opened a new pull request #11476:
URL: https://github.com/apache/kafka/pull/11476


   Add a call to `onControllerFailover` to the code path where `elect` is
   called and the broker discovers that it has already been elected. We
   found that restarting the ZK leader could occasionally trigger this
   code path, and prior to this change it would not start a controller
   failover. This left our Kafka cluster in a state where the
   `/controller` znode existed and named the broker that had "won" the
   controller election, but at runtime all the brokers had resigned from
   being the controller. Without a running controller, restarting brokers
   would typically cause partitions to become under-replicated, because
   the restarted brokers never received the UpdateMetadata or
   LeaderAndIsr requests required to correctly lead or follow any of
   their replicas.
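
The failure mode described above can be illustrated with a small, self-contained model. This is only a sketch in Python, not Kafka's controller code: `FakeZk`, `Broker`, `run_scenario`, and the `call_failover_when_already_elected` flag are hypothetical names invented for illustration, and the model assumes the simplest form of the scenario, where the `/controller` znode already names a broker whose runtime state is inactive when `elect` runs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeZk:
    """Hypothetical stand-in for ZooKeeper: stores only the id in /controller."""
    controller_id: Optional[int] = None


@dataclass
class Broker:
    """Hypothetical, heavily simplified model of a broker's controller logic."""
    broker_id: int
    zk: FakeZk
    # True models the behaviour proposed in this PR; False models the old behaviour.
    call_failover_when_already_elected: bool
    # Runtime state: is a controller actually running on this broker?
    active: bool = False
    log: List[str] = field(default_factory=list)

    def on_controller_failover(self) -> None:
        # In this model, failover simply marks the controller as running.
        self.active = True
        self.log.append("controller failover started")

    def elect(self) -> None:
        current = self.zk.controller_id
        if current is None:
            # Normal path: win the election, then start the failover.
            self.zk.controller_id = self.broker_id
            self.on_controller_failover()
        elif current == self.broker_id:
            # Path discussed in this PR: the znode already names this broker.
            self.log.append("already elected as controller")
            if self.call_failover_when_already_elected:
                self.on_controller_failover()
        else:
            self.log.append(f"broker {current} is the controller")


def run_scenario(fixed: bool) -> Broker:
    """The znode names broker 1, but broker 1 is not running a controller."""
    zk = FakeZk(controller_id=1)
    broker = Broker(broker_id=1, zk=zk, call_failover_when_already_elected=fixed)
    broker.elect()
    return broker


if __name__ == "__main__":
    for fixed in (False, True):
        b = run_scenario(fixed)
        print(f"fixed={fixed} znode={b.zk.controller_id} active={b.active}")
```

With the flag off, the model ends with the znode naming broker 1 while no controller is running, which mirrors the stuck state described above. With the flag on, the same call to `elect` also starts the failover.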
   
   Also add info-level logging and more descriptive messages for the log
   lines that were helpful in tracking the controller failover.
   
   Proposed fix for https://issues.apache.org/jira/browse/KAFKA-13407
   
   Co-authored-by: Tina Selenge <gantigmaa.selen...@uk.ibm.com>
   Co-authored-by: Adrian Preston <prest...@uk.ibm.com>
   Co-authored-by: Edoardo Comar <eco...@euk.ibm.com.com>
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   



