We probably need to be a bit careful about bumping the controller epoch at the beginning of onControllerFailover(). Currently, the controller epoch is read and incremented in a separate step after the controller path has been created successfully, so creating the path and bumping the epoch are not atomic. This allows the following race. Broker A creates the controller path and is about to call onControllerFailover(). An admin deletes the controller path, and broker B creates it, reads the controller epoch, and updates it to 1. Broker A then reads the controller epoch and updates it to 2. Broker B is now the controller, but its controller epoch (1) is outdated.
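To make the interleaving concrete, here is a minimal deterministic replay against a toy in-memory store. It assumes nothing about the real Kafka or ZooKeeper code: `controllerOwner`, `storedEpoch`, `createControllerPath` and `bumpEpoch` are hypothetical stand-ins for the controller znode, the controller epoch znode, and the two separate steps described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model: replays the race with a non-atomic "create path, then bump epoch".
class EpochRace {
    // Stand-ins for the controller znode (owner) and the controller epoch znode (value).
    static String controllerOwner = null;
    static int storedEpoch = 0;

    // The epoch each broker believes it holds after its own bump.
    static final Map<String, Integer> believedEpoch = new HashMap<>();

    // Step 1: create the controller path; fails if it already exists.
    static boolean createControllerPath(String broker) {
        if (controllerOwner != null) return false;
        controllerOwner = broker;
        return true;
    }

    // Step 2 (done later, independently): read the epoch, then write read + 1 unconditionally.
    static void bumpEpoch(String broker) {
        int read = storedEpoch;
        storedEpoch = read + 1;
        believedEpoch.put(broker, storedEpoch);
    }

    public static void main(String[] args) {
        controllerOwner = null;
        storedEpoch = 0;
        believedEpoch.clear();

        createControllerPath("A"); // A wins the path but has not bumped the epoch yet
        controllerOwner = null;    // admin deletes the controller path
        createControllerPath("B"); // B wins the path
        bumpEpoch("B");            // B: epoch 0 -> 1
        bumpEpoch("A");            // A, no longer controller: epoch 1 -> 2

        System.out.println("controller=" + controllerOwner
                + " storedEpoch=" + storedEpoch
                + " B believes=" + believedEpoch.get("B"));
    }
}
```

Running it prints `controller=B storedEpoch=2 B believes=1`: the current controller holds an epoch lower than the one stored, which is exactly the outdated-epoch state described above.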
One way to address this issue is to use multi() when creating the controller path. To elect a new controller, a broker first reads the current controller epoch (and its znode version) from ZK and then does a multi() that (1) creates the controller path and (2) conditionally updates the controller epoch, succeeding only if the epoch's version is unchanged since the read. Not sure if this is the best way, though.
[ Full content available at: https://github.com/apache/kafka/pull/5101 ]
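A rough sketch of the multi()-based election, under stated assumptions: this is an in-memory toy, not the ZooKeeper client. `Create`, `SetData`, and `multi` are hypothetical classes loosely modeled on ZooKeeper's `Op.create`, `Op.setData(path, data, version)`, and `ZooKeeper.multi`, and `tree`, `readEpoch`, and `tryElect` are illustrative names. The point is only the all-or-nothing check: the controller path is created and the epoch bumped together, and only if the epoch version a broker read is still current.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of "read epoch, then multi(create path, conditional setData)".
class ControllerElection {
    // A versioned node, like a znode's data plus Stat.version.
    record Node(int data, int version) {}

    interface Op {}
    record Create(String path, int data) implements Op {}
    record SetData(String path, int data, int expectedVersion) implements Op {}

    static final Map<String, Node> tree = new HashMap<>();

    // All-or-nothing: check every op against the current tree, then apply all or none.
    static synchronized boolean multi(List<Op> ops) {
        for (Op op : ops) {
            if (op instanceof Create c && tree.containsKey(c.path())) return false;
            if (op instanceof SetData s) {
                Node n = tree.get(s.path());
                if (n == null || n.version() != s.expectedVersion()) return false;
            }
        }
        for (Op op : ops) {
            if (op instanceof Create c) tree.put(c.path(), new Node(c.data(), 0));
            if (op instanceof SetData s) tree.put(s.path(), new Node(s.data(), s.expectedVersion() + 1));
        }
        return true;
    }

    // Step 1: read the controller epoch together with its version.
    static Node readEpoch() {
        return tree.get("/controller_epoch");
    }

    // Step 2: claim the controller path and bump the epoch atomically.
    // Returns the new epoch on success, or -1 if this broker lost the election.
    static int tryElect(int brokerId, Node seen) {
        int newEpoch = seen.data() + 1;
        boolean won = multi(List.of(
                new Create("/controller", brokerId),
                new SetData("/controller_epoch", newEpoch, seen.version())));
        return won ? newEpoch : -1;
    }

    public static void main(String[] args) {
        tree.clear();
        tree.put("/controller_epoch", new Node(0, 0));

        Node seenByA = readEpoch();   // A reads epoch 0 (version 0)
        Node seenByB = readEpoch();   // B reads epoch 0 (version 0)
        int b = tryElect(2, seenByB); // B's multi commits: epoch -> 1
        tree.remove("/controller");   // admin deletes the controller path
        int a = tryElect(1, seenByA); // A's version check fails, nothing is written

        System.out.println("B=" + b + " A=" + a
                + " storedEpoch=" + tree.get("/controller_epoch").data());
    }
}
```

This prints `B=1 A=-1 storedEpoch=1`. Because the path creation and the epoch bump commit together, the earlier interleaving (path created, epoch bumped later) cannot occur, and a broker acting on a stale epoch read fails the whole multi instead of overwriting the epoch. In this toy, the losing broker would simply re-read the epoch and retry the election.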
