We probably need to be a bit careful about bumping up the controller epoch at 
the beginning of onControllerFailover(). Currently, the controller epoch is 
read and incremented as a separate step after the controller path has been 
created successfully, so the two are not atomic. This can create the following 
problem. Broker A creates the controller path and is about to call 
onControllerFailover(). An admin deletes the controller path, and broker B 
creates the controller path, reads the controller epoch and updates it to 1. 
Broker A then reads the controller epoch and updates it to 2. Now broker B is 
the controller, but its controller epoch (1) is outdated, since the stored 
epoch is now 2.
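
To make the interleaving concrete, here is a minimal in-memory sketch of the race. It is only a model: `EpochNode` and `unconditional_bump` are hypothetical names made up for illustration, not Kafka or ZooKeeper APIs, and the "brokers" are just sequential steps in one script.

```python
class EpochNode:
    """Hypothetical stand-in for the controller epoch znode."""

    def __init__(self):
        self.value = 0


def unconditional_bump(node):
    # Read the epoch, then write back value + 1, without checking
    # whether the caller still owns the controller path.
    node.value = node.value + 1
    return node.value


epoch = EpochNode()

controller = "A"                      # A creates the controller path, then stalls
controller = "B"                      # admin deletes the path; B creates it
b_epoch = unconditional_bump(epoch)   # B reads 0 and writes 1
a_epoch = unconditional_bump(epoch)   # A resumes, reads 1 and writes 2

# B holds the controller path, but the stored epoch has moved past B's epoch.
print(controller, b_epoch, epoch.value)
```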

One way to address this issue is to use multi() when creating the controller 
path. To elect a new controller, a broker first reads the current controller 
epoch from ZK and then does a multi() to (1) write the controller path and 
(2) do a conditional update to the controller epoch. Not sure if this is the 
best way though.
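
A rough sketch of the multi() idea, again as an in-memory model rather than real ZooKeeper client code. `FakeZk`, `elect`, and the exception names are hypothetical; the assumption is that the conditional update is keyed on the epoch znode's version, and that both writes are applied together or not at all.

```python
class NodeExists(Exception):
    """The controller path is already held by some broker."""


class VersionMismatch(Exception):
    """The epoch znode changed since the caller read it."""


class FakeZk:
    """Hypothetical in-memory model of the two znodes involved."""

    def __init__(self):
        self.controller = None
        self.epoch = 0
        self.epoch_version = 0

    def read_epoch(self):
        return self.epoch, self.epoch_version

    def delete_controller(self):
        self.controller = None

    def elect(self, broker, new_epoch, expected_version):
        # Models multi(): validate every operation first, then apply all.
        if self.controller is not None:
            raise NodeExists()
        if self.epoch_version != expected_version:
            raise VersionMismatch()
        self.controller = broker
        self.epoch = new_epoch
        self.epoch_version += 1


zk = FakeZk()
epoch_a, version_a = zk.read_epoch()    # A reads the epoch, then stalls
epoch_b, version_b = zk.read_epoch()    # B reads the epoch
zk.elect("B", epoch_b + 1, version_b)   # B wins: path and epoch 1 written together
zk.delete_controller()                  # admin deletes the controller path

try:
    zk.elect("A", epoch_a + 1, version_a)   # A retries with its stale read
    stale_rejected = False
except VersionMismatch:
    stale_rejected = True                   # neither write was applied

epoch_a, version_a = zk.read_epoch()    # A re-reads and tries again
zk.elect("A", epoch_a + 1, version_a)
print(zk.controller, zk.epoch, stale_rejected)
```

In this model a broker can only become controller if it also advances the epoch it just read, so the stored epoch always belongs to whoever holds the controller path.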

[ Full content available at: https://github.com/apache/kafka/pull/5101 ]