Also, we didn't have the protection against a stale controller deleting 
`\controller` znode when `ControllerMovedException` is thrown in 
`onControllerFailOver()` during controller initialization. This can cause 
controller switch storm. Consider the following events:
1. broker A creates /controller znode
2. broker A reads the controller epoch zkVersion
3. broker A lose its zk session -> /controller node gets deleted
4. broker B creates /controller znode
5. broker B reads the controller epoch zkVersion
6. broker B increments the controller epoch and bumps the zkVersion of 
/controller_epoch znode
7. broker A tries to update zookeeper (e.g. update 
`/admin/reassign_partitions`) but fails because /controller_epoch zkVersion 
mismatch. A ControllerMovedException is thrown.
8. broker A removes the /controller znode (currently owned by broker B) because 
we didn't explicitly catch `ControllerMovedException` in `elect()` and trigger 
a controller move in the following code block:
```
try {
  ...
  onControllerFailOver()
} catch {
  ...
  case e2: Throwable =>
        error("Error while electing or becoming controller on broker 
%d".format(config.brokerId), e2)
        triggerControllerMove()
}
```
9. broker C creates /controller znode
...

This loop may never end and the controller role will keep switching across 
brokers without making any progress. I have also updated the PR to include the 
fixes:
1. Catch `ControllerMovedException` in `elect()` without triggering the 
controller move. This makes sense because when the `ControllerMovedException` 
is thrown, the new controller has been elected and there is no need to trigger 
another round of controller election.
2. Guard against the `\controller` deletion by controller epoch zkVersion. This 
ensures stale controller cannot delete `\controller` znode.

[ Full content available at: https://github.com/apache/kafka/pull/5101 ]
This message was relayed via gitbox.apache.org for [email protected]

Reply via email to