Also, we didn't have the protection against a stale controller deleting
`\controller` znode when `ControllerMovedException` is thrown in
`onControllerFailOver()` during controller initialization. This can cause
controller switch storm. Consider the following events:
1. broker A creates /controller znode
2. broker A reads the controller epoch zkVersion
3. broker A lose its zk session -> /controller node gets deleted
4. broker B creates /controller znode
5. broker B reads the controller epoch zkVersion
6. broker B increments the controller epoch and bumps the zkVersion of
/controller_epoch znode
7. broker A tries to update zookeeper (e.g. update
`/admin/reassign_partitions`) but fails because /controller_epoch zkVersion
mismatch. A ControllerMovedException is thrown.
8. broker A removes the /controller znode (currently owned by broker B) because
we didn't explicitly catch `ControllerMovedException` in `elect()` and trigger
a controller move in the following code block:
```
try {
...
onControllerFailOver()
} catch {
...
case e2: Throwable =>
error("Error while electing or becoming controller on broker
%d".format(config.brokerId), e2)
triggerControllerMove()
}
```
9. broker C creates /controller znode
...
This loop may never end and the controller role will keep switching across
brokers without making any progress. I have also updated the PR to include the
fixes:
1. Catch `ControllerMovedException` in `elect()` without triggering the
controller move. This makes sense because when the `ControllerMovedException`
is thrown, the new controller has been elected and there is no need to trigger
another round of controller election.
2. Guard against the `\controller` deletion by controller epoch zkVersion. This
ensures stale controller cannot delete `\controller` znode.
[ Full content available at: https://github.com/apache/kafka/pull/5101 ]
This message was relayed via gitbox.apache.org for [email protected]