Discussed with Onur offline: the purpose of `getAfterNodeExists` in `CheckedEphemeral` is indeed to handle the case where a zk connection loss happens. After digging through both the ZooKeeper and Kafka code, we think it is safe to remove the extra complexity of `controllerNodeExistsHandler` in this PR once we make the `/controller` creation and the `/controller_epoch` update atomic.
So the logic will be:

1. Try to create `/controller_epoch` if it does not exist.
2. Read `/controller_epoch` from zk.
3. Atomically create `/controller` and update `/controller_epoch`.
4. If step 3 throws `NodeExistsException`, read `/controller`. If the controller id in zk equals the current broker id and the controller epoch in zk equals the expected epoch, controller election finishes successfully; otherwise, throw `ControllerMovedException`.

[ Full content available at: https://github.com/apache/kafka/pull/5101 ]
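
To make the four steps above concrete, here is a minimal sketch against an in-memory stand-in for zk. Everything in it is an assumption for illustration: `FakeZk`, `createAndUpdate`, `ControllerElection`, and `createOrVerify` are hypothetical names, not Kafka or ZooKeeper APIs; `/controller` is simplified to hold only the broker id; the initial epoch of 0 is a placeholder; and the exception classes are local stand-ins for the real ones.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-ins for the real exceptions (assumption: simplified for the sketch).
class NodeExistsException extends RuntimeException {}

class ControllerMovedException extends RuntimeException {
    ControllerMovedException(String msg) { super(msg); }
}

/** Hypothetical in-memory stand-in for zk; not the real client API. */
class FakeZk {
    private final Map<String, String> nodes = new HashMap<>();

    void createIfAbsent(String path, String value) { nodes.putIfAbsent(path, value); }

    String read(String path) { return nodes.get(path); }

    /** Models an atomic multi-op: create one node and update another, all or nothing. */
    void createAndUpdate(String createPath, String createValue,
                         String updatePath, String updateValue) {
        if (nodes.containsKey(createPath)) throw new NodeExistsException();
        nodes.put(createPath, createValue);
        nodes.put(updatePath, updateValue);
    }
}

class ControllerElection {
    static final String CONTROLLER = "/controller";
    static final String EPOCH = "/controller_epoch";

    /** Runs steps 1-4 and returns the new controller epoch on success. */
    static int elect(FakeZk zk, int brokerId) {
        zk.createIfAbsent(EPOCH, "0");                        // step 1 (initial value is an assumption)
        int newEpoch = Integer.parseInt(zk.read(EPOCH)) + 1;  // step 2
        return createOrVerify(zk, brokerId, newEpoch);
    }

    /** Steps 3 and 4; safe to call again if the first attempt's result was lost. */
    static int createOrVerify(FakeZk zk, int brokerId, int newEpoch) {
        try {
            // Step 3: create /controller and bump /controller_epoch together.
            zk.createAndUpdate(CONTROLLER, String.valueOf(brokerId),
                               EPOCH, String.valueOf(newEpoch));
            return newEpoch;
        } catch (NodeExistsException e) {
            // Step 4: the node may be ours from an earlier attempt whose reply was lost.
            boolean sameBroker = String.valueOf(brokerId).equals(zk.read(CONTROLLER));
            boolean sameEpoch = String.valueOf(newEpoch).equals(zk.read(EPOCH));
            if (sameBroker && sameEpoch) return newEpoch;
            throw new ControllerMovedException("controller is " + zk.read(CONTROLLER));
        }
    }
}
```

Calling `createOrVerify` a second time with the same broker id and epoch models a retry after a connection loss: the create fails with `NodeExistsException`, but the id and epoch check in step 4 recognizes the earlier write as our own. A different broker that races in afterwards sees a mismatching id and gets `ControllerMovedException`, and `/controller_epoch` is left untouched because the create and the update only ever happen together.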
