LB-Yu commented on PR #1401:
URL: https://github.com/apache/fluss/pull/1401#issuecomment-3203943869
Hi @zcoo. Overall, I have several concerns:
1. The mechanisms for leader election, startup, and resignation are not
sufficiently robust. For reference, the Kafka controller provides a
comprehensive implementation of controller lifecycle management:
```
Startup:
  Register ControllerChangeHandler and check whether the /controller path exists
  elect()
    Get activeControllerId from ZK
    If activeControllerId != -1, it means another broker has already become the controller, so just return
    Attempt to register as controller in ZK: registerControllerAndIncrementControllerEpoch
      If successful: call onControllerFailover()
      If failed:
        If it’s ControllerMovedException, call maybeResign (election failed)
        Otherwise, call triggerControllerMove (startup process failed)
When /controller node is created or its data changes:
  Call maybeResign()
When /controller node is deleted:
  processReelect
    maybeResign
    elect
```
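To make the flow above concrete, here is a rough Java sketch of the same lifecycle. It is only an illustration under assumptions: `FakeStore` is a hypothetical in-memory stand-in for the `/controller` and epoch znodes, and `Candidate`, `onLeaderNodeDeleted`, and the field names are made up for this example. They are not Fluss or Kafka classes.

```java
public class ElectionSketch {
    static final int NO_CONTROLLER = -1;

    /** Hypothetical in-memory stand-in for the /controller znode plus an epoch counter. */
    static class FakeStore {
        private int leaderId = NO_CONTROLLER;
        private int epoch = 0;

        synchronized int currentLeader() {
            return leaderId;
        }

        /** Claims leadership and bumps the epoch in one step; returns the new epoch, or -1 if already claimed. */
        synchronized int registerAndIncrementEpoch(int candidateId) {
            if (leaderId != NO_CONTROLLER) {
                return -1;
            }
            leaderId = candidateId;
            return ++epoch;
        }

        /** Simulates the leader node disappearing, e.g. after a session expiry. */
        synchronized void deleteLeaderNode() {
            leaderId = NO_CONTROLLER;
        }
    }

    /** Hypothetical coordinator candidate following the elect / maybeResign outline. */
    static class Candidate {
        final int id;
        final FakeStore store;
        boolean active = false;
        int epoch = -1;

        Candidate(int id, FakeStore store) {
            this.id = id;
            this.store = store;
        }

        void elect() {
            if (store.currentLeader() != NO_CONTROLLER) {
                return; // someone else is already the leader
            }
            int newEpoch = store.registerAndIncrementEpoch(id);
            if (newEpoch > 0) {
                epoch = newEpoch;
                active = true; // the failover (become-leader) logic would run here
            } else {
                maybeResign(); // lost the race to another candidate
            }
        }

        void maybeResign() {
            boolean wasActive = active;
            active = store.currentLeader() == id;
            if (wasActive && !active) {
                // the resignation logic would run here: stop leader-only work
            }
        }

        /** Reaction to the leader node being deleted: resign if needed, then try again. */
        void onLeaderNodeDeleted() {
            maybeResign();
            elect();
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        Candidate a = new Candidate(1, store);
        Candidate b = new Candidate(2, store);
        a.elect();
        b.elect();
        store.deleteLeaderNode();
        b.onLeaderNodeDeleted();
        a.onLeaderNodeDeleted();
        System.out.println("a.active=" + a.active + ", b.active=" + b.active + ", b.epoch=" + b.epoch);
    }
}
```

The point of the sketch is that every leadership change goes through the same two entry points, so a node that was leader always gets a chance to run its resignation logic before anyone re-elects.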
2. There is no epoch management. Combined with the lack of a leader
resignation mechanism, this is extremely risky in split-brain scenarios:
two leaders could be active at the same time, leading to metadata
inconsistencies.
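As a rough illustration of what epoch management buys us, the sketch below shows receiver-side fencing: a server remembers the highest coordinator epoch it has seen and rejects requests carrying an older one. `EpochFence` and `accept` are hypothetical names for this example, not existing Fluss APIs.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class EpochFenceSketch {

    /** Hypothetical receiver-side guard that rejects requests from a stale coordinator. */
    static class EpochFence {
        private final AtomicInteger highestSeen = new AtomicInteger(0);

        /** Returns true if the request epoch is current or newer; false if it is stale. */
        boolean accept(int requestEpoch) {
            while (true) {
                int current = highestSeen.get();
                if (requestEpoch < current) {
                    return false; // sent by a coordinator that has been superseded
                }
                if (requestEpoch == current
                        || highestSeen.compareAndSet(current, requestEpoch)) {
                    return true;
                }
                // lost a race with a concurrent update; re-read and retry
            }
        }
    }

    public static void main(String[] args) {
        EpochFence fence = new EpochFence();
        System.out.println(fence.accept(1)); // first leader
        System.out.println(fence.accept(2)); // new leader after failover
        System.out.println(fence.accept(1)); // old leader that has not noticed it lost
    }
}
```

With something like this in place, an old leader that keeps running during a split-brain can still send requests, but they no longer take effect once any server has heard from the newer leader.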
3. After introducing multiple coordinators, should we re-examine potential
race conditions in metadata management and ZK access to avoid the risks
associated with split-brain situations?
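One possible direction for the ZK access part, sketched here under assumptions, is to make every metadata write conditional on the writer's epoch still being the current one, so that a superseded coordinator cannot overwrite state. With ZooKeeper this kind of guard could be built from a `multi` transaction containing a version check, but the example below uses a hypothetical in-memory `MetadataStore` just to show the idea.

```java
import java.util.HashMap;
import java.util.Map;

public class GuardedWriteSketch {

    /** Hypothetical in-memory metadata store whose writes are guarded by the coordinator epoch. */
    static class MetadataStore {
        private int currentEpoch = 0;
        private final Map<String, String> data = new HashMap<>();

        /** Called when a new coordinator wins the election; returns its epoch. */
        synchronized int bumpEpoch() {
            return ++currentEpoch;
        }

        /** Applies the write only if the writer still holds the current epoch. */
        synchronized boolean writeIfEpochCurrent(int writerEpoch, String key, String value) {
            if (writerEpoch != currentEpoch) {
                return false; // writer has been superseded; reject the write
            }
            data.put(key, value);
            return true;
        }

        synchronized String read(String key) {
            return data.get(key);
        }
    }

    public static void main(String[] args) {
        MetadataStore store = new MetadataStore();
        int oldEpoch = store.bumpEpoch();
        store.writeIfEpochCurrent(oldEpoch, "table-1", "leader=server-0");
        int newEpoch = store.bumpEpoch();
        boolean staleWrite = store.writeIfEpochCurrent(oldEpoch, "table-1", "leader=server-9");
        boolean freshWrite = store.writeIfEpochCurrent(newEpoch, "table-1", "leader=server-1");
        System.out.println("stale=" + staleWrite + ", fresh=" + freshWrite
                + ", value=" + store.read("table-1"));
    }
}
```

The check and the write happen atomically here because both run under the same lock; a real implementation would need the same atomicity from the underlying store.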