LB-Yu commented on PR #1401:
URL: https://github.com/apache/fluss/pull/1401#issuecomment-3203943869

   Hi @zcoo. Overall, I have several concerns:
   
   1. The mechanisms for leader election, startup, and resignation are not 
sufficiently robust. For reference, Kafka Controller provides a comprehensive 
implementation for controller lifecycle management:
   ```
   Startup:
     Register ControllerChangeHandler and check whether the /controller path exists
     elect()
       Get activeControllerId from ZK
       If activeControllerId != -1, another broker has already become the controller, so just return
       Attempt to register as controller in ZK: registerControllerAndIncrementControllerEpoch
       If successful: call onControllerFailover()
       If failed:
         If it’s ControllerMovedException, call maybeResign (election failed)
         Otherwise, call triggerControllerMove (startup process failed)
   
   When /controller node is created or its data changes:
     Call maybeResign()
   
   When /controller node is deleted:
     processReelect
       maybeResign
       elect
   ```
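
   To make the flow above concrete, here is a minimal sketch of the same
   lifecycle against an in-memory stand-in for ZK. `FakeLeaderStore` and
   `CoordinatorSketch` are hypothetical names for illustration only, not
   Fluss or Kafka APIs. The watcher registration and the
   triggerControllerMove path are left out for brevity.

```java
import java.util.Optional;

/** In-memory stand-in for the ZK /controller node (hypothetical, illustration only). */
final class FakeLeaderStore {
    private Integer leaderId; // null means the leader node does not exist
    private int epoch;

    synchronized Optional<Integer> currentLeader() {
        return Optional.ofNullable(leaderId);
    }

    /** Creates the leader node and bumps the epoch in one step; fails if a leader exists. */
    synchronized int registerAndIncrementEpoch(int candidateId) {
        if (leaderId != null) {
            throw new IllegalStateException("leader already moved to " + leaderId);
        }
        leaderId = candidateId;
        return ++epoch;
    }

    /** Simulates the ephemeral node disappearing, e.g. on session expiry. */
    synchronized void deleteLeaderNode() {
        leaderId = null;
    }
}

/** Hypothetical coordinator following the startup / elect / maybeResign flow. */
final class CoordinatorSketch {
    private final int id;
    private final FakeLeaderStore store;
    private int activeLeaderId = -1;
    private boolean active; // true while this node performs leader-only work
    int epoch = -1;

    CoordinatorSketch(int id, FakeLeaderStore store) {
        this.id = id;
        this.store = store;
    }

    void startup() {
        // A real implementation would register the change handler here first.
        elect();
    }

    void elect() {
        activeLeaderId = store.currentLeader().orElse(-1);
        if (activeLeaderId != -1) {
            return; // someone else is already the leader
        }
        try {
            epoch = store.registerAndIncrementEpoch(id);
            activeLeaderId = id;
            active = true; // stands in for onControllerFailover()
        } catch (IllegalStateException moved) {
            maybeResign(); // lost the race: election failed
        }
    }

    void maybeResign() {
        boolean wasActive = active;
        activeLeaderId = store.currentLeader().orElse(-1);
        if (wasActive && activeLeaderId != id) {
            active = false; // stands in for the resignation cleanup
        }
    }

    void onLeaderNodeChanged() {
        maybeResign();
    }

    void onLeaderNodeDeleted() {
        maybeResign();
        elect();
    }

    boolean isActive() {
        return active;
    }
}
```

   The point of the sketch is that every change notification goes through
   maybeResign() before anything else, so a node that has lost leadership
   stops leader-only work before a new election is attempted.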
   
   2. There is no epoch management. Combined with the lack of a leader 
resignation mechanism, this is extremely risky in split-brain scenarios: two 
leaders could be active at the same time and both write metadata, leading to 
inconsistencies.
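
   As an illustration of what epoch management buys us, here is a minimal
   sketch of epoch fencing, assuming every metadata write carries the
   writer's leader epoch. `EpochFencedStore` is a hypothetical name, not an
   existing Fluss class. In a real setup the check would have to be atomic
   with the ZK write itself.

```java
/** Hypothetical metadata store that rejects writes from a stale leader epoch. */
final class EpochFencedStore {
    private int latestEpoch;
    private String value = "";

    /** Called once by each newly elected leader; returns the epoch it must use. */
    synchronized int bumpEpoch() {
        return ++latestEpoch;
    }

    /** Applies the write only if the caller still holds the latest epoch. */
    synchronized boolean write(int callerEpoch, String newValue) {
        if (callerEpoch != latestEpoch) {
            return false; // stale leader: fenced off, nothing is written
        }
        value = newValue;
        return true;
    }

    synchronized String read() {
        return value;
    }
}
```

   With this in place, an old leader that has not yet noticed it was
   replaced still fails on its next write, because a newer leader has
   already bumped the epoch.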
   
   3. After introducing multiple coordinators, should we re-examine potential 
race conditions in metadata management and ZK access to avoid the risks 
associated with split-brain situations?

