Hi all,

In the current Ratis module, a Leader may step down voluntarily without 
knowing who the new Leader is. In that case the state machine's 
notifyLeaderChange callback is not triggered. Modules that rely on this 
callback to learn that the current node is no longer the Leader may 
therefore delay releasing their resources, which can lead to split-brain 
issues with multiple Leaders.

For example, in a 3-node ConfigNode setup, if a symmetric network partition 
fault is injected into the Leader node, the other two nodes will elect a new 
Leader. However, certain services (such as heartbeat and procedure) on the 
old Leader are not stopped, so the old Leader keeps acting as a Leader. This 
split-brain scenario can cause unexpected behavior.

I have submitted a PR [1]. With this PR, even if the new Leader is unknown, 
Ratis will still call the notifyNotReady function, which prevents these 
split-brain issues. This work helps improve the stability of all services on 
the ConfigNode (CN) and of Pipe metadata synchronization.
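
To illustrate the idea, here is a minimal Java sketch. It is not the code in 
the PR: the class and method names (LeaderServices, SketchStateMachine, 
onLeaderChanged, onSteppedDown) are hypothetical and are not Ratis or IoTDB 
APIs. It only shows why a separate "stepped down" notification lets 
Leader-only services stop even when no new Leader is known yet.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for Leader-only services such as heartbeat or procedure.
class LeaderServices {
    private final AtomicBoolean running = new AtomicBoolean(false);

    void start() { running.set(true); }

    void stop() { running.set(false); }

    boolean isRunning() { return running.get(); }
}

// Hypothetical state machine; the callbacks are illustrative, not real Ratis signatures.
class SketchStateMachine {
    private final String selfId;
    private final LeaderServices services;

    SketchStateMachine(String selfId, LeaderServices services) {
        this.selfId = selfId;
        this.services = services;
    }

    // Fired only when the identity of the new Leader is known.
    void onLeaderChanged(String newLeaderId) {
        if (selfId.equals(newLeaderId)) {
            services.start();
        } else {
            services.stop();
        }
    }

    // Fired whenever this node stops being Leader, even if the new Leader is unknown.
    void onSteppedDown() {
        services.stop();
    }
}
```

In this sketch, a node that is partitioned away never receives 
onLeaderChanged with the new Leader's id, so only onSteppedDown gives it a 
chance to stop its Leader-only services.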

Please feel free to review it.

[1] https://github.com/apache/iotdb/pull/13221

Best,
---------------
Xinyu Tan
