ruanwenjun opened a new issue, #16825: URL: https://github.com/apache/dolphinscheduler/issues/16825
### Search before asking

- [X] I had searched in the [DSIP](https://github.com/apache/dolphinscheduler/issues/14102) and found no similar DSIP.

### Motivation

When a master or worker disconnects from the registry, it might reconnect later. For example, we use Curator to connect to ZooKeeper: if the session timeout is 120s, the server goes into the suspended state once the heartbeat has been failing for 80s, and then tries to reconnect to another ZooKeeper node. If the reconnect succeeds, the server continues working. In this case, however, another server might still receive a disconnect event for the reconnecting server and fail it over. We need to make sure that once a node has been failed over by someone, that node must die.

### Design Detail

We introduce a `FAILOVER_FINISH_NODES` path in the registry. Each server uses its address plus its startup time as its identity. Once a server has been failed over, its identity is put under `FAILOVER_FINISH_NODES`, so if a server finds its own identity under `FAILOVER_FINISH_NODES`, it should die.

### Compatibility, Deprecation, and Migration Plan

_No response_

### Test Plan

_No response_

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
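To make the Design Detail more concrete, here is a minimal sketch of the check. It is not the DolphinScheduler implementation: `InMemoryRegistry`, `identity`, `markFailoverFinished`, `shouldShutDown`, and the path string `/failover-finish-nodes` are all hypothetical names chosen for illustration, and a plain in-memory set stands in for the real registry. Only the idea (identity = address + startup time, recorded under `FAILOVER_FINISH_NODES` after failover) comes from the proposal.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class FailoverFinishSketch {

    // Hypothetical path; only the name FAILOVER_FINISH_NODES comes from the proposal.
    public static final String FAILOVER_FINISH_NODES = "/failover-finish-nodes";

    /** In-memory stand-in for the registry (an assumption, not the real Registry API). */
    public static class InMemoryRegistry {
        private final Set<String> paths = ConcurrentHashMap.newKeySet();

        public void put(String path) {
            paths.add(path);
        }

        public boolean exists(String path) {
            return paths.contains(path);
        }
    }

    /**
     * Identity = address + startup time, so a restarted process on the same
     * address gets a different identity than the one that was failed over.
     */
    public static String identity(String address, long startupTimeMillis) {
        return address + "-" + startupTimeMillis;
    }

    /** Called by whoever performed the failover, after the failover has finished. */
    public static void markFailoverFinished(InMemoryRegistry registry, String identity) {
        registry.put(FAILOVER_FINISH_NODES + "/" + identity);
    }

    /** Called by a server (e.g. after reconnecting) to decide whether it must stop. */
    public static boolean shouldShutDown(InMemoryRegistry registry, String identity) {
        return registry.exists(FAILOVER_FINISH_NODES + "/" + identity);
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();

        // Same address, two different startup times: the old process and its restart.
        String oldWorker = identity("192.168.1.10:1234", 1000L);
        String restartedWorker = identity("192.168.1.10:1234", 2000L);

        // Another server fails over the old worker while it is suspended.
        markFailoverFinished(registry, oldWorker);

        System.out.println("old worker must shut down: " + shouldShutDown(registry, oldWorker));
        System.out.println("restarted worker must shut down: " + shouldShutDown(registry, restartedWorker));
    }
}
```

Including the startup time in the identity is what lets a freshly restarted server on the same address keep running, while the stale process that was already failed over sees its own entry and stops.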
