ruanwenjun opened a new issue, #16825:
URL: https://github.com/apache/dolphinscheduler/issues/16825

   ### Search before asking
   
   - [X] I had searched in the 
[DSIP](https://github.com/apache/dolphinscheduler/issues/14102) and found no 
similar DSIP.
   
   
   ### Motivation
   
   When a master/worker disconnects from the registry, it might reconnect 
later.
   e.g. We use Curator to connect to ZooKeeper. If the session timeout is 120s, 
the server goes into the suspended state once heartbeats have failed for 80s, 
and then tries to reconnect to another ZooKeeper node. If the reconnect 
succeeds, the server continues working. But sometimes other servers receive a 
disconnect event for the reconnected server in this case, and may start to 
fail it over.
   
   We need to make sure that once a node has been failed over by another 
server, that node shuts down.
   
   ### Design Detail
   
   We introduce a `FAILOVER_FINISH_NODES` path in the registry. Each server 
uses address + server startup time as its identity. Once a server has been 
failed over, its identity is put under `FAILOVER_FINISH_NODES`, so if a server 
finds its own identity under `FAILOVER_FINISH_NODES`, it should shut down.
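
   A minimal sketch of the idea, not the actual implementation. The registry is 
replaced by a hypothetical in-memory set, and the class names, the path value 
`/failover-finish-nodes`, the `-` separator in the identity, and the 
`onReconnected` hook are all assumptions made for illustration only.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class FailoverFinishSketch {

    // Hypothetical registry path; the real value is not specified in this issue.
    static final String FAILOVER_FINISH_NODES = "/failover-finish-nodes";

    /** Hypothetical in-memory stand-in for the registry (e.g. ZooKeeper). */
    static final class InMemoryRegistry {
        private final Set<String> paths = ConcurrentHashMap.newKeySet();

        void put(String path) {
            paths.add(path);
        }

        boolean exists(String path) {
            return paths.contains(path);
        }
    }

    /** A master/worker identified by address + startup time. */
    static final class Server {
        private final String address;
        private final long startupTimeMillis;
        private final InMemoryRegistry registry;
        private volatile boolean running = true;

        Server(String address, long startupTimeMillis, InMemoryRegistry registry) {
            this.address = address;
            this.startupTimeMillis = startupTimeMillis;
            this.registry = registry;
        }

        // The startup time makes a restarted server on the same address a new identity.
        // The "-" separator is an assumption.
        String identity() {
            return address + "-" + startupTimeMillis;
        }

        boolean isRunning() {
            return running;
        }

        // Called after the server reconnects to the registry: if someone has
        // already failed this instance over, it must stop instead of resuming work.
        void onReconnected() {
            if (registry.exists(failoverFinishPath(identity()))) {
                running = false;
            }
        }
    }

    static String failoverFinishPath(String identity) {
        return FAILOVER_FINISH_NODES + "/" + identity;
    }

    // Called by whichever server performed the failover, once it has finished.
    static void markFailoverFinished(InMemoryRegistry registry, String identity) {
        registry.put(failoverFinishPath(identity));
    }

    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        Server suspended = new Server("192.168.1.10:5678", 1000L, registry);
        Server restarted = new Server("192.168.1.10:5678", 2000L, registry);

        // Another server saw the disconnect event and failed over the first instance.
        markFailoverFinished(registry, suspended.identity());

        suspended.onReconnected();
        restarted.onReconnected();
        System.out.println("suspended instance running: " + suspended.isRunning());
        System.out.println("restarted instance running: " + restarted.isRunning());
    }
}
```

   Because the startup time is part of the identity, a fresh process on the same 
address is not affected by the marker left for the old instance.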
   
   ### Compatibility, Deprecation, and Migration Plan
   
   _No response_
   
   ### Test Plan
   
   _No response_
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   

