caishunfeng opened a new issue #7024: URL: https://github.com/apache/dolphinscheduler/issues/7024

When the master loses its ZooKeeper connection:

1. Update the current server state to `wait reconnect`.
2. Send a "server lost connection" alert.
3. Keep Quartz running (Quartz and the database ensure that scheduling continues to work normally).
4. Stop accepting new requests.
5. Stop handling commands and process instances, and clear the locally running process instances (they will be taken over by another master).
6. Wait to reconnect.
7. On successful reconnect, send a "server recovered" alert, update the server state to `normal`, and resume work.

When a worker loses its ZooKeeper connection:

1. Update the current server state to `wait reconnect`.
2. Send a "server lost connection" alert.
3. Kill the running tasks (the master will take them over and rerun them).
4. Stop accepting new requests.
5. Wait to reconnect within a certain time.
6. If the reconnect times out, stop the worker itself.
7. On successful reconnect, send a "server recovered" alert, update the server state to `normal`, and resume work.

_Originally posted by @caishunfeng in https://github.com/apache/dolphinscheduler/discussions/6643#discussioncomment-1706255_
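
The worker-side steps above can be read as a small state machine. The following is a minimal sketch of that idea in Java, not DolphinScheduler's actual implementation. The class `WorkerConnectionGuard`, the enum `ServerState`, the methods `onConnectionLost`, `onTick`, and `onReconnected`, and the timeout parameter are all hypothetical names chosen for illustration. Alerts and task killing are only recorded as strings in an event list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the worker-side handling proposed in this issue. */
class WorkerConnectionGuard {
    enum ServerState { NORMAL, WAIT_RECONNECT, STOPPED }

    private final long reconnectTimeoutMillis; // assumed to be configurable
    private final List<String> events = new ArrayList<>();
    private ServerState state = ServerState.NORMAL;
    private long lostAtMillis;

    WorkerConnectionGuard(long reconnectTimeoutMillis) {
        this.reconnectTimeoutMillis = reconnectTimeoutMillis;
    }

    /** Connection lost: change state, alert, and kill running tasks. */
    void onConnectionLost(long nowMillis) {
        if (state != ServerState.NORMAL) {
            return; // already waiting or stopped
        }
        state = ServerState.WAIT_RECONNECT;
        lostAtMillis = nowMillis;
        events.add("ALERT: server lost connection");
        events.add("kill running tasks");
    }

    /** Periodic check: stop the worker if the reconnect window has elapsed. */
    void onTick(long nowMillis) {
        if (state == ServerState.WAIT_RECONNECT
                && nowMillis - lostAtMillis >= reconnectTimeoutMillis) {
            state = ServerState.STOPPED;
            events.add("reconnect timeout, stopping");
        }
    }

    /** Reconnected in time: alert and return to normal. */
    void onReconnected() {
        if (state != ServerState.WAIT_RECONNECT) {
            return; // a stopped worker does not recover
        }
        state = ServerState.NORMAL;
        events.add("ALERT: server recovered");
    }

    /** New requests are accepted only in the NORMAL state. */
    boolean acceptsNewRequests() {
        return state == ServerState.NORMAL;
    }

    ServerState state() {
        return state;
    }

    List<String> events() {
        return events;
    }
}
```

The master flow described in the issue would differ from this sketch: it has no stop-on-timeout step, it keeps Quartz running while waiting, and it clears local process instances instead of killing tasks.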
