ruanwenjun opened a new issue #6045: URL: https://github.com/apache/dolphinscheduler/issues/6045
**Describe the bug**
When a master/worker loses its session with ZooKeeper, ZooKeeper removes that server's ephemeral node. The node handling failover then does the failover job and creates a node under the dead-server path in ZooKeeper. The `HeartBeatTask` of the dead server detects that it is listed under dead-server and kills itself. When it stops, it unregisters from the registry (ZooKeeper).

https://github.com/apache/dolphinscheduler/blob/04720b327aef0649e9317573680874c20ea20ad5/dolphinscheduler-server/src/main/java/org/apache/dolphinscheduler/server/master/registry/MasterRegistryClient.java#L371-L379

https://github.com/apache/dolphinscheduler/blob/04720b327aef0649e9317573680874c20ea20ad5/dolphinscheduler-registry-plugin/dolphinscheduler-registry-zookeeper/src/main/java/org/apache/dolphinscheduler/plugin/registry/zookeeper/ZookeeperRegistry.java#L197-L204

The problem is that the dead server's ephemeral node has already been removed, so calling `client.delete()` throws a `KeeperException.NoNodeException`.

**To Reproduce**
1. Start a master server.
2. Delete its ephemeral node to simulate the master being dead.
3. See the exception. The master cannot shut down, because some other threads cannot exit.

**Which version of Dolphin Scheduler:**
- [dev]

**Requirement or improvement**
Ignore the `KeeperException.NoNodeException`: if the node does not exist, we can treat it as already deleted.
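
A minimal sketch of the requested behaviour, not the actual `ZookeeperRegistry` code. To stay self-contained it uses a hypothetical in-memory `FakeStore` and a stand-in `NoNodeException` instead of the real ZooKeeper client; the class names, `removeQuietly`, and the example path are all assumptions for illustration. The point is only that a delete of a missing node is treated as success, so shutdown can continue.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentRemoveSketch {

    /** Hypothetical stand-in for ZooKeeper's KeeperException.NoNodeException. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) {
            super("no node: " + path);
        }
    }

    /** Hypothetical in-memory store that, like ZooKeeper, fails when deleting a missing path. */
    static class FakeStore {
        private final Set<String> nodes = ConcurrentHashMap.newKeySet();

        void create(String path) {
            nodes.add(path);
        }

        boolean exists(String path) {
            return nodes.contains(path);
        }

        void delete(String path) throws NoNodeException {
            // Set.remove returns false when the path was not present.
            if (!nodes.remove(path)) {
                throw new NoNodeException(path);
            }
        }
    }

    /**
     * Deletes the path and treats "node does not exist" as already deleted.
     * Returns true if this call removed the node, false if it was already gone.
     */
    static boolean removeQuietly(FakeStore store, String path) {
        try {
            store.delete(path);
            return true;
        } catch (NoNodeException e) {
            // The node is gone (for example, the session expired); nothing left to do.
            return false;
        }
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        String path = "/hypothetical/nodes/master/host:5678"; // illustrative path only
        store.create(path);

        System.out.println(removeQuietly(store, path)); // node existed and was removed
        System.out.println(removeQuietly(store, path)); // node already gone, no exception
    }
}
```

In the real registry the same `try`/`catch` would wrap the `client.delete()` call, catching `KeeperException.NoNodeException` only, so that other ZooKeeper errors still surface.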
