github-actions[bot] commented on issue #13315: URL: https://github.com/apache/dolphinscheduler/issues/13315#issuecomment-1369345361
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

Version 3.0.0. The service was running normally when one master node suddenly went down. Here is the log:

[ERROR] 2022-12-31 05:20:45.000 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[324] - [WorkflowInstance-0][TaskInstance-0] - update master nodes error
org.apache.dolphinscheduler.registry.api.RegistryException: zookeeper release lock error
    at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:215)
    at org.apache.dolphinscheduler.service.registry.RegistryClient.getLock(RegistryClient.java:231)
    at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.updateMasterNodes(ServerNodeManager.java:319)
    at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.access$800(ServerNodeManager.java:68)
    at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$MasterDataListener.notify(ServerNodeManager.java:303)
    at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:128)
    at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
    at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
    at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
    at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
    at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
    at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Lost connection while trying to acquire lock: /lock/masters
    at org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:91)
    at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.acquireLock(ZookeeperRegistry.java:204)
    ... 18 common frames omitted

[ERROR] 2022-12-31 05:20:45.000 +0800 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[150] - [WorkflowInstance-0][TaskInstance-0] - MASTER server failover failed, host:192.168.142.20:5678
org.apache.dolphinscheduler.registry.api.RegistryException: Failed to put registry key: /dead-servers/master_192.168.142.20:5678
    at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.put(ZookeeperRegistry.java:172)
    at org.apache.dolphinscheduler.service.registry.RegistryClient.lambda$handleDeadServer$1(RegistryClient.java:159)
    at java.util.Collections$SingletonSet.forEach(Collections.java:4767)
    at org.apache.dolphinscheduler.service.registry.RegistryClient.handleDeadServer(RegistryClient.java:150)
    at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.removeMasterNodePath(MasterRegistryClient.java:142)
    at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.handleMasterEvent(MasterRegistryDataListener.java:66)
    at org.apache.dolphinscheduler.server.master.registry.MasterRegistryDataListener.notify(MasterRegistryDataListener.java:52)
    at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:128)
    at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
    at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
    at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
    at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
    at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
    at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Expected state [STARTED] was [STOPPED]
    at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:823)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkState(CuratorFrameworkImpl.java:432)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.create(CuratorFrameworkImpl.java:445)
    at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.put(ZookeeperRegistry.java:166)
    ... 20 common frames omitted

[ERROR] 2022-12-31 05:20:45.000 +0800 org.apache.dolphinscheduler.server.master.registry.ServerNodeManager:[307] - [WorkflowInstance-0][TaskInstance-0] - MasterNodeListener capture data change and get data failed.
java.lang.NullPointerException: null
    at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.releaseLock(ZookeeperRegistry.java:222)
    at org.apache.dolphinscheduler.service.registry.RegistryClient.releaseLock(RegistryClient.java:235)
    at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.updateMasterNodes(ServerNodeManager.java:326)
    at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager.access$800(ServerNodeManager.java:68)
    at org.apache.dolphinscheduler.server.master.registry.ServerNodeManager$MasterDataListener.notify(ServerNodeManager.java:303)
    at org.apache.dolphinscheduler.plugin.registry.zookeeper.ZookeeperRegistry.lambda$subscribe$1(ZookeeperRegistry.java:128)
    at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:760)
    at org.apache.curator.framework.recipes.cache.TreeCache$2.apply(TreeCache.java:754)
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:100)
    at org.apache.curator.shaded.com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:92)
    at org.apache.curator.framework.recipes.cache.TreeCache.callListeners(TreeCache.java:753)
    at org.apache.curator.framework.recipes.cache.TreeCache.access$1900(TreeCache.java:75)
    at org.apache.curator.framework.recipes.cache.TreeCache$4.run(TreeCache.java:865)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

### What you expected to happen

The master and worker should keep working normally.

### How to reproduce

Please refer to the log.

### Anything else

_No response_

### Version

3.0.x

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
