caishunfeng opened a new issue #6771: URL: https://github.com/apache/dolphinscheduler/issues/6771
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

Branch: 2.0

When a worker goes down and starts again, many process instances and task instances are not picked up by fault tolerance and remain in the running state. I found that when the Master runs `failoverWorker`, it is interrupted.

```
[root@ds1 apache-dolphinscheduler-2.0.1-alpha-SNAPSHOT-bin]# grep failover logs/dolphinscheduler-master.2021-11-10_14.*
logs/dolphinscheduler-master.2021-11-10_14.0.log:[INFO] 2021-11-10 14:33:47.887 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[333] - start master failover ...
logs/dolphinscheduler-master.2021-11-10_14.0.log:[INFO] 2021-11-10 14:33:47.948 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[337] - failover process list size:0
logs/dolphinscheduler-master.2021-11-10_14.0.log:[INFO] 2021-11-10 14:33:47.949 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[281] - start worker[null] failover ...
logs/dolphinscheduler-master.2021-11-10_14.0.log:[INFO] 2021-11-10 14:33:47.955 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[324] - end worker[null] failover ...
logs/dolphinscheduler-master.2021-11-10_14.0.log:[INFO] 2021-11-10 14:33:47.955 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[348] - master failover end
logs/dolphinscheduler-master.2021-11-10_14.0.log:[INFO] 2021-11-10 14:33:47.958 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[281] - start worker[null] failover ...
logs/dolphinscheduler-master.2021-11-10_14.0.log:[INFO] 2021-11-10 14:33:47.960 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[324] - end worker[null] failover ...
logs/dolphinscheduler-master.2021-11-10_14.0.log:[INFO] 2021-11-10 14:49:06.004 org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient:[281] - start worker[172.28.230.24:1234] failover ...
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverWorker(MasterRegistryClient.java:307)
logs/dolphinscheduler-master.2021-11-10_14.0.log: at org.apache.dolphinscheduler.server.master.registry.MasterRegistryClient.failoverServerWhenDown(MasterRegistryClient.java:196)
```

I found that the code executes `return` when `processInstanceExecMaps` does not contain `processInstance.getId()`; it should be `continue`.

### What you expected to happen

Failover completes normally.

### How to reproduce

Run a lot of tasks and restart the worker.

### Anything else

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
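The `return` versus `continue` point raised under "What happened" can be illustrated with a minimal, self-contained sketch. This is not the actual `MasterRegistryClient` code: the `Task` class, both method names, and the use of a `Set` in place of `processInstanceExecMaps` are assumptions made purely for illustration. It only shows why a `return` inside a per-task loop would leave later tasks unprocessed, while `continue` skips just the one task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class FailoverLoopSketch {

    /** Hypothetical stand-in for a task instance that needs failover. */
    static final class Task {
        final int id;
        final int processInstanceId;

        Task(int id, int processInstanceId) {
            this.id = id;
            this.processInstanceId = processInstanceId;
        }
    }

    /** Reported behavior: the first unknown process instance ends the whole loop. */
    static List<Integer> failoverWithReturn(List<Task> tasks, Set<Integer> activeProcessIds) {
        List<Integer> failedOver = new ArrayList<>();
        for (Task task : tasks) {
            if (!activeProcessIds.contains(task.processInstanceId)) {
                return failedOver; // remaining tasks are never examined
            }
            failedOver.add(task.id);
        }
        return failedOver;
    }

    /** Proposed behavior: skip only the task whose process instance is unknown. */
    static List<Integer> failoverWithContinue(List<Task> tasks, Set<Integer> activeProcessIds) {
        List<Integer> failedOver = new ArrayList<>();
        for (Task task : tasks) {
            if (!activeProcessIds.contains(task.processInstanceId)) {
                continue; // move on to the next task
            }
            failedOver.add(task.id);
        }
        return failedOver;
    }

    public static void main(String[] args) {
        // Task 2 belongs to a process instance (99) that is not tracked.
        List<Task> tasks = List.of(new Task(1, 10), new Task(2, 99), new Task(3, 10));
        Set<Integer> active = Set.of(10);

        System.out.println(failoverWithReturn(tasks, active));   // prints [1]
        System.out.println(failoverWithContinue(tasks, active)); // prints [1, 3]
    }
}
```

In this sketch, task 3 is handled only by the `continue` variant, which mirrors the symptom described in the report: instances that come after the untracked one stay in the running state.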
