EricGao888 opened a new issue, #13355: URL: https://github.com/apache/dolphinscheduler/issues/13355
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

* When the master performs failover, it sets the host of the process instances that need failover to `NULL`. If the master crashes after this step but before the failover of those process instances has finished, they will never be failed over again.

https://github.com/apache/dolphinscheduler/blob/2a44a7f36a4be85221703bc3d8d70a650860b31e/dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/service/MasterFailoverService.java#L197-L202

https://github.com/apache/dolphinscheduler/blob/2a44a7f36a4be85221703bc3d8d70a650860b31e/dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/service/MasterFailoverService.java#L291-L298

### What you expected to happen

* Failover should eventually succeed, even if the master crashes during failover.

### How to reproduce

1. Kill the workers.
2. Kill the masters.
3. Start the masters. They now perform failover, but because there are currently no workers, the failover does not succeed.
4. Restart the masters and start the workers. The masters perform failover again. However, because the hosts of the process instances that need failover were set to `NULL` in step 3, these instances are never failed over successfully.

### Anything else

_No response_

### Version

dev

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
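The failure mode described in this issue can be illustrated with a small, self-contained Java sketch. It is a hypothetical, simplified model, not DolphinScheduler's actual `MasterFailoverService` code: the `ProcessInstance` class, the `findNeedFailover`, `failover` and `simulate` methods, and the host names `master-a`/`master-b` are all invented for illustration. It assumes that the instances needing failover are selected by the crashed master's host. A first failover pass crashes midway and a second pass retries; when the host is cleared before recovery, the retry finds nothing to fail over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified model of master failover (NOT DolphinScheduler's real classes).
public class FailoverSketch {

    // A process instance owned by a master host; null means "no owner".
    static final class ProcessInstance {
        String host;
        boolean recovered;

        ProcessInstance(String host) {
            this.host = host;
        }
    }

    // Assumption: instances needing failover are looked up by the crashed master's host.
    static List<ProcessInstance> findNeedFailover(List<ProcessInstance> all, String deadHost) {
        List<ProcessInstance> result = new ArrayList<>();
        for (ProcessInstance p : all) {
            if (Objects.equals(p.host, deadHost)) {
                result.add(p);
            }
        }
        return result;
    }

    // One failover pass run by newHost; crashMidway simulates the master dying mid-failover.
    static void failover(List<ProcessInstance> all, String deadHost, String newHost,
                         boolean clearHostFirst, boolean crashMidway) {
        List<ProcessInstance> pending = findNeedFailover(all, deadHost);
        if (clearHostFirst) {
            for (ProcessInstance p : pending) {
                p.host = null; // ownership dropped before recovery has happened
            }
        }
        if (crashMidway) {
            return; // simulated crash: no instance is recovered in this pass
        }
        for (ProcessInstance p : pending) {
            p.recovered = true;
            p.host = newHost; // hand over ownership only after recovery succeeds
        }
    }

    // Runs a crashing pass followed by a retry and returns how many instances were recovered.
    static int simulate(boolean clearHostFirst) {
        List<ProcessInstance> all = new ArrayList<>();
        all.add(new ProcessInstance("master-a"));
        all.add(new ProcessInstance("master-a"));
        failover(all, "master-a", "master-b", clearHostFirst, true);  // crashes midway
        failover(all, "master-a", "master-b", clearHostFirst, false); // retry after restart
        int recovered = 0;
        for (ProcessInstance p : all) {
            if (p.recovered) {
                recovered++;
            }
        }
        return recovered;
    }

    public static void main(String[] args) {
        System.out.println("clear host first: " + simulate(true) + " of 2 recovered");
        System.out.println("keep host until done: " + simulate(false) + " of 2 recovered");
    }
}
```

In this model, clearing the host first strands both instances (0 of 2 recovered), while keeping the original host until each instance is handed over lets the retry recover both (2 of 2). The trade-off of the second ordering is that a crashed pass may be repeated, so the recovery step would need to be safe to run more than once.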
