[GitHub] [dolphinscheduler] EricGao888 opened a new issue, #13355: [Bug] [Master] Failover may never succeed if master crashes during failover


EricGao888 opened a new issue, #13355:
URL: https://github.com/apache/dolphinscheduler/issues/13355

### Search before asking

- [X] I had searched in the
[issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and
found no similar issues.

### What happened

* When a master performs failover, it sets the host of each process instance
that needs failover to `NULL`. If the master crashes after this step but before
the failover of those process instances has finished, they will never be
failed over again.

https://github.com/apache/dolphinscheduler/blob/2a44a7f36a4be85221703bc3d8d70a650860b31e/dolphinscheduler-master/src/main/java/org/apache/dolphinscheduler/server/master/service/MasterFailoverService.java#L197-L202
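The failure mode described above can be modeled with a small in-memory sketch. This is not the code in `MasterFailoverService`. The class `FailoverCrashSketch`, the maps, the host string `master-a:5678`, and the assumption that failover candidates are looked up by the dead master's host are all hypothetical, chosen only to show why clearing the host first can strand an instance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FailoverCrashSketch {

    // Hypothetical stand-in for the process instance table: id -> host (null = no owner).
    static final Map<Integer, String> hostById = new LinkedHashMap<>();
    // Ids whose recovery was actually submitted.
    static final Set<Integer> recovered = new LinkedHashSet<>();

    // Assumption: candidates are the instances still recorded on the dead master's host.
    static List<Integer> findNeedFailover(String deadHost) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, String> e : hostById.entrySet()) {
            if (deadHost.equals(e.getValue())) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    static void failover(String deadHost, boolean crashAfterClearingHost) {
        for (int id : findNeedFailover(deadHost)) {
            hostById.put(id, null);          // step 1: host is set to NULL
            if (crashAfterClearingHost) {
                return;                      // simulated master crash before step 2
            }
            recovered.add(id);               // step 2: recovery is submitted
        }
    }

    // Instances with no host that were never recovered.
    static List<Integer> stranded() {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, String> e : hostById.entrySet()) {
            if (e.getValue() == null && !recovered.contains(e.getKey())) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        hostById.put(1, "master-a:5678");
        hostById.put(2, "master-a:5678");
        failover("master-a:5678", true);   // crashes after clearing instance 1
        failover("master-a:5678", false);  // retry only sees instance 2
        System.out.println("recovered: " + recovered);
        System.out.println("stranded: " + stranded());
    }
}
```

In this model the retry recovers instance 2 only. Instance 1 no longer matches the host lookup, so no later failover pass will pick it up.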

### What you expected to happen

* Failover should succeed eventually, even if master crashes during failover.
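One ordering that would satisfy this expectation, shown only as a sketch, is to clear the host after the recovery has been recorded, so that a crash in between leaves the instance discoverable on the next attempt. This is an illustration under assumptions, not necessarily the fix the project adopts. `RetryableFailoverSketch`, its maps, and the host-based lookup are hypothetical, and the approach relies on the recovery step being idempotent because it may run twice for the same instance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RetryableFailoverSketch {

    // Hypothetical stand-in for the process instance table: id -> host (null = no owner).
    static final Map<Integer, String> hostById = new LinkedHashMap<>();
    // A set makes the recovery step idempotent in this model.
    static final Set<Integer> recovered = new LinkedHashSet<>();

    // Assumption: candidates are the instances still recorded on the dead master's host.
    static List<Integer> findNeedFailover(String deadHost) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, String> e : hostById.entrySet()) {
            if (deadHost.equals(e.getValue())) {
                ids.add(e.getKey());
            }
        }
        return ids;
    }

    static void failover(String deadHost, boolean crashBeforeClearingHost) {
        for (int id : findNeedFailover(deadHost)) {
            recovered.add(id);               // step 1: submit recovery (safe to repeat)
            if (crashBeforeClearingHost) {
                return;                      // simulated crash: host is still set
            }
            hostById.put(id, null);          // step 2: clear host only after recovery
        }
    }

    public static void main(String[] args) {
        hostById.put(1, "master-a:5678");
        hostById.put(2, "master-a:5678");
        failover("master-a:5678", true);   // crashes midway through instance 1
        failover("master-a:5678", false);  // retry still sees instances 1 and 2
        System.out.println("recovered: " + recovered);
        System.out.println("remaining: " + findNeedFailover("master-a:5678"));
    }
}
```

Here the retry after the simulated crash still finds both instances, and no candidates remain once it completes.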

### How to reproduce

1. Kill workers.
2. Kill masters.
3. Start the masters. They will perform failover, but because there are
currently no workers, the failover will not succeed.
4. Restart the masters and start the workers. The masters will perform failover
again. However, because the hosts of the process instances that need failover
were already set to `NULL` in step 3, those instances will never actually be
failed over.

### Anything else

_No response_

### Version

dev

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of
Conduct](https://www.apache.org/foundation/policies/conduct)
