GabrielBrascher opened a new pull request #4978:
URL: https://github.com/apache/cloudstack/pull/4978
### Description
Currently, the KVM HA implementation works only if the cluster has at least one
primary storage pool served via NFS, because it relies on the NFS heartbeat
script to check whether the host is healthy. This PR adds health checks that
work regardless of the storage pool type. They are performed by a Java client
that checks the Agent's status via a web server.
The additional web server exposes a simple JSON API that returns the list of
virtual machines running on that host according to Libvirt. This way, KVM HA
can verify the VMs' status via Libvirt with an HTTP call to this web server and
determine whether the host is actually down or only the Java Agent has
crashed.
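As a rough illustration of the check described above, here is a minimal sketch of a Java client that probes the helper web server and separates "only the Agent crashed" from "the host may be down". The class name, the `HostState` values, the root path `/`, and the port parameter are assumptions made for this example; they are not taken from the PR's code. The sketch only checks that the helper answers with HTTP 200 and does not parse the JSON body, since the response schema is not shown here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical sketch: all names, the path, and the port are assumptions,
// not the actual KVM HA Helper API.
final class KvmHaHelperProbe {

    enum HostState { HEALTHY, AGENT_DOWN, HOST_SUSPECT }

    // Combines the Agent connection status with the result of the helper probe.
    static HostState classify(boolean agentConnected, boolean helperReachable) {
        if (agentConnected) {
            return HostState.HEALTHY;
        }
        // Agent is gone: if the helper still answers, the host itself is alive.
        return helperReachable ? HostState.AGENT_DOWN : HostState.HOST_SUSPECT;
    }

    // Returns true if the helper answers with HTTP 200 within the timeout.
    static boolean helperReachable(String host, int port, Duration timeout) {
        try {
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(timeout)
                    .build();
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create("http://" + host + ":" + port + "/"))
                    .timeout(timeout)
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException | IllegalArgumentException e) {
            // Connection refused, timeout, or malformed address: treat as unreachable.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller would pass the result of `helperReachable(...)` into `classify(...)`. Keeping the decision in a pure function makes it testable without a running host.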
#### New KVM HA Helper component
The following image shows how the new KVM-HA-Helper web service is
integrated. The current NFS heartbeat execution flow will still be used
alongside the new HA-Helper.
<p align="center">
<img width="460" height="300"
src="https://user-images.githubusercontent.com/5025148/122809301-522bbe00-d2a4-11eb-9ebd-548d4f74b5fe.png">
</p>
#### High Availability Workflow
The proposed workflow, in which the HA check takes into account both the
**NFS Heartbeat** and the **KVM HA Helper** checks.
**Note that**, to simplify the diagram, the full
[HA state machine](https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA)
is omitted. Even if both the NFS and HA Helper checks fail, the host is not
necessarily recovered or fenced right away: depending on the HA configuration,
the checks are repeated until a threshold of accepted failures is reached.
<p align="center">
<img width="400" height="500"
src="https://user-images.githubusercontent.com/5025148/122818822-0a129880-d2b0-11eb-8085-226eb900a2f1.png">
</p>
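The re-check behaviour noted above can be sketched as a small failure counter. This is a simplified illustration under stated assumptions, not CloudStack's HA state machine. The class and method names are hypothetical. It also assumes that a round counts as failed only when both checks fail, and that any successful round resets the counter.

```java
// Hypothetical sketch: names and reset policy are assumptions for illustration.
final class HaFailureTracker {
    private final int maxFailures;
    private int consecutiveFailures;

    HaFailureTracker(int maxFailures) {
        if (maxFailures < 1) {
            throw new IllegalArgumentException("maxFailures must be at least 1");
        }
        this.maxFailures = maxFailures;
    }

    // Records one check round. Returns true once the number of consecutive
    // failed rounds reaches the threshold, i.e. recover/fence may be considered.
    boolean record(boolean nfsHeartbeatOk, boolean haHelperOk) {
        if (nfsHeartbeatOk || haHelperOk) {
            // At least one check passed: assume the host is alive and start over.
            consecutiveFailures = 0;
            return false;
        }
        consecutiveFailures++;
        return consecutiveFailures >= maxFailures;
    }

    int consecutiveFailures() {
        return consecutiveFailures;
    }
}
```

With a threshold of 3, two failed rounds followed by one successful round reset the counter, so three further failed rounds are needed before `record` returns `true`.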
### Types of changes
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
### Feature/Enhancement Scale or Bug Severity
#### Feature/Enhancement Scale
- [x] Major
- [ ] Minor