Hell All We are trying to deploy VMs as cloudstack management nodes over the compute nodes, like hyper-converged infrastructure. But we found some issue with the HA switch. The test is based on CentOS 7.6 and CloudStack 4.11.2. The primary storage is NFS. We have enabled VM HA and Host HA, and the following global settings are enabled for CloudStack: indirect.agent.lb.algorithm=roundrobin host=M1IP,M2IP,M3IP
We have 3 management VMs over 3 compute nodes, sharing with a mariadb galera cluster. M1 is the first management VM running on compute node H1, which is created under KVM and not managed by CloudStack. As the same, M2 and M3 is the second and thrid management VM running on H2 and H3. The problem is: If the CloudStack agent of compute node is connected to the management node on itself (like CloudStack agent of H1 is connected to M1), once H1 is down, CloudStack will only update the status of H1 to Disconnected, and won’t trigger AgentStatusCheck, then cause host HA failure. If H1 is connected to other management VMs like M2 or M3, this issue won’t happen. So, would you please give us some suggestions? Thanks.