spdinis commented on issue #7543:
URL: https://github.com/apache/cloudstack/issues/7543#issuecomment-2077714357

Hi,

We are preparing a transition from VMware to KVM and are struggling to get HA to work, with the same symptoms. We are using CloudStack 4.19.0. We have a few test clusters that use an NFS mount for the heartbeat, and we will be using a shared mount. For this case we also tried plain NFS, and it made no difference.

We have out-of-band management enabled. When we power off the physical host via iDRAC, the host moves to fencing after a while and stays in that status, and all VMs that were running on it keep showing as running. Once we manually declare the host as degraded, the VMs jump straight to another host.

The agent log on one of the surviving nodes shows that it detects the host as down, as slavkap showed:

2024-04-25 15:34:11,406 WARN [kvm.resource.KVMHAChecker] (pool-636-thread-1:null) (logid:29cd972b) All checks with KVMHAChecker for host IP [10.250.9.154] in pools [e355eb3b-58eb-3ce2-890a-6a7b7263d896, b7f5ff6f-7233-3d79-aa1a-1c1bc233e1c8] considered it as dead. It may cause a shutdown of the host.

So I presume the issue is related to the transition to another state after fencing. We will perform some additional tests, for example using Redfish, or by trying to force a failure other than a power-off, to see whether the issue is the agent detecting that the IPMI is actually off and assuming it was a voluntary action.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
