[ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747484#comment-13747484 ]
Paul Angus commented on CLOUDSTACK-3535:
----------------------------------------

I've tested the HA functionality on KVM and found that it did not work. CloudStack seems unable to 'stop' the VM that was on the failed host, because the host is unavailable.
I waited an hour and the instance remained in the state 'stopping'. I then restarted the host and the instance stopped, but 5 hours later it still hasn't restarted.

2013-08-22 08:35:09,802 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-3) KVMInvestigator found VM[User|HA-Test1]to be alive? null
2013-08-22 08:35:09,802 DEBUG [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-3) Fencing off VM that we don't know the state of
2013-08-22 08:35:09,802 DEBUG [cloud.ha.XenServerFencer] (HA-Worker-0:work-3) Don't know how to fence non XenServer hosts KVM
2013-08-22 08:35:09,803 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-3) Fencer null returned null
2013-08-22 08:35:09,807 DEBUG [agent.transport.Request] (HA-Worker-0:work-3) Seq 2-1715210012: Sending { Cmd , MgmtId: 345049337494, via: 2, Ver: v1, Flags: 100011, [{"com.cloud.agent.api.FenceCommand":{"vmName":"i-2-42-VM","hostGuid":"fdf1e936-0373-389b-abef-a68e339ff910-LibvirtComputingResource","hostIp":"10.0.100.41","inSeq":false,"wait":0}}] }
2013-08-22 08:35:09,905 DEBUG [agent.transport.Request] (AgentManager-Handler-13:null) Seq 2-1715210012: Processing: { Ans: , MgmtId: 345049337494, via: 2, Ver: v1, Flags: 10, [{"com.cloud.agent.api.FenceAnswer":{"result":true,"wait":0}}] }
2013-08-22 08:35:09,905 DEBUG [agent.transport.Request] (HA-Worker-0:work-3) Seq 2-1715210012: Received: { Ans: , MgmtId: 345049337494, via: 2, Ver: v1, Flags: 10, { FenceAnswer } }
2013-08-22 08:35:09,905 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-3) Fencer KVMFenceBuilder returned true
2013-08-22 08:35:09,911 DEBUG [cloud.capacity.CapacityManagerImpl] (HA-Worker-0:work-3) VM state transitted from :Running to Stopping with event: StopRequestedvm's original host id: 5 new host id: 5 host id before state transition: 5
2013-08-22 08:35:09,916 WARN [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-0:work-3) Unable to stop vm, agent unavailable: com.cloud.exception.AgentUnavailableException: Resource [Host:5] is unreachable: Host 5: Host with specified id is not in the right state: Down
2013-08-22 08:35:09,916 WARN [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-0:work-3) Unable to actually stop VM[User|HA-Test1] but continue with release because it's a force stop
2013-08-22 08:35:09,920 ERROR [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-0:work-3) Terminating HAWork[3-HA-42-Running-Investigating]
com.cloud.utils.exception.CloudRuntimeException: Caught exception even though it should be handled.
    at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:479)
    at com.cloud.ha.HighAvailabilityManagerImpl$WorkerThread.run(HighAvailabilityManagerImpl.java:831)
Caused by: com.cloud.exception.AgentUnavailableException: Resource [Host:5] is unreachable: Host 5: Host with specified id is not in the right state: Down
    at com.cloud.agent.manager.ClusteredAgentManagerImpl.getAttache(ClusteredAgentManagerImpl.java:540)
    at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:479)
    at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:439)
    at com.cloud.vm.VirtualMachineManagerImpl.advanceStop(VirtualMachineManagerImpl.java:1220)
    at com.cloud.ha.HighAvailabilityManagerImpl.restart(HighAvailabilityManagerImpl.java:476)
    ... 1 more

> No HA actions are performed when a KVM host goes offline
> --------------------------------------------------------
>
> Key: CLOUDSTACK-3535
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the default.)
> Components: Hypervisor Controller, KVM, Management Server
> Affects Versions: 4.1.0, 4.1.1, 4.2.0
> Environment: KVM (CentOS 6.3) with CloudStack 4.1
> Reporter: Paul Angus
> Assignee: edison su
> Priority: Blocker
> Fix For: 4.2.0
>
> Attachments: extract-management-server.log.2013-08-09, KVM-HA-4.1.1.2013-08-09-v1.patch, management-server.log.Agent
>
>
> If a KVM host 'goes down', CloudStack does not perform HA for instances which are marked as HA enabled on that host (including system VMs)
> CloudStack does not show the host as disconnected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira