[
https://issues.apache.org/jira/browse/CLOUDSTACK-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sateesh Chodapuneedi resolved CLOUDSTACK-4911.
----------------------------------------------
Resolution: Fixed
> [Mixed Hypervisor] VM Status is marked as alive when exit status of ping
> command is not available within command timeout
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: CLOUDSTACK-4911
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4911
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Components: VMware
> Affects Versions: 4.2.0
> Environment: Zone with a KVM cluster and VMware cluster
> Reporter: Sateesh Chodapuneedi
> Assignee: Sateesh Chodapuneedi
> Fix For: 4.2.1
>
>
> Setup:
> 1-KVM cluster with two hosts: host1, host2
> 2-VMware cluster with one host: host3
> 3-In the KVM cluster, create HA-enabled VM1; system VMs, including virtual
> router VR1, are running on host1 (Rack2host17)
> 4-In the VMware cluster, create HA-enabled VM2 on host3; VR2 and one guest VM
> are running on host3 (51.4)
> 5-Deploy an HA-enabled VM3 on host2 (Rack2Host18)
> Steps:
> 1) Create a KVM instance that connects to the VMware virtual router
> Instance Name:v-cl-test-10658-003-M00000002
> Network:PublicFrontSegment-VM
> Virtual Router: r-13123-VM
> 2) Migrate the instance to the host (tckktky4-pbhpv081) that will be shut down
> 3) Shut down the host (tckktky4-pbhpv081)
> 17:27 tckktky4-pbhpv081 shutdown
> 4) Host down detected
> 2013-05-08 17:32:24,233 WARN [agent.manager.AgentAttache]
> (StatsCollector-2:null) Seq 177-582680794: Timed out on null
> 2013-05-08 17:32:24,233 WARN [agent.manager.AgentManagerImpl]
> (StatsCollector-2:null) Operation timed out: Commands 582680794 to Host 177
> timed out after 3600
> ...
> 2013-05-08 17:32:28,552 DEBUG [cloud.ha.UserVmDomRInvestigator]
> (HA-Worker-1:work-633) user vm v-cl-test-10658-003-M00000002 has been
> successfully pinged, returning that it is alive
> ★ Even though the ping showed 100% loss, the log reports the instance as alive
> ...
> 2013-05-08 17:32:28,552 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
> (HA-Worker-1:work-633) Rescheduling because the host is not up but the vm is
> alive
> =====
> VM HA rescheduling was repeated 8 times: the first 7 attempts to start the VM
> failed, and on the 8th attempt the VM was restarted by HA on another KVM host.
> Root cause: the exit status of the ping command is not available within the
> command timeout of 20 seconds.
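
The root cause above suggests that a missing exit status should not be read as a
successful ping. The following is a minimal sketch of that idea in Python, not
the actual CloudStack fix. The names `classify_ping` and `PingResult`, and the
separate `UNKNOWN` state, are hypothetical and introduced only for illustration;
only the 20-second timeout comes from the report.

```python
import subprocess
from enum import Enum


class PingResult(Enum):
    """Hypothetical three-way outcome of a ping check."""
    ALIVE = "alive"      # command finished with exit status 0
    DOWN = "down"        # command finished with a non-zero exit status
    UNKNOWN = "unknown"  # no exit status was available before the timeout


def classify_ping(command, timeout_seconds=20):
    """Run `command` and map its exit status to a PingResult.

    If the command does not finish within `timeout_seconds`, there is no
    exit status to inspect, so the result is UNKNOWN rather than ALIVE.
    """
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return PingResult.UNKNOWN
    return PingResult.ALIVE if completed.returncode == 0 else PingResult.DOWN
```

A caller could then treat only `PingResult.ALIVE` as proof that the VM is
reachable, and handle `UNKNOWN` the same way as an inconclusive investigation
instead of rescheduling as if the VM were alive.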
--
This message was sent by Atlassian JIRA
(v6.1#6144)