Sateesh Chodapuneedi created CLOUDSTACK-4911:
------------------------------------------------

             Summary: [Mixed Hypervisor] VM Status is marked as alive when exit 
status of ping command is not available within command timeout
                 Key: CLOUDSTACK-4911
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4911
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: VMware
    Affects Versions: 4.2.0
         Environment: Zone with a KVM cluster and VMware cluster
            Reporter: Sateesh Chodapuneedi
            Assignee: Sateesh Chodapuneedi
             Fix For: 4.2.1


Setup:

1) KVM cluster with two hosts: host1 and host2
2) VMware cluster with one host: host3
3) In the KVM cluster, create HA-enabled VM1. The system VMs, including virtual 
router VR1, are running on host1 (Rack2host17)
4) In the VMware cluster, create HA-enabled VM2 on host3. VR2 and one guest VM 
are running on host3 (51.4)
5) Deploy HA-enabled VM3 on host2 (Rack2Host18)

Steps:

1) Create a KVM instance that connects to the VMware virtual router
 Instance Name: v-cl-test-10658-003-M00000002
 Network: PublicFrontSegment-VM
 Virtual Router: r-13123-VM
2) Migrate the instance to the host (tckktky4-pbhpv081) that will be shut down
3) Shut down the host (tckktky4-pbhpv081)
 17:27 tckktky4-pbhpv081 shutdown
4) Host down detected

2013-05-08 17:32:24,233 WARN [agent.manager.AgentAttache]
 (StatsCollector-2:null) Seq 177-582680794: Timed out on null
2013-05-08 17:32:24,233 WARN [agent.manager.AgentManagerImpl]
 (StatsCollector-2:null) Operation timed out: Commands 582680794 to Host 177 
timed out after 3600
...
2013-05-08 17:32:28,552 DEBUG [cloud.ha.UserVmDomRInvestigator]
 (HA-Worker-1:work-633) user vm v-cl-test-10658-003-M00000002 has been
 successfully pinged, returning that it is alive
 ★ Ping showed 100% loss, yet the log reports the instance as alive
...
2013-05-08 17:32:28,552 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
 (HA-Worker-1:work-633) Rescheduling because the host is not up but the vm is 
alive
=====

VM HA rescheduling was attempted 8 times. The first 7 attempts failed to start 
the VM; on the 8th attempt the VM was restarted by HA on another KVM host.

Root cause: the exit status of the ping command is not available within the 
command timeout of 20 seconds, and the VM is then reported as alive.
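
To illustrate the root cause, here is a minimal Python sketch (not the CloudStack code) of a probe that keeps "no exit status within the timeout" separate from "alive". The names probe_alive, describe, and stand_in_command are hypothetical and exist only for this example; the 20-second default is the command timeout quoted above.

```python
import subprocess
import sys
from typing import List, Optional

PING_TIMEOUT_SECONDS = 20  # command timeout quoted in this report


def probe_alive(command: List[str],
                timeout: float = PING_TIMEOUT_SECONDS) -> Optional[bool]:
    """Run a probe command and classify the outcome.

    Returns True for exit status 0, False for a nonzero exit status,
    and None when no exit status is available within the timeout.
    """
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # No exit status arrived in time: report "unknown", never "alive".
        return None
    return completed.returncode == 0


def describe(result: Optional[bool]) -> str:
    """Map the three-valued probe result to a label."""
    if result is None:
        return "unknown"
    return "alive" if result else "not alive"


def stand_in_command(exit_code: int, delay: float = 0.0) -> List[str]:
    """Hypothetical stand-in for ping: sleep, then exit with exit_code."""
    script = f"import sys, time; time.sleep({delay}); sys.exit({exit_code})"
    return [sys.executable, "-c", script]
```

With a real probe the command list would be a ping invocation for the VM address (the exact flags depend on the platform). The point of the sketch is that a caller receiving None should treat the VM state as undetermined and consult another check, rather than concluding that the VM is alive.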





--
This message was sent by Atlassian JIRA
(v6.1#6144)
