[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800780#comment-13800780
 ] 

ASF subversion and git services commented on CLOUDSTACK-4911:
-------------------------------------------------------------

Commit b6a13d125773371813734e87bd39c6030707f97c in branch refs/heads/4.2 from 
[~sateeshc]
[ https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;h=b6a13d1 ]

CLOUDSTACK-4911 - [Mixed Hypervisor] VM Status is marked as alive when exit 
status of ping command is not available within command timeout

Currently, when a remote command is executed over SSH and no response is 
received within the timeout, CloudStack returns a success result.
This produces false positives. The fix is to check whether the exit status of 
the remote command is available and, if it is not, to return a failure result.

Signed-off-by: Sateesh Chodapuneedi <[email protected]>
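
The check described in the commit message can be illustrated with a minimal
Java sketch. It is not the committed code: the class, field, and method names
below are hypothetical, and the only assumption carried over from the message
is that a missing exit status (modelled here as null) used to be treated as
success and is now treated as failure.

```java
// Hypothetical sketch; none of these names are taken from the CloudStack source.
class ExitStatusCheck {

    /** Outcome of a remote command. exitStatus is null when the remote side
     *  had not reported an exit status before the timeout elapsed. */
    static final class RemoteResult {
        final Integer exitStatus;
        final String output;

        RemoteResult(Integer exitStatus, String output) {
            this.exitStatus = exitStatus;
            this.output = output;
        }
    }

    /** Behaviour described as the bug: only an explicit non-zero status is
     *  a failure, so a missing status falls through to success. */
    static boolean succeededLenient(RemoteResult r) {
        if (r.exitStatus != null && r.exitStatus != 0) {
            return false;
        }
        return true;
    }

    /** Behaviour described as the fix: a missing status is a failure. */
    static boolean succeededStrict(RemoteResult r) {
        if (r.exitStatus == null) {
            return false;
        }
        return r.exitStatus == 0;
    }

    public static void main(String[] args) {
        RemoteResult timedOut = new RemoteResult(null, "");
        RemoteResult pingOk = new RemoteResult(0, "");
        RemoteResult pingLost = new RemoteResult(1, "");

        System.out.println("timed out, lenient: " + succeededLenient(timedOut));
        System.out.println("timed out, strict:  " + succeededStrict(timedOut));
        System.out.println("ping ok, strict:    " + succeededStrict(pingOk));
        System.out.println("ping lost, strict:  " + succeededStrict(pingLost));
    }
}
```

Under these assumptions only the timed-out case changes outcome between the
two checks; commands that report an exit status are judged the same way by
both.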


> [Mixed Hypervisor] VM Status is marked as alive when exit status of ping 
> command is not available within command timeout
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-4911
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4911
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: VMware
>    Affects Versions: 4.2.0
>         Environment: Zone with a KVM cluster and VMware cluster
>            Reporter: Sateesh Chodapuneedi
>            Assignee: Sateesh Chodapuneedi
>             Fix For: 4.2.1
>
>
> Setup:
> 1) KVM cluster with two hosts, host1 and host2.
> 2) VMware cluster with one host, host3.
> 3) In the KVM cluster, create HA-enabled VM1. The system VMs, including 
> virtual router VR1, are running on host1 (Rack2host17).
> 4) In the VMware cluster, create HA-enabled VM2 on host3. VR2 and one guest 
> VM are running on host3 (51.4).
> 5) Deploy an HA-enabled VM3 on host2 (Rack2Host18).
> Steps:
> 1) Create a KVM instance that connects to the VMware virtual router
>  Instance Name:v-cl-test-10658-003-M00000002
>  Network:PublicFrontSegment-VM
>  Virtual Router: r-13123-VM
> 2) Migrate the instance to the host (tckktky4-pbhpv081) that will be shut down
> 3) Shut down the host (tckktky4-pbhpv081)
>  17:27 tckktky4-pbhpv081 shutdown
> 4) Host down detected
> 2013-05-08 17:32:24,233 WARN [agent.manager.AgentAttache]
>  (StatsCollector-2:null) Seq 177-582680794: Timed out on null
> 2013-05-08 17:32:24,233 WARN [agent.manager.AgentManagerImpl]
>  (StatsCollector-2:null) Operation timed out: Commands 582680794 to Host 177 
> timed out after 3600
> ...
> 2013-05-08 17:32:28,552 DEBUG [cloud.ha.UserVmDomRInvestigator]
>  (HA-Worker-1:work-633) user vm v-cl-test-10658-003-M00000002 has been
>  successfully pinged, returning that it is alive
>  ★ The log reports the instance as alive even though ping showed 100% loss.
> ...
> 2013-05-08 17:32:28,552 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
>  (HA-Worker-1:work-633) Rescheduling because the host is not up but the vm is 
> alive
> =====
> HA rescheduling of the VM was attempted 8 times: the first 7 attempts to 
> start the VM failed, and on the 8th attempt the VM was restarted by HA on 
> another KVM host.
> Root cause: the exit status of the ping command is not available within the 
> command timeout of 20 seconds.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
