[
https://issues.apache.org/jira/browse/CLOUDSTACK-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855631#comment-13855631
]
Sanjeev N commented on CLOUDSTACK-5610:
---------------------------------------
Similar behavior has been observed in case of network disconnect.
> [Hyper-v] Host does not go into Alert state even though it is power-off hence
> vm deployment fails
> -------------------------------------------------------------------------------------------------
>
> Key: CLOUDSTACK-5610
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5610
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Components: Hypervisor Controller, Management Server
> Affects Versions: 4.3.0
> Environment: Latest build from 4.3 with commit
> d462db4ae5c30e677d5810111f9ea5ca6812bce2
> Storage: SMB for both primary and secondary
> Hypervisor: Hyper-v
> Reporter: Sanjeev N
> Priority: Blocker
> Labels: hyper-V,
> Fix For: 4.3.0
>
> Attachments: cloud.dmp, management-server.rar
>
>
> [Hyper-v] Host does not go into Alert state even though it is powered off,
> hence VM deployment fails
> Steps to Reproduce:
> =================
> 1. Bring up CS in an advanced zone with 2 or more Hyper-v hosts, using SMB
> for both primary and secondary storage.
> 2. Enable the zone and deploy a few VMs. Make sure that the VMs are
> distributed across all the hosts.
> 3. Power off one of the hosts (a host where VMs are running).
> Expected Result:
> ==============
> Host should go into Alert state and all the VMs running on it should be
> stopped.
> Actual Result:
> ============
> Host remains in Up state and all of its VMs still show as Running.
> I could see the ping commands to the hypervisor agent and the system VM
> agents in the MS log. Even though the agents are behind on ping, the agent
> status remains Up.
> In this state, I tried to deploy a VM and the deployment planner chose the
> host that was powered off. Hence the VM deployment failed.
> The CPVM was also running on the powered-off host, and it too remained in
> Running state. Since the CPVM agent is not reachable from CS, it should have
> been stopped and started on another host in the cluster.
> 2013-12-23 18:19:25,334 ERROR [c.c.h.h.r.HypervDirectConnectResource]
> (DirectAgent-331:ctx-831c60e9) org.apache.http.conn.HttpHostConnectException:
> Connection to http://10.147.40.31:8250 refused
> 2013-12-23 18:19:25,334 INFO [c.c.h.h.r.HypervDirectConnectResource]
> (DirectAgent-331:ctx-831c60e9) Cannot ping host 10.147.40.31 (IP
> 10.147.40.31), pingAns (blank means null)
> is:com.cloud.agent.api.UnsupportedAnswer
> 2013-12-23 18:19:25,334 WARN [c.c.a.m.DirectAgentAttache]
> (DirectAgent-331:ctx-831c60e9) Unable to get current status on 5(10.147.40.31)
> 2013-12-23 18:19:25,336 INFO [c.c.a.m.AgentManagerImpl]
> (AgentTaskPool-16:ctx-be3804c7) Investigating why host 5 has disconnected
> with event AgentDisconnected
> 2013-12-23 18:19:25,336 DEBUG [c.c.a.m.AgentManagerImpl]
> (AgentTaskPool-16:ctx-be3804c7) checking if agent (5) is alive
> 2013-12-23 18:19:25,339 DEBUG [c.c.a.t.Request]
> (AgentTaskPool-16:ctx-be3804c7) Seq 5-1482556239: Sending { Cmd , MgmtId:
> 132129494109518, via: 5(10.147.40.31), Ver: v1, Flags: 100011,
> [{"com.cloud.agent.api.CheckHealthCommand":{"wait":50}}] }
> 2013-12-23 18:19:25,339 DEBUG [c.c.a.t.Request]
> (AgentTaskPool-16:ctx-be3804c7) Seq 5-1482556239: Executing: { Cmd , MgmtId:
> 132129494109518, via: 5(10.147.40.31), Ver: v1, Flags: 100011,
> [{"com.cloud.agent.api.CheckHealthCommand":{"wait":50}}] }
> 2013-12-23 18:19:25,339 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-325:ctx-39f5ed39) Seq 5-1482556239: Executing request
> 2013-12-23 18:19:25,339 DEBUG [c.c.h.h.r.HypervDirectConnectResource]
> (DirectAgent-325:ctx-39f5ed39) POST request
> tohttp://10.147.40.31:8250/api/HypervResource/com.cloud.agent.api.CheckHealthCommand
> with contents{"contextMap":{},"wait":50}
> 2013-12-23 18:19:25,340 DEBUG [c.c.h.h.r.HypervDirectConnectResource]
> (DirectAgent-325:ctx-39f5ed39) Sending cmd to
> http://10.147.40.31:8250/api/HypervResource/com.cloud.agent.api.CheckHealthCommand
> cmd data:{"contextMap":{},"wait":50}
> 2013-12-23 18:19:46,345 DEBUG [c.c.h.UserVmDomRInvestigator]
> (AgentTaskPool-16:ctx-be3804c7) checking if agent (5) is alive
> 2013-12-23 18:19:46,347 DEBUG [c.c.h.UserVmDomRInvestigator]
> (AgentTaskPool-16:ctx-be3804c7) sending ping from (1) to agent's host ip
> address (10.147.40.31)
> 2013-12-23 18:19:46,349 DEBUG [c.c.a.t.Request]
> (AgentTaskPool-16:ctx-be3804c7) Seq 1-790364876: Sending { Cmd , MgmtId:
> 132129494109518, via: 1(10.147.40.14), Ver: v1, Flags: 100011,
> [{"com.cloud.agent.api.PingTestCommand":{"_computingHostIp":"10.147.40.31","wait":20}}]
> }
> 2013-12-23 18:19:46,349 DEBUG [c.c.a.t.Request]
> (AgentTaskPool-16:ctx-be3804c7) Seq 1-790364876: Executing: { Cmd , MgmtId:
> 132129494109518, via: 1(10.147.40.14), Ver: v1, Flags: 100011,
> [{"com.cloud.agent.api.PingTestCommand":{"_computingHostIp":"10.147.40.31","wait":20}}]
> }
> 2013-12-23 18:19:46,350 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-353:ctx-a48feb80) Seq 1-790364876: Executing request
> 2013-12-23 18:19:46,350 INFO [c.c.h.h.r.HypervDirectConnectResource]
> (DirectAgent-353:ctx-a48feb80) Executing resource PingTestCommand:
> {"_computingHostIp":"10.147.40.31","contextMap":{},"wait":20}
> 2013-12-23 18:19:46,351 ERROR [c.c.h.h.r.HypervDirectConnectResource]
> (DirectAgent-353:ctx-a48feb80) Unable to execute ping command on DomR (null),
> domR may not be ready yet. failure due to There was a problem while
> connecting to null:3922
> 2013-12-23 18:19:46,351 DEBUG [c.c.a.m.DirectAgentAttache]
> (DirectAgent-353:ctx-a48feb80) Seq 1-790364876: Response Received:
> 2013-12-23 18:19:46,351 DEBUG [c.c.a.t.Request]
> (DirectAgent-353:ctx-a48feb80) Seq 1-790364876: Processing: { Ans: , MgmtId:
> 132129494109518, via: 1, Ver: v1, Flags: 10,
> [{"com.cloud.agent.api.Answer":{"result":false,"details":"PingTestCommand
> failed","wait":0}}] }
> 2013-12-23 18:19:46,351 DEBUG [c.c.a.t.Request]
> (AgentTaskPool-16:ctx-be3804c7) Seq 1-790364876: Received: { Ans: , MgmtId:
> 132129494109518, via: 1, Ver: v1, Flags: 10, { Answer } }
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.AbstractInvestigatorImpl]
> (AgentTaskPool-16:ctx-be3804c7) host (10.147.40.31) cannot be pinged,
> returning null ('I don't know')
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.UserVmDomRInvestigator]
> (AgentTaskPool-16:ctx-be3804c7) could not reach agent, could not reach
> agent's host, returning that we don't have enough information
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl]
> (AgentTaskPool-16:ctx-be3804c7) PingInvestigator unable to determine the
> state of the host. Moving on.
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl]
> (AgentTaskPool-16:ctx-be3804c7) ManagementIPSysVMInvestigator unable to
> determine the state of the host. Moving on.
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl]
> (AgentTaskPool-16:ctx-be3804c7) KVMInvestigator unable to determine the state
> of the host. Moving on.
> 2013-12-23 18:19:46,351 DEBUG [c.c.h.HighAvailabilityManagerImpl]
> (AgentTaskPool-16:ctx-be3804c7) VMwareInvestigator unable to determine the
> state of the host. Moving on.
> 2013-12-23 18:19:46,351 WARN [c.c.a.m.AgentManagerImpl]
> (AgentTaskPool-16:ctx-be3804c7) Agent state cannot be determined, do nothing
> Attaching MS log and cloud DB.
> Agent 5 is the host which was powered off.
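
A minimal sketch of the fall-through visible in the log above, written in Python for brevity. It is not CloudStack code: the function names, the three-valued answer (True, False, None for "I don't know"), and the mapping of a definite answer to Up or Down are assumptions made for illustration. Only the investigator names and the final "do nothing" outcome are taken from the log.

```python
from typing import Callable, Optional, Sequence

# Assumed shape of an investigator: True = host alive, False = host down,
# None = "I don't know" (the wording used in the log).
Investigator = Callable[[int], Optional[bool]]


def investigate(host_id: int, investigators: Sequence[Investigator]) -> Optional[bool]:
    """Return the first definite answer, or None if every investigator is unsure."""
    for check in investigators:
        answer = check(host_id)
        if answer is not None:
            return answer
    return None


def handle_disconnect(host_state: str, verdict: Optional[bool]) -> str:
    """Return the host state after the investigation."""
    if verdict is None:
        # Corresponds to the last WARN line: "Agent state cannot be
        # determined, do nothing". The host keeps its current state.
        return host_state
    # Assumption for illustration only: a definite answer maps to Up or Down.
    return "Up" if verdict else "Down"


def unsure(host_id: int) -> Optional[bool]:
    """Stand-in for an investigator that cannot reach the host."""
    return None


# Investigator names as they appear in the log; each one is unsure here.
chain = {
    "PingInvestigator": unsure,
    "ManagementIPSysVMInvestigator": unsure,
    "KVMInvestigator": unsure,
    "VMwareInvestigator": unsure,
}

verdict = investigate(5, list(chain.values()))
print(handle_disconnect("Up", verdict))  # prints "Up"
```

Under these assumptions, a powered-off host stays Up whenever no investigator can give a definite answer, which matches the reported behavior where the deployment planner still picks host 5.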
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)