Kiran Koneti created CLOUDSTACK-4936:
----------------------------------------
Summary: System VM's Agent goes to disconnected state but VM state
still up when they are stopped manually.
Key: CLOUDSTACK-4936
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4936
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: SystemVM
Affects Versions: 4.2.1
Reporter: Kiran Koneti
Priority: Critical
Fix For: 4.2.1
Steps are as below:
1)Installed the CS using the Trail install.(MS Centos 6.2 and Hypervisor Xen
5.6sp2 + CSP installed)
2)once the setup is up and system Vm's came up tried to deploy Some VM's and
the VM deployment was successful.
3)Tried to stop the SSVM and CPVM the Stop command is passed to the hypervisor
and the VM's are sopped in the Hypervisor .
4)But the cloudstack UI pops with a message unable to stop the VM and updates
the status as follows in UI.
5)The VM state is UP but the Agent state shows disconnected.
The Management server logs are as below:
"2013-10-23 16:49:07,476 DEBUG [cloud.async.AsyncJobManagerImpl]
(catalina-exec-19:null) submit async job-212 = [
fec03341-1c65-4752-8f62-21bffa5bc178 ], details: AsyncJobVO {id:212, userId: 2,
accountId: 2, sessionKey: null, instanceType: SystemVm, instanceId: 3, cmd:
org.apache.cloudstack.api.command.admin.systemvm.StopSystemVmCmd,
cmdOriginator: null, cmdInfo:
{"response":"json","id":"ffe2558d-516f-4d06-94fe-be20ad83cede","sessionkey":"eEEdlU+E7L8Kb1Op0S4B1PIyhUE\u003d","cmdEventType":"SSVM.STOP","ctxUserId":"2","httpmethod":"GET","_":"1382527274866","ctxAccountId":"2","ctxStartEventId":"966"},
cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0,
processStatus: 0, resultCode: 0, result: null, initMsid: 233845173167606,
completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
2013-10-23 16:49:07,477 DEBUG [cloud.async.AsyncJobManagerImpl]
(Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Executing
org.apache.cloudstack.api.command.admin.systemvm.StopSystemVmCmd for job-212 =
[ fec03341-1c65-4752-8f62-21bffa5bc178 ]
2013-10-23 16:49:07,478 DEBUG [cloud.api.ApiServlet] (catalina-exec-19:null)
===END=== 10.252.192.62 -- GET
command=stopSystemVm&id=ffe2558d-516f-4d06-94fe-be20ad83cede&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527274866
2013-10-23 16:49:07,551 DEBUG [cloud.capacity.CapacityManagerImpl]
(Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) VM state
transitted from :Running to Stopping with event: StopRequestedvm's original
host id: 1 new host id: 1 host id before state transition: 1
2013-10-23 16:49:07,553 DEBUG [agent.transport.Request]
(Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Seq
1-964559712: Sending { Cmd , MgmtId: 233845173167606, via: 1, Ver: v1, Flags:
100111,
[{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":true,"vmName":"s-3-VM","wait":0}}]
}
2013-10-23 16:49:07,553 DEBUG [agent.transport.Request]
(Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Seq
1-964559712: Executing: { Cmd , MgmtId: 233845173167606, via: 1, Ver: v1,
Flags: 100111,
[{"com.cloud.agent.api.StopCommand":{"isProxy":false,"executeInSequence":true,"vmName":"s-3-VM","wait":0}}]
}
2013-10-23 16:49:07,553 DEBUG [agent.manager.DirectAgentAttache]
(DirectAgent-17:null) Seq 1-964559712: Executing request
2013-10-23 16:49:07,653 DEBUG [xen.resource.CitrixResourceBase]
(DirectAgent-17:null) 9. The VM s-3-VM is in Stopping state
2013-10-23 16:49:07,935 INFO [xen.resource.CitrixResourceBase]
(DirectAgent-17:null) Removed network rules for vm s-3-VM
2013-10-23 16:49:08,288 DEBUG [agent.manager.AgentManagerImpl]
(AgentManager-Handler-9:null) SeqA 5-25: Processing Seq 5-25: { Cmd , MgmtId:
-1, via: 5, Ver: v1, Flags: 11,
[{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":4,"_loadInfo":"{\n
\"connections\": []\n}","wait":0}}] }
2013-10-23 16:49:08,368 DEBUG [agent.manager.AgentManagerImpl]
(AgentManager-Handler-9:null) SeqA 5-25: Sending Seq 5-25: { Ans: , MgmtId:
233845173167606, via: 5, Ver: v1, Flags: 100010,
[{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] }
2013-10-23 16:49:09,162 DEBUG [agent.manager.AgentManagerImpl]
(AgentManager-Handler-15:null) SeqA 4--1: Processing Seq 4--1: { Cmd , MgmtId:
-1, via: 4, Ver: v1, Flags: 111,
[{"com.cloud.agent.api.ShutdownCommand":{"reason":"sig.kill","wait":0}}] }
2013-10-23 16:49:09,162 INFO [agent.manager.AgentManagerImpl]
(AgentManager-Handler-15:null) Host 4 has informed us that it is shutting down
with reason sig.kill and detail null
2013-10-23 16:49:09,163 INFO [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Host 4 is disconnecting with event ShutdownRequested
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) The next status of agent 4is Disconnected, current
status is Up
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Deregistering link for 4 with state Disconnected
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Remove Agent : 4
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Remove Agent : 4
2013-10-23 16:49:09,166 DEBUG [agent.manager.ConnectedAgentAttache]
(AgentTaskPool-11:null) Processing Disconnect.
2013-10-23 16:49:09,166 DEBUG [agent.manager.ConnectedAgentAttache]
(AgentTaskPool-11:null) Processing Disconnect.
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer_EnhancerByCloudStack_621251c1
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.deploy.DeploymentPlanningManagerImpl_EnhancerByCloudStack_28d6dc37
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.network.NetworkManagerImpl_EnhancerByCloudStack_3a67edca
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.storage.secondary.SecondaryStorageListener
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.hypervisor.vmware.manager.VmwareManagerImpl_EnhancerByCloudStack_d1dcecc4
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.network.security.SecurityGroupListener
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.storage.listener.StoragePoolMonitor
2013-10-23 16:49:09,166 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.vm.ClusteredVirtualMachineManagerImpl_EnhancerByCloudStack_67289ece
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.storage.LocalStoragePoolListener
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.network.SshKeysDistriMonitor
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.network.router.VirtualNetworkApplianceManagerImpl_EnhancerByCloudStack_9a0b6f
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.network.SshKeysDistriMonitor
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl_EnhancerByCloudStack_7b7dd392
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.storage.upload.UploadListener
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.storage.download.DownloadListener
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.agent.manager.AgentMonitor
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.capacity.StorageCapacityListener
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.capacity.ComputeCapacityListener
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
2013-10-23 16:49:09,167 DEBUG [cloud.network.NetworkUsageManagerImpl]
(AgentTaskPool-11:null) Disconnected called on 4 with status Disconnected
2013-10-23 16:49:09,167 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-11:null) Sending Disconnect to listener:
com.cloud.consoleproxy.ConsoleProxyListener
2013-10-23 16:49:09,168 DEBUG [cloud.host.Status] (AgentTaskPool-11:null)
Transition:[Resource state = Enabled, Agent event = ShutdownRequested, Host id
= 4, name = s-3-VM]
2013-10-23 16:49:09,263 DEBUG [cloud.host.Status] (AgentTaskPool-11:null) Agent
status update: [id = 4; name = s-3-VM; old status = Up; event =
ShutdownRequested; new status = Disconnected; old update count = 23; new update
count = 24]
2013-10-23 16:49:09,263 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(AgentTaskPool-11:null) Notifying other nodes of to disconnect
2013-10-23 16:49:10,527 DEBUG [cloud.api.ApiServlet] (catalina-exec-14:null)
===START=== 10.252.192.62 -- GET
command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527278181
2013-10-23 16:49:10,591 DEBUG [cloud.api.ApiServlet] (catalina-exec-14:null)
===END=== 10.252.192.62 -- GET
command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527278181
2013-10-23 16:49:12,502 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager
Timer:null) Resetting hosts suitable for reconnect
2013-10-23 16:49:12,504 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager
Timer:null) Completed resetting hosts suitable for reconnect
2013-10-23 16:49:12,504 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager
Timer:null) Acquiring hosts for clusters already owned by this management server
2013-10-23 16:49:12,505 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager
Timer:null) Completed acquiring hosts for clusters already owned by this
management server
2013-10-23 16:49:12,505 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager
Timer:null) Acquiring hosts for clusters not owned by any management server
2013-10-23 16:49:12,505 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager
Timer:null) Completed acquiring hosts for clusters not owned by any management
server
2013-10-23 16:49:13,291 DEBUG [agent.manager.AgentManagerImpl]
(AgentManager-Handler-13:null) SeqA 5-26: Processing Seq 5-26: { Cmd , MgmtId:
-1, via: 5, Ver: v1, Flags: 11,
[{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":4,"_loadInfo":"{\n
\"connections\": []\n}","wait":0}}] }
2013-10-23 16:49:13,346 DEBUG [agent.manager.AgentManagerImpl]
(AgentManager-Handler-13:null) SeqA 5-26: Sending Seq 5-26: { Ans: , MgmtId:
233845173167606, via: 5, Ver: v1, Flags: 100010,
[{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] }
2013-10-23 16:49:13,526 DEBUG [cloud.api.ApiServlet] (catalina-exec-22:null)
===START=== 10.252.192.62 -- GET
command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527281181
2013-10-23 16:49:13,601 DEBUG [cloud.api.ApiServlet] (catalina-exec-22:null)
===END=== 10.252.192.62 -- GET
command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527281181
2013-10-23 16:49:16,526 DEBUG [cloud.api.ApiServlet] (catalina-exec-2:null)
===START=== 10.252.192.62 -- GET
command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527284180
2013-10-23 16:49:16,619 DEBUG [cloud.api.ApiServlet] (catalina-exec-2:null)
===END=== 10.252.192.62 -- GET
command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527284180
2013-10-23 16:49:18,091 DEBUG [xen.resource.CitrixResourceBase]
(DirectAgent-17:null) 10. The VM s-3-VM is in Stopped state
2013-10-23 16:49:18,092 DEBUG [agent.manager.DirectAgentAttache]
(DirectAgent-17:null) Seq 1-964559712: Response Received:
2013-10-23 16:49:18,092 DEBUG [agent.transport.Request] (DirectAgent-17:null)
Seq 1-964559712: Processing: { Ans: , MgmtId: 233845173167606, via: 1, Ver:
v1, Flags: 110,
[{"com.cloud.agent.api.StopAnswer":{"hypervisortoolsversion":"xenserver56","result":true,"details":"Stop
VM s-3-VM Succeed","wait":0}}] }
2013-10-23 16:49:18,092 DEBUG [agent.manager.AgentAttache]
(DirectAgent-17:null) Seq 1-964559712: No more commands found
2013-10-23 16:49:18,092 DEBUG [agent.transport.Request]
(Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Seq
1-964559712: Received: { Ans: , MgmtId: 233845173167606, via: 1, Ver: v1,
Flags: 110, { StopAnswer } }
2013-10-23 16:49:18,094 WARN [cloud.vm.VirtualMachineManagerImpl]
(Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Unable to
stop vm VM[SecondaryStorageVm|s-3-VM]
2013-10-23 16:49:18,166 DEBUG [cloud.capacity.CapacityManagerImpl]
(Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) VM state
transitted from :Stopping to Running with event: OperationFailedvm's original
host id: 1 new host id: 1 host id before state transition: 1
2013-10-23 16:49:18,167 DEBUG [cloud.async.AsyncJobManagerImpl]
(Job-Executor-120:job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ]) Complete
async job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178 ], jobStatus: 2,
resultCode: 530, result: Error Code: 530 Error text: Fail to stop system vm
2013-10-23 16:49:18,294 DEBUG [agent.manager.AgentManagerImpl]
(AgentManager-Handler-11:null) SeqA 5-27: Processing Seq 5-27: { Cmd , MgmtId:
-1, via: 5, Ver: v1, Flags: 11,
[{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":4,"_loadInfo":"{\n
\"connections\": []\n}","wait":0}}] }
2013-10-23 16:49:18,350 DEBUG [agent.manager.AgentManagerImpl]
(AgentManager-Handler-11:null) SeqA 5-27: Sending Seq 5-27: { Ans: , MgmtId:
233845173167606, via: 5, Ver: v1, Flags: 100010,
[{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] }
2013-10-23 16:49:19,527 DEBUG [cloud.api.ApiServlet] (catalina-exec-25:null)
===START=== 10.252.192.62 -- GET
command=queryAsyncJobResult&jobId=fec03341-1c65-4752-8f62-21bffa5bc178&response=json&sessionkey=eEEdlU%2BE7L8Kb1Op0S4B1PIyhUE%3D&_=1382527287180
2013-10-23 16:49:19,541 DEBUG [cloud.async.AsyncJobManagerImpl]
(catalina-exec-25:null) Async job-212 = [ fec03341-1c65-4752-8f62-21bffa5bc178
] completed"
Observations:
Once I reboot the System Vm's then they are recreated and the agent status is
connected and the VM's are created in the Hypervisor also.
Attaching the Management sever logs.
--
This message was sent by Atlassian JIRA
(v6.1#6144)