[ https://issues.apache.org/jira/browse/CLOUDSTACK-9575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882794#comment-15882794 ]

Mani Prashanth Varma Manthena commented on CLOUDSTACK-9575:
-----------------------------------------------------------

Here are the possible fixes for this issue:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006232
https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2127554

> ACS 4.9 + VMware/ESXi 6.0: Sometimes VRs are failing to fully come up into running state
> ----------------------------------------------------------------------------------------
>
>              Key: CLOUDSTACK-9575
>              URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9575
>          Project: CloudStack
>       Issue Type: Bug
>   Security Level: Public (Anyone can view this level - this is the default.)
>       Components: Management Server, VMware
> Affects Versions: 4.9.0
>         Reporter: Mani Prashanth Varma Manthena
>         Priority: Critical
>           Labels: vmware
>      Attachments: hostd.log, management-server.log.zip
>
> Here is the exception in the cloudstack-management server log:
> {noformat}
> 2016-10-21 07:33:53,361 WARN [c.c.n.v.VpcManagerImpl] (API-Job-Executor-62:ctx-749d66c8 job-102 ctx-25e96d70) (logid:502d5b39) Failed to start vpc [VPC [9-nuageVpcWithoutStaticNatNetACL] due to com.cloud.exception.AgentUnavailableException: Resource [Host:3] is unreachable: Host 3: Unable to start instance due to Unable to start VM:bd09d7ab-81f1-4053-a838-869d108cb827 due to error in finalizeStart, not retrying
>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1120)
>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>         at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4753)
>         at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>         at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>         at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>         at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM:bd09d7ab-81f1-4053-a838-869d108cb827 due to error in finalizeStart, not retrying
>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1084)
>         at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4592)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         ... 18 more
> 2016-10-21 07:33:53,362 DEBUG [c.c.n.v.VpcManagerImpl] (API-Job-Executor-62:ctx-749d66c8 job-102 ctx-25e96d70) (logid:502d5b39) Destroying vpc [VPC [9-nuageVpcWithoutStaticNatNetACL] that failed to start
> 2016-10-21 07:33:53,362 DEBUG [c.c.n.v.VpcManagerImpl] (API-Job-Executor-62:ctx-749d66c8 job-102 ctx-25e96d70) (logid:502d5b39) Destroying vpc [VPC [9-nuageVpcWithoutStaticNatNetACL]
> 2016-10-21 07:33:53,363 DEBUG [c.c.n.v.VpcManagerImpl] (API-Job-Executor-62:ctx-749d66c8 job-102 ctx-25e96d70) (logid:502d5b39) Updating VPC [VPC [9-nuageVpcWithoutStaticNatNetACL] with state Inactive as a part of vpc delete
> 2016-10-21 07:33:53,373 DEBUG [c.c.n.v.VpcManagerImpl] (API-Job-Executor-62:ctx-749d66c8 job-102 ctx-25e96d70) (logid:502d5b39) Shutting down vpc [VPC [9-nuageVpcWithoutStaticNatNetACL]
> 2016-10-21 07:33:53,376 DEBUG [c.c.n.r.NetworkHelperImpl] (API-Job-Executor-62:ctx-749d66c8 job-102 ctx-25e96d70) (logid:502d5b39) Attempting to destroy router 13
> {noformat}
> Note: We are hitting this issue one out of three times in our regressions.
> Here are the failures on the ESXi 6.0 host:
> {noformat}
> 2016-10-25T05:54:12.964Z error hostd[2B981B70] [Originator@6876 sub=Hbrsvc] Failed to retrieve VM config (id=7)
> 2016-10-25T05:54:12.964Z error hostd[2B981B70] [Originator@6876 sub=Hbrsvc] Replicator: VmReconfig failed to retrieve replication config for VM 7, ignoring: vim.fault.ReplicationVmConfigFault
> 2016-10-25T05:54:12.996Z warning hostd[2BB81B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx opID=76510a94-ea-f9-04a0 user=vpxuser:VSPHERE.LOCAL\Administrator] CannotRetrieveCorefiles: VM is in an invalid state
> 2016-10-25T05:54:13.022Z info hostd[2BB81B70] [Originator@6876 sub=Libs opID=76510a94-ea-f9-04a0 user=vpxuser:VSPHERE.LOCAL\Administrator] Error getting extended config name from config file.
> 2016-10-25T05:54:13.022Z info hostd[2BB81B70] [Originator@6876 sub=Libs opID=76510a94-ea-f9-04a0 user=vpxuser:VSPHERE.LOCAL\Administrator] Error getting extended config file path.
> 2016-10-25T05:54:55.764Z info hostd[2BC85B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx opID=HB-SpecSync-host-3855@93-2beb0d5-87-06ad user=vpxuser] State Transition (VM_STATE_ON -> VM_STATE_RECONFIGURING)
> 2016-10-25T05:54:55.765Z info hostd[2BC85B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx opID=HB-SpecSync-host-3855@93-2beb0d5-87-06ad user=vpxuser] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)
> 2016-10-25T05:54:55.766Z info hostd[2BBC2B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] Entered VmPowerStateListener
> 2016-10-25T05:54:55.766Z info hostd[2BBC2B70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] VmPowerStateListener succeeded
> 2016-10-25T05:54:55.766Z info hostd[2BBC2B70] [Originator@6876 sub=Hbrsvc] Replicator: powerstate change VM: 7 Old: 1 New: 1
> 2016-10-25T05:54:55.766Z info hostd[2BC85B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx opID=HB-SpecSync-host-3855@93-2beb0d5-87-06ad user=vpxuser] Send config update invoked
> 2016-10-25T05:54:55.783Z info hostd[2BC85B70] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-SpecSync-host-3855@93-2beb0d5-87-06ad user=vpxuser] Task Completed : haTask-ha-root-pool-vim.ResourcePool.updateChildResourceConfiguration-1928 Status success
> 2016-10-25T05:54:58.330Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Send config update invoked
> 2016-10-25T05:54:58.366Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Send config update invoked
> 2016-10-25T05:54:58.385Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Send config update invoked
> 2016-10-25T05:54:58.438Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Tools manifest version status changed from guestToolsNotInstalled to guestToolsUnmanaged
> 2016-10-25T05:54:58.447Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Send config update invoked
> 2016-10-25T05:54:58.469Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Send config update invoked
> 2016-10-25T05:54:58.492Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Tools manifest version status changed from guestToolsUnmanaged to guestToolsUnmanaged
> 2016-10-25T05:54:58.498Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Send config update invoked
> 2016-10-25T05:54:58.515Z info hostd[2A59FB70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Setting the tools properties cache.
> 2016-10-25T05:55:05.555Z info hostd[FFA02A70] [Originator@6876 sub=Vimsvc.TaskManager opID=134074df-7f-06cc user=vpxuser:VSPHERE.LOCAL\vpxd-extension-57107a00-4014-11e6-85ca-90e2ba4b0016] Task Created : haTask-ha-host-vim.HostSystem.acquireCimServicesTicket-1943
> 2016-10-25T05:55:05.556Z info hostd[2BBC2B70] [Originator@6876 sub=Vimsvc.TaskManager opID=134074df-7f-06cc user=vpxuser:VSPHERE.LOCAL\vpxd-extension-57107a00-4014-11e6-85ca-90e2ba4b0016] Task Completed : haTask-ha-host-vim.HostSystem.acquireCimServicesTicket-1943 Status success
> 2016-10-25T05:55:05.780Z info hostd[2A59FB70] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-host-3855@531-609c8d99-79-06d3 user=vpxuser] Task Created : haTask--vim.OverheadService.downloadVMXConfig-1946
> 2016-10-25T05:55:05.781Z info hostd[2A59FB70] [Originator@6876 sub=Vimsvc.TaskManager opID=HB-host-3855@531-609c8d99-79-06d3 user=vpxuser] Task Completed : haTask--vim.OverheadService.downloadVMXConfig-1946 Status success
> 2016-10-25T05:55:28.986Z info hostd[2BBC2B70] [Originator@6876 sub=Vimsvc.TaskManager opID=358a7cd8-cf-06ea user=vpxuser:VSPHERE.LOCAL\Administrator] Task Created : haTask-7-vim.VirtualMachine.shutdownGuest-1957
> 2016-10-25T05:55:28.987Z info hostd[2BBC2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx opID=358a7cd8-cf-06ea user=vpxuser:VSPHERE.LOCAL\Administrator] State Transition (VM_STATE_ON -> VM_STATE_SHUTTING_DOWN)
> 2016-10-25T05:55:28.987Z info hostd[2BBC2B70] [Originator@6876 sub=Vimsvc.ha-eventmgr opID=358a7cd8-cf-06ea user=vpxuser:VSPHERE.LOCAL\Administrator] Event 496 : Guest OS shut down for r-7-VM on esxi-2.mvdcdev28.us.alcatel-lucent.com in ha-datacenter
> 2016-10-25T05:55:29.202Z info hostd[2BBC2B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx opID=358a7cd8-cf-06ea user=vpxuser:VSPHERE.LOCAL\Administrator] State Transition (VM_STATE_SHUTTING_DOWN -> VM_STATE_ON_SHUTTING_DOWN)
> 2016-10-25T05:55:29.203Z info hostd[2BBC2B70] [Originator@6876 sub=Vimsvc.TaskManager opID=358a7cd8-cf-06ea user=vpxuser:VSPHERE.LOCAL\Administrator] Task Completed : haTask-7-vim.VirtualMachine.shutdownGuest-1957 Status success
> 2016-10-25T05:55:30.163Z info hostd[2BB81B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Send config update invoked
> 2016-10-25T05:55:30.182Z info hostd[2BB81B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Send config update invoked
> [LikewiseGetDomainJoinInfo:355] QueryInformation(): ERROR_FILE_NOT_FOUND (2/0):
> 2016-10-25T05:55:48.770Z info hostd[2A59FB70] [Originator@6876 sub=Hostsvc opID=6ea70743] VsanSystemVmkProvider : GetRuntimeInfo: Complete, runtime info: (vim.vsan.host.VsanRuntimeInfo) {
> --> accessGenNo = 0,
> --> }
> 2016-10-25T05:55:54.814Z info hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Send config update invoked
> 2016-10-25T05:55:54.825Z info hostd[2C080B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 497 : The dvPort 262 link was down in the vSphere Distributed Switch in ha-datacenter
> 2016-10-25T05:55:54.887Z warning hostd[FFA43B70] [Originator@6876 sub=Libs] VigorOnlineStatusCb: Failed to find RPC Request in pending list for sequence 0
> 2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] SOCKET 5 (44)
> 2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] recv detected client closed connection
> 2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] VigorTransportClientProcessError: Remote connection failure
> 2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] VigorTransportClientDrainRecv: draining read.
> 2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] SOCKET 5 (44)
> 2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] recv detected client closed connection
> 2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] VigorTransportClientProcessError: closing connection.
> 2016-10-25T05:55:54.887Z info hostd[FFA43B70] [Originator@6876 sub=Libs] VigorTransportClientManageConnection: connection closed.
> 2016-10-25T05:55:54.892Z info hostd[FFA43B70] [Originator@6876 sub=Libs] CnxAuthdProtoSecureConnect: Unencrypted connection, skipping thumbprint exchange.
> 2016-10-25T05:55:54.900Z info hostd[FFA43B70] [Originator@6876 sub=Libs] CnxConnectAuthd: Returning false because CnxAuthdProtoConnect failed
> 2016-10-25T05:55:54.900Z info hostd[FFA43B70] [Originator@6876 sub=Libs] Cnx_Connect: Returning false because CnxConnectAuthd failed
> 2016-10-25T05:55:54.900Z info hostd[FFA43B70] [Originator@6876 sub=vm:Cnx_Connect: Error message: There is no VMware process running for config file /vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
> 2016-10-25T05:55:54.900Z warning hostd[FFA43B70] [Originator@6876 sub=vm:VigorTransportClientManageConnection: Failed to re-connect to VM /vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx]
> 2016-10-25T05:55:54.902Z info hostd[FFA43B70] [Originator@6876 sub=Libs] VigorOnlineDisconnectCb: connection closed (is final).
> 2016-10-25T05:55:54.920Z info hostd[2B981B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 7
> 2016-10-25T05:55:54.920Z warning hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Unable to get resource settings for a powered on VM
> 2016-10-25T05:55:54.920Z info hostd[2B981B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 7
> 2016-10-25T05:55:54.931Z info hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Send config update invoked
> 2016-10-25T05:55:54.952Z info hostd[2B981B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 7
> 2016-10-25T05:55:54.952Z warning hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Unable to get resource settings for a powered on VM
> 2016-10-25T05:55:54.952Z info hostd[2B981B70] [Originator@6876 sub=Hostsvc] Lookupvm: Cartel ID not set for VM 7
> 2016-10-25T05:55:54.962Z info hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] State Transition (VM_STATE_ON_SHUTTING_DOWN -> VM_STATE_OFF)
> 2016-10-25T05:55:54.963Z info hostd[2A59FB70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] Entered VmPowerStateListener
> 2016-10-25T05:55:54.963Z info hostd[2A59FB70] [Originator@6876 sub=Guestsvc.GuestFileTransferImpl] VmPowerStateListener succeeded
> 2016-10-25T05:55:54.963Z info hostd[2A59FB70] [Originator@6876 sub=Hbrsvc] Replicator: powerstate change VM: 7 Old: 1 New: 0
> 2016-10-25T05:55:54.963Z info hostd[2A59FB70] [Originator@6876 sub=Hbrsvc] Replicator: Poweroff for VM: (id=7)
> 2016-10-25T05:55:54.963Z warning hostd[2B981B70] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/469adbc4-fd0b07aa/r-7-VM/r-7-VM.vmx] Failed to find activation record, event user unknown.
> 2016-10-25T05:55:54.964Z info hostd[2B981B70] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 498 : r-7-VM on esxi-2.mvdcdev28.us.alcatel-lucent.com in ha-datacenter is powered off
> {noformat}
> Workaround: Redo/retry the operation that brings up the VRs (e.g. create VPCs/networks, restart VPCs/networks with cleanup, etc.).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
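The "restart VPCs with cleanup" workaround above can be scripted against the CloudStack API's restartVPC command. The sketch below is illustrative, not part of the report: the endpoint and keys are placeholders, the retry count and delay are arbitrary, and CloudStack's standard request signing (HMAC-SHA1 over the sorted, lower-cased parameter string, base64-encoded) is assumed.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
import urllib.request

API_URL = "http://mgmt-server:8080/client/api"  # placeholder endpoint
API_KEY = "your-api-key"                        # placeholder credential
SECRET_KEY = "your-secret-key"                  # placeholder credential

def signed_query(params):
    """Build a signed CloudStack API query string: sort the parameters,
    lower-case the URL-encoded pair list, HMAC-SHA1 it with the secret
    key, and base64-encode the digest as the `signature` parameter."""
    params = dict(params, apikey=API_KEY, response="json")
    ordered = sorted(params.items())
    to_sign = "&".join(
        f"{k.lower()}={urllib.parse.quote_plus(str(v)).lower()}"
        for k, v in ordered)
    sig = base64.b64encode(
        hmac.new(SECRET_KEY.encode(), to_sign.encode(),
                 hashlib.sha1).digest()).decode()
    return urllib.parse.urlencode(list(ordered) + [("signature", sig)])

def restart_vpc_with_retries(vpc_id, attempts=3, delay=60):
    """Mirror the manual workaround: issue restartVPC with cleanup=true
    and retry a few times if the VR fails to come up."""
    for _ in range(attempts):
        query = signed_query({"command": "restartVPC",
                              "id": vpc_id,
                              "cleanup": "true"})
        try:
            with urllib.request.urlopen(f"{API_URL}?{query}") as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # management server busy or VR still starting; retry
        time.sleep(delay)
    return False
```

The same retry loop works for the other operations mentioned (network create/restart); only the `command` parameter and its arguments change.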