On 08/10/2012 04:48 PM, Caleb Call wrote:
> I've had this happen before and was unable to recover from it. I eventually
> had to just rebuild my box.
>
> This doc may provide some help (I found it after my incident)
>
> http://support.citrix.com/servlet/KbServlet/download/17140-102-18520/XenServer%20System%20Recovery%20Guide.pdf
>

Thanks. I have exhausted every xe command under the sun, and it appears I will have to rebuild the server from scratch. Before I do, do any CloudStack developers need or want to take a look at my controller or XenServer? I have no clue why the server would have just disappeared, so if there are any logs that may help, I'll be glad to supply them.
I know I can't have hypervisors just disappearing in the middle of the night and forcing a rebuild each time this happens!

Nik

>
> On Aug 10, 2012, at 11:59 AM, Anthony Xu <xuefei...@citrix.com> wrote:
>
>> Hi Nik
>>
>> What's the network configuration in XenServer host?
>> Is it bridge or openvswitch?
>>
>> You can get the info by
>> cat /etc/xensource/network.conf
>>
>> Anthony
>>
>>> -----Original Message-----
>>> From: Nik Martin [mailto:nik.mar...@nfinausa.com]
>>> Sent: Friday, August 10, 2012 8:36 AM
>>> To: cloudstack-users@incubator.apache.org
>>> Subject: Re: Xen Host failure in pool
>>>
>>> On 08/10/2012 10:32 AM, Mice Xia wrote:
>>>>
>>>> I remember when network partition happens, pool slave may enter emergency mode and show offline as it could not reach its master for a long time.
>>>> Could you check hv1's console (graphical console, not ssh console), and check if its nics are shown correctly?
>>>>
>>>> Regards
>>>> Mice
>>>>
>>> No, when I went into the xsconsole and tried to review all the settings,
>>> it was not showing the management interfaces properly.
>>>
>>> --
>>> Regards,
>>>
>>> Nik
>>>
>>>> -----Original Message-----
>>>> From: Nik Martin [mailto:nik.mar...@nfinausa.com]
>>>> Sent: 2012-8-10 (Friday) 23:04
>>>> To: cloudstack-users@incubator.apache.org
>>>> Subject: Xen Host failure in pool
>>>>
>>>> We have a XenServer 6.2 based pool of three hosts running under
>>>> CloudStack Acton release (code base is about two weeks old). We left
>>>> last night and everything was fine, and I have about 2 VMs running on
>>>> each host, not doing anything. This morning, I came in, and three VMs
>>>> have stopped, and I logged into XenCenter to see what the pool looked
>>>> like, and the pool master had changed from host HV3 to HV2, and HV1 was
>>>> offline. I logged in to HV1's console, and looked at the
>>>> /var/log/messages, and it was complaining about the pool master address
>>>> being wrong.
>>>> I went into CloudStack UI and deleted and re-added the
>>>> host, and it failed immediately, and I got this in the log when I did:
>>>>
>>>>
>>>> 2012-08-10 09:56:39,566 DEBUG [cloud.api.ApiServlet] (catalina-exec-24:null) Invalid paramemter in URL found. param: hosttags=
>>>> 2012-08-10 09:56:39,573 INFO [cloud.resource.ResourceManagerImpl] (catalina-exec-24:null) Trying to add a new host at http://172.16.5.3 in data center 2
>>>> 2012-08-10 09:56:39,629 DEBUG [xen.resource.XenServerConnectionPool] (catalina-exec-24:null) Slave logon to 172.16.5.3
>>>> 2012-08-10 09:56:39,632 DEBUG [xen.resource.XenServerConnectionPool] (catalina-exec-24:null) Failed to slave local login to 172.16.5.3 due to The master says the host is not known to it. Perhaps the Host was deleted from the master's database? Perhaps the slave is pointing to the wrong master?
>>>> 2012-08-10 09:56:39,638 DEBUG [xen.discoverer.XcpServerDiscoverer] (catalina-exec-24:null) other exceptions: java.lang.RuntimeException: can not get master ip
>>>> java.lang.RuntimeException: can not get master ip
>>>>     at com.cloud.hypervisor.xen.resource.XenServerConnectionPool.getMasterIp(XenServerConnectionPool.java:343)
>>>>     at com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer.find(XcpServerDiscoverer.java:179)
>>>>     at com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:644)
>>>>     at com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
>>>>     at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
>>>>     at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
>>>>     at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
>>>>     at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
>>>>     at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
>>>>     at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
>>>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>>>     at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>>>     at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
>>>>     at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>     at java.lang.Thread.run(Thread.java:679)
>>>> 2012-08-10 09:56:39,638 WARN [cloud.resource.ResourceManagerImpl] (catalina-exec-24:null) Unable to find the server resources at http://172.16.5.3
>>>> 2012-08-10 09:56:39,642 WARN [api.commands.AddHostCmd] (catalina-exec-24:null) Exception:
>>>> com.cloud.exception.DiscoveryException: Unable to add the host
>>>>     at com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:694)
>>>>     at com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
>>>>     at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
>>>>     at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
>>>>     at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
>>>>     at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
>>>>     at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
>>>>     at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>>     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
>>>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>>>     at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>>>     at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
>>>>     at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>     at java.lang.Thread.run(Thread.java:679)
>>>> 2012-08-10 09:56:39,642 WARN [cloud.api.ApiDispatcher] (catalina-exec-24:null) class com.cloud.api.ServerApiException : Unable to add the host
>>>> 2012-08-10 09:56:39,723 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-305:null) Ping from 17
>>>> 2012-08-10 09:56:43,822 DEBUG [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone 2 is ready to launch secondary storage VM
>>>> 2012-08-10 09:56:43,916 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone 2 is ready to launch console proxy
>>>> 2012-08-10 09:56:44,102 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 2 routers.
>>>> 2012-08-10 09:56:44,614 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-12:null) Ping from 22
>>>> 2012-08-10 09:56:48,864 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-10:null) Ping from 18
>>>> 2012-08-10 09:56:49,511 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) VmStatsCollector is running...
>>>> 2012-08-10 09:56:49,525 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-305:null) Seq 16-92408948: Executing request
>>>> 2012-08-10 09:56:49,763 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-305:null) Vm cpu utilization 0.01
>>>> 2012-08-10 09:56:49,763 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-305:null) Seq 16-92408948: Response Received:
>>>> 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-305:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:56:49,763 DEBUG [agent.transport.Request] (StatsCollector-1:null) Seq 16-92408948: Received: { Ans: , MgmtId: 130577622632, via: 16, Ver: v1, Flags: 10, { GetVmStatsAnswer } }
>>>> 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl] (StatsCollector-1:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:56:54,411 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-497:null) Ping from 17
>>>> 2012-08-10 09:56:54,550 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-338:null) Ping from 16
>>>> 2012-08-10 09:56:59,614 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-8:null) Ping from 22
>>>> 2012-08-10 09:57:03,864 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-9:null) Ping from 18
>>>> 2012-08-10 09:57:09,551 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-71:null) Ping from 16
>>>> 2012-08-10 09:57:09,669 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-338:null) Ping from 17
>>>> 2012-08-10 09:57:13,821 DEBUG [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone 2 is ready to launch secondary storage VM
>>>> 2012-08-10 09:57:13,918 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone 2 is ready to launch console proxy
>>>> 2012-08-10 09:57:14,102 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 2 routers.
>>>> 2012-08-10 09:57:14,614 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-11:null) Ping from 22
>>>> 2012-08-10 09:57:15,645 DEBUG [cloud.server.StatsCollector] (StatsCollector-3:null) HostStatsCollector is running...
>>>> 2012-08-10 09:57:15,656 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-71:null) Seq 16-92408949: Executing request
>>>> 2012-08-10 09:57:15,878 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-71:null) Seq 16-92408949: Response Received:
>>>> 2012-08-10 09:57:15,878 DEBUG [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-71:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:15,878 DEBUG [agent.transport.Request] (StatsCollector-3:null) Seq 16-92408949: Received: { Ans: , MgmtId: 130577622632, via: 16, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
>>>> 2012-08-10 09:57:15,879 DEBUG [cloud.vm.VirtualMachineManagerImpl] (StatsCollector-3:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:15,884 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-338:null) Seq 17-665190891: Executing request
>>>> 2012-08-10 09:57:16,312 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-338:null) Seq 17-665190891: Response Received:
>>>> 2012-08-10 09:57:16,312 DEBUG [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-338:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:16,312 DEBUG [agent.transport.Request] (StatsCollector-3:null) Seq 17-665190891: Received: { Ans: , MgmtId: 130577622632, via: 17, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
>>>> 2012-08-10 09:57:16,313 DEBUG [cloud.vm.VirtualMachineManagerImpl] (StatsCollector-3:null) Cleanup succeeded. Details null
>>>> 2012-08-10 09:57:18,864 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-15:null) Ping from 18
>>>> 2012-08-10 09:57:24,407 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-71:null) Ping from 17
>>>> 2012-08-10 09:57:24,566 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-338:null) Ping from 16
>>>> 2012-08-10 09:57:29,615 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-1:null) Ping from 22
>>>> 2012-08-10 09:57:30,047 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-294:null) Seq 16-92405762: Executing request
>>>> 2012-08-10 09:57:30,308 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-294:null) Seq 16-92405762: Response Received:
>>>> 2012-08-10 09:57:30,308 DEBUG [agent.transport.Request] (DirectAgent-294:null) Seq 16-92405762: Processing: { Ans: , MgmtId: 130577622632, via: 16, Ver: v1, Flags: 10, [{"ClusterSyncAnswer":{"_clusterId":1,"_newStates":{},"_isExecuted":false,"result":true,"wait":0}}] }
>>>> 2012-08-10 09:57:31,060 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-357:null) Seq 17-665190402: Executing request
>>>> 2012-08-10 09:57:31,250 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-357:null) Seq 17-665190402: Response Received:
>>>> 2012-08-10 09:57:31,250 DEBUG [agent.transport.Request] (DirectAgent-357:null) Seq 17-665190402: Processing: { Ans: , MgmtId: 130577622632, via: 17, Ver: v1, Flags: 10, [{"Answer":{"result":true,"wait":0}}] }
>>>>
>>>> This is a very serious error, and I don't know how to fix it. Can
>>>> anyone suggest what might be the problem and how I might fix it?
>>>>
>>>>
>>>
>>>
>>
>

--
Regards,

Nik
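
The log above suggests the slave may be "pointing to the wrong master", and Anthony asks which network backend the host uses. Both can be read from files on the host, so a minimal shell sketch of those two checks follows. The function names `pool_role` and `network_backend` are hypothetical helpers, not part of XenServer or CloudStack. `/etc/xensource/pool.conf` is not mentioned in the thread; that it holds either `master` or `slave:<address>` is an assumption about this XenServer release.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with XenServer or CloudStack).

# Print the role recorded in pool.conf.
# Assumption: the file contains either "master" or "slave:<master-address>".
pool_role() {
    conf="${1:-/etc/xensource/pool.conf}"
    [ -r "$conf" ] || { echo "unreadable: $conf" >&2; return 1; }
    role=$(tr -d '[:space:]' < "$conf")
    case "$role" in
        master)  echo "master" ;;
        slave:*) echo "slave of ${role#slave:}" ;;
        *)       echo "unknown: $role"; return 1 ;;
    esac
}

# Print the network backend named in network.conf
# (the thread expects "bridge" or "openvswitch").
network_backend() {
    conf="${1:-/etc/xensource/network.conf}"
    [ -r "$conf" ] || { echo "unreadable: $conf" >&2; return 1; }
    tr -d '[:space:]' < "$conf"
    echo
}
```

Run on HV1 with no arguments, `pool_role` would show which master address the host has recorded, which can then be compared with the host XenCenter reports as master (HV2 in the report above). If the recorded address is stale, the xe CLI offers `xe pool-emergency-reset-master master-address=<address>` for repointing a slave. Whether that would have recovered this host is not established anywhere in the thread.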