Did you see something like nics are lost, or cannot be found? If yes, try xe pool-emergency-reset-master and reboot
Im not in the office, not sure if this is the correct cmd, I remember there are similar ones but only one works. Please google xenserver nics disappear or nics lost if it is does not work Hope this help Regards Mice -----Original Message----- From: Nik Martin [mailto:nik.mar...@nfinausa.com] Sent: 2012-8-10 (星期五) 23:35 To: cloudstack-users@incubator.apache.org Subject: Re: Xen Host failure in pool On 08/10/2012 10:32 AM, Mice Xia wrote: > > I remember when network partition happens, pool slave may enter emergency > mode and show offline as it could not reach its master for a long time. > Could you check hv1's console (graphical console, not ssh console), and check > if its nics are shown correctly? > > Regards > Mice > No, when I went into the xsconsole and tried to review all the settings, it was not showing the management interfaces properly. -- Regards, Nik > -----Original Message----- > From: Nik Martin [mailto:nik.mar...@nfinausa.com] > Sent: 2012-8-10 (星期五) 23:04 > To: cloudstack-users@incubator.apache.org > Subject: Xen Host failure in pool > > We have a Xenserver 6.2 based pool of three hosts running under > CloudStack Acton release (code base is about two weeks old). We left > last night and everything was fine, and I have about 2 VMs running on > each host, not doing anything. This morning, I came in, and three VMs > have stopped, and I logged into XenCenter to see what the pool looked > like, and the Pol master hd changed from host HV3 to HV2, and HV1 was > offline. I logged in to HV1's console, and looked at the > /var/log/messages, and it was complaining about the pool master address > being wrong. I went into CloudStack UI and deleted and re-added the > host, and it failed immediately, and I got this in the log when I did: > > > 2012-08-10 09:56:39,566 DEBUG [cloud.api.ApiServlet] > (catalina-exec-24:null) Invalid paramemter in URL found. param: hosttags= > 2012-08-10 09:56:39,573 INFO [cloud.resource.ResourceManagerImpl] > (catalina-exec-24:null) Trying to add a new host at http://172.16.5.3 in > data center 2 > 2012-08-10 09:56:39,629 DEBUG [xen.resource.XenServerConnectionPool] > (catalina-exec-24:null) Slave logon to 172.16.5.3 > 2012-08-10 09:56:39,632 DEBUG [xen.resource.XenServerConnectionPool] > (catalina-exec-24:null) Failed to slave local login to 172.16.5.3 due to > The master says the host is not known to it. Perhaps the Host was > deleted from the master's database? Perhaps the slave is pointing to the > wrong master? > 2012-08-10 09:56:39,638 DEBUG [xen.discoverer.XcpServerDiscoverer] > (catalina-exec-24:null) other exceptions: java.lang.RuntimeException: > can not get master ip > java.lang.RuntimeException: can not get master ip > at > com.cloud.hypervisor.xen.resource.XenServerConnectionPool.getMasterIp(XenServerConnectionPool.java:343) > at > com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer.find(XcpServerDiscoverer.java:179) > at > com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:644) > at > com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514) > at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136) > at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132) > at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509) > at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416) > at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300) > at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:617) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) > at > org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889) > at > org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721) > at > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:679) > 2012-08-10 09:56:39,638 WARN [cloud.resource.ResourceManagerImpl] > (catalina-exec-24:null) Unable to find the server resources at > http://172.16.5.3 > 2012-08-10 09:56:39,642 WARN [api.commands.AddHostCmd] > (catalina-exec-24:null) Exception: > com.cloud.exception.DiscoveryException: Unable to add the host > at > com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:694) > at > com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514) > at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136) > at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132) > at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509) > at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416) > at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300) > at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:617) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > at > org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) > at > org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889) > at > org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721) > at > org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:679) > 2012-08-10 09:56:39,642 WARN [cloud.api.ApiDispatcher] > (catalina-exec-24:null) class com.cloud.api.ServerApiException : Unable > to add the host > 2012-08-10 09:56:39,723 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-305:null) Ping from 17 > 2012-08-10 09:56:43,822 DEBUG > [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone > 2 is ready to launch secondary storage VM > 2012-08-10 09:56:43,916 DEBUG > [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone > 2 is ready to launch console proxy > 2012-08-10 09:56:44,102 DEBUG > [network.router.VirtualNetworkApplianceManagerImpl] > (RouterStatusMonitor-1:null) Found 2 routers. > 2012-08-10 09:56:44,614 DEBUG [agent.manager.AgentManagerImpl] > (AgentManager-Handler-12:null) Ping from 22 > 2012-08-10 09:56:48,864 DEBUG [agent.manager.AgentManagerImpl] > (AgentManager-Handler-10:null) Ping from 18 > 2012-08-10 09:56:49,511 DEBUG [cloud.server.StatsCollector] > (StatsCollector-1:null) VmStatsCollector is running... > 2012-08-10 09:56:49,525 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-305:null) Seq 16-92408948: Executing request > 2012-08-10 09:56:49,763 DEBUG [xen.resource.CitrixResourceBase] > (DirectAgent-305:null) Vm cpu utilization 0.01 > 2012-08-10 09:56:49,763 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-305:null) Seq 16-92408948: Response Received: > 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl] > (DirectAgent-305:null) Cleanup succeeded. Details null > 2012-08-10 09:56:49,763 DEBUG [agent.transport.Request] > (StatsCollector-1:null) Seq 16-92408948: Received: { Ans: , MgmtId: > 130577622632, via: 16, Ver: v1, Flags: 10, { GetVmStatsAnswer } } > 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl] > (StatsCollector-1:null) Cleanup succeeded. Details null > 2012-08-10 09:56:54,411 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-497:null) Ping from 17 > 2012-08-10 09:56:54,550 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-338:null) Ping from 16 > 2012-08-10 09:56:59,614 DEBUG [agent.manager.AgentManagerImpl] > (AgentManager-Handler-8:null) Ping from 22 > 2012-08-10 09:57:03,864 DEBUG [agent.manager.AgentManagerImpl] > (AgentManager-Handler-9:null) Ping from 18 > 2012-08-10 09:57:09,551 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-71:null) Ping from 16 > 2012-08-10 09:57:09,669 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-338:null) Ping from 17 > 2012-08-10 09:57:13,821 DEBUG > [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone > 2 is ready to launch secondary storage VM > 2012-08-10 09:57:13,918 DEBUG > [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone > 2 is ready to launch console proxy > 2012-08-10 09:57:14,102 DEBUG > [network.router.VirtualNetworkApplianceManagerImpl] > (RouterStatusMonitor-1:null) Found 2 routers. > 2012-08-10 09:57:14,614 DEBUG [agent.manager.AgentManagerImpl] > (AgentManager-Handler-11:null) Ping from 22 > 2012-08-10 09:57:15,645 DEBUG [cloud.server.StatsCollector] > (StatsCollector-3:null) HostStatsCollector is running... > 2012-08-10 09:57:15,656 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-71:null) Seq 16-92408949: Executing request > 2012-08-10 09:57:15,878 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-71:null) Seq 16-92408949: Response Received: > 2012-08-10 09:57:15,878 DEBUG [cloud.vm.VirtualMachineManagerImpl] > (DirectAgent-71:null) Cleanup succeeded. Details null > 2012-08-10 09:57:15,878 DEBUG [agent.transport.Request] > (StatsCollector-3:null) Seq 16-92408949: Received: { Ans: , MgmtId: > 130577622632, via: 16, Ver: v1, Flags: 10, { GetHostStatsAnswer } } > 2012-08-10 09:57:15,879 DEBUG [cloud.vm.VirtualMachineManagerImpl] > (StatsCollector-3:null) Cleanup succeeded. Details null > 2012-08-10 09:57:15,884 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-338:null) Seq 17-665190891: Executing request > 2012-08-10 09:57:16,312 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-338:null) Seq 17-665190891: Response Received: > 2012-08-10 09:57:16,312 DEBUG [cloud.vm.VirtualMachineManagerImpl] > (DirectAgent-338:null) Cleanup succeeded. Details null > 2012-08-10 09:57:16,312 DEBUG [agent.transport.Request] > (StatsCollector-3:null) Seq 17-665190891: Received: { Ans: , MgmtId: > 130577622632, via: 17, Ver: v1, Flags: 10, { GetHostStatsAnswer } } > 2012-08-10 09:57:16,313 DEBUG [cloud.vm.VirtualMachineManagerImpl] > (StatsCollector-3:null) Cleanup succeeded. Details null > 2012-08-10 09:57:18,864 DEBUG [agent.manager.AgentManagerImpl] > (AgentManager-Handler-15:null) Ping from 18 > 2012-08-10 09:57:24,407 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-71:null) Ping from 17 > 2012-08-10 09:57:24,566 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-338:null) Ping from 16 > 2012-08-10 09:57:29,615 DEBUG [agent.manager.AgentManagerImpl] > (AgentManager-Handler-1:null) Ping from 22 > 2012-08-10 09:57:30,047 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-294:null) Seq 16-92405762: Executing request > 2012-08-10 09:57:30,308 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-294:null) Seq 16-92405762: Response Received: > 2012-08-10 09:57:30,308 DEBUG [agent.transport.Request] > (DirectAgent-294:null) Seq 16-92405762: Processing: { Ans: , MgmtId: > 130577622632, via: 16, Ver: v1, Flags: 10, > [{"ClusterSyncAnswer":{"_clusterId":1,"_newStates":{},"_isExecuted":false,"result":true,"wait":0}}] > } > 2012-08-10 09:57:31,060 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-357:null) Seq 17-665190402: Executing request > 2012-08-10 09:57:31,250 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-357:null) Seq 17-665190402: Response Received: > 2012-08-10 09:57:31,250 DEBUG [agent.transport.Request] > (DirectAgent-357:null) Seq 17-665190402: Processing: { Ans: , MgmtId: > 130577622632, via: 17, Ver: v1, Flags: 10, > [{"Answer":{"result":true,"wait":0}}] } > > This is a very serious error, and I don't know how to fix it. Can > anyone suggest what might be the problem and hos I might fix it? > >