On 08/10/2012 12:59 PM, Anthony Xu wrote:
> Hi Nik
>
> What's the network configuration in XenServer host?
> Is it bridge or openvswitch?
>

Anthony,
I'm using CloudStack advanced networking, so the network is openvswitch. I
did the emergency network reset and the XenServer rebooted, but when it came
back it was the same: no NICs being reported.

--
Regards,

Nik

> You can get the info by
> cat /etc/xensource/network.conf
>
> Anthony
>
>> -----Original Message-----
>> From: Nik Martin [mailto:nik.mar...@nfinausa.com]
>> Sent: Friday, August 10, 2012 8:36 AM
>> To: cloudstack-users@incubator.apache.org
>> Subject: Re: Xen Host failure in pool
>>
>> On 08/10/2012 10:32 AM, Mice Xia wrote:
>>>
>>> I remember that when a network partition happens, a pool slave may enter
>>> emergency mode and show as offline because it could not reach its master
>>> for a long time.
>>> Could you check hv1's console (graphical console, not ssh console), and
>>> check if its NICs are shown correctly?
>>>
>>> Regards
>>> Mice
>>>
>> No, when I went into the xsconsole and tried to review all the settings,
>> it was not showing the management interfaces properly.
>>
>> --
>> Regards,
>>
>> Nik
>>
>>> -----Original Message-----
>>> From: Nik Martin [mailto:nik.mar...@nfinausa.com]
>>> Sent: 2012-8-10 (Friday) 23:04
>>> To: cloudstack-users@incubator.apache.org
>>> Subject: Xen Host failure in pool
>>>
>>> We have a Xenserver 6.2 based pool of three hosts running under
>>> CloudStack Acton release (code base is about two weeks old). When we left
>>> last night everything was fine; I have about 2 VMs running on each host,
>>> not doing anything. When I came in this morning, three VMs had stopped.
>>> I logged into XenCenter to see what the pool looked like: the pool master
>>> had changed from host HV3 to HV2, and HV1 was offline. I logged in to
>>> HV1's console and looked at /var/log/messages, and it was complaining
>>> about the pool master address being wrong. I went into the CloudStack UI
>>> and deleted and re-added the host. It failed immediately, and I got this
>>> in the log when I did:
>>>
>>> 2012-08-10 09:56:39,566 DEBUG [cloud.api.ApiServlet] (catalina-exec-24:null) Invalid paramemter in URL found. param: hosttags=
>>> 2012-08-10 09:56:39,573 INFO [cloud.resource.ResourceManagerImpl] (catalina-exec-24:null) Trying to add a new host at http://172.16.5.3 in data center 2
>>> 2012-08-10 09:56:39,629 DEBUG [xen.resource.XenServerConnectionPool] (catalina-exec-24:null) Slave logon to 172.16.5.3
>>> 2012-08-10 09:56:39,632 DEBUG [xen.resource.XenServerConnectionPool] (catalina-exec-24:null) Failed to slave local login to 172.16.5.3 due to The master says the host is not known to it. Perhaps the Host was deleted from the master's database? Perhaps the slave is pointing to the wrong master?
>>> 2012-08-10 09:56:39,638 DEBUG [xen.discoverer.XcpServerDiscoverer] (catalina-exec-24:null) other exceptions: java.lang.RuntimeException: can not get master ip
>>> java.lang.RuntimeException: can not get master ip
>>>     at com.cloud.hypervisor.xen.resource.XenServerConnectionPool.getMasterIp(XenServerConnectionPool.java:343)
>>>     at com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer.find(XcpServerDiscoverer.java:179)
>>>     at com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:644)
>>>     at com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
>>>     at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
>>>     at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
>>>     at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
>>>     at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
>>>     at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
>>>     at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
>>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>>     at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>>     at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
>>>     at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>     at java.lang.Thread.run(Thread.java:679)
>>> 2012-08-10 09:56:39,638 WARN [cloud.resource.ResourceManagerImpl] (catalina-exec-24:null) Unable to find the server resources at http://172.16.5.3
>>> 2012-08-10 09:56:39,642 WARN [api.commands.AddHostCmd] (catalina-exec-24:null) Exception:
>>> com.cloud.exception.DiscoveryException: Unable to add the host
>>>     at com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:694)
>>>     at com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
>>>     at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
>>>     at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
>>>     at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
>>>     at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
>>>     at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
>>>     at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
>>>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>     at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
>>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>>     at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>>>     at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
>>>     at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>     at java.lang.Thread.run(Thread.java:679)
>>> 2012-08-10 09:56:39,642 WARN [cloud.api.ApiDispatcher] (catalina-exec-24:null) class com.cloud.api.ServerApiException : Unable to add the host
>>> 2012-08-10 09:56:39,723 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-305:null) Ping from 17
>>> 2012-08-10 09:56:43,822 DEBUG [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone 2 is ready to launch secondary storage VM
>>> 2012-08-10 09:56:43,916 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone 2 is ready to launch console proxy
>>> 2012-08-10 09:56:44,102 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 2 routers.
>>> 2012-08-10 09:56:44,614 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-12:null) Ping from 22
>>> 2012-08-10 09:56:48,864 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-10:null) Ping from 18
>>> 2012-08-10 09:56:49,511 DEBUG [cloud.server.StatsCollector] (StatsCollector-1:null) VmStatsCollector is running...
>>> 2012-08-10 09:56:49,525 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-305:null) Seq 16-92408948: Executing request
>>> 2012-08-10 09:56:49,763 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-305:null) Vm cpu utilization 0.01
>>> 2012-08-10 09:56:49,763 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-305:null) Seq 16-92408948: Response Received:
>>> 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-305:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:56:49,763 DEBUG [agent.transport.Request] (StatsCollector-1:null) Seq 16-92408948: Received: { Ans: , MgmtId: 130577622632, via: 16, Ver: v1, Flags: 10, { GetVmStatsAnswer } }
>>> 2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl] (StatsCollector-1:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:56:54,411 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-497:null) Ping from 17
>>> 2012-08-10 09:56:54,550 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-338:null) Ping from 16
>>> 2012-08-10 09:56:59,614 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-8:null) Ping from 22
>>> 2012-08-10 09:57:03,864 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-9:null) Ping from 18
>>> 2012-08-10 09:57:09,551 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-71:null) Ping from 16
>>> 2012-08-10 09:57:09,669 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-338:null) Ping from 17
>>> 2012-08-10 09:57:13,821 DEBUG [storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone 2 is ready to launch secondary storage VM
>>> 2012-08-10 09:57:13,918 DEBUG [cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone 2 is ready to launch console proxy
>>> 2012-08-10 09:57:14,102 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 2 routers.
>>> 2012-08-10 09:57:14,614 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-11:null) Ping from 22
>>> 2012-08-10 09:57:15,645 DEBUG [cloud.server.StatsCollector] (StatsCollector-3:null) HostStatsCollector is running...
>>> 2012-08-10 09:57:15,656 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-71:null) Seq 16-92408949: Executing request
>>> 2012-08-10 09:57:15,878 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-71:null) Seq 16-92408949: Response Received:
>>> 2012-08-10 09:57:15,878 DEBUG [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-71:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:57:15,878 DEBUG [agent.transport.Request] (StatsCollector-3:null) Seq 16-92408949: Received: { Ans: , MgmtId: 130577622632, via: 16, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
>>> 2012-08-10 09:57:15,879 DEBUG [cloud.vm.VirtualMachineManagerImpl] (StatsCollector-3:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:57:15,884 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-338:null) Seq 17-665190891: Executing request
>>> 2012-08-10 09:57:16,312 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-338:null) Seq 17-665190891: Response Received:
>>> 2012-08-10 09:57:16,312 DEBUG [cloud.vm.VirtualMachineManagerImpl] (DirectAgent-338:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:57:16,312 DEBUG [agent.transport.Request] (StatsCollector-3:null) Seq 17-665190891: Received: { Ans: , MgmtId: 130577622632, via: 17, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
>>> 2012-08-10 09:57:16,313 DEBUG [cloud.vm.VirtualMachineManagerImpl] (StatsCollector-3:null) Cleanup succeeded. Details null
>>> 2012-08-10 09:57:18,864 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-15:null) Ping from 18
>>> 2012-08-10 09:57:24,407 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-71:null) Ping from 17
>>> 2012-08-10 09:57:24,566 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-338:null) Ping from 16
>>> 2012-08-10 09:57:29,615 DEBUG [agent.manager.AgentManagerImpl] (AgentManager-Handler-1:null) Ping from 22
>>> 2012-08-10 09:57:30,047 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-294:null) Seq 16-92405762: Executing request
>>> 2012-08-10 09:57:30,308 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-294:null) Seq 16-92405762: Response Received:
>>> 2012-08-10 09:57:30,308 DEBUG [agent.transport.Request] (DirectAgent-294:null) Seq 16-92405762: Processing: { Ans: , MgmtId: 130577622632, via: 16, Ver: v1, Flags: 10, [{"ClusterSyncAnswer":{"_clusterId":1,"_newStates":{},"_isExecuted":false,"result":true,"wait":0}}] }
>>> 2012-08-10 09:57:31,060 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-357:null) Seq 17-665190402: Executing request
>>> 2012-08-10 09:57:31,250 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-357:null) Seq 17-665190402: Response Received:
>>> 2012-08-10 09:57:31,250 DEBUG [agent.transport.Request] (DirectAgent-357:null) Seq 17-665190402: Processing: { Ans: , MgmtId: 130577622632, via: 17, Ver: v1, Flags: 10, [{"Answer":{"result":true,"wait":0}}] }
>>>
>>> This is a very serious error, and I don't know how to fix it. Can
>>> anyone suggest what might be the problem and how I might fix it?
>>>
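
For anyone wanting to script the check Anthony suggests (reading /etc/xensource/network.conf to see whether the host uses bridge or openvswitch), here is a minimal sketch. The function name and the KNOWN_BACKENDS set are hypothetical. It also assumes the file holds just the backend name on a single line, so treat it as a starting point rather than a definitive tool.

```python
from pathlib import Path

# The two backends named in this thread. Treating them as the only
# valid values is an assumption made for this sketch.
KNOWN_BACKENDS = {"bridge", "openvswitch"}


def read_network_backend(path="/etc/xensource/network.conf"):
    """Return the backend named in the file, or None if it is missing
    or contains something other than a known backend name."""
    conf = Path(path)
    if not conf.is_file():
        return None
    text = conf.read_text().strip().lower()
    return text if text in KNOWN_BACKENDS else None


if __name__ == "__main__":
    backend = read_network_backend()
    print(backend or "could not determine network backend")
```

Running it on each host in the pool and comparing the answers is one quick way to spot a host whose backend differs from the others.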